Conclusions and recommendations
The case studies included here cover seven successful transcription projects. They can be grouped in several ways:
- By funding structure: two large commercial contracts (Royal Navy WW2 and East India Company), three citizen science projects (oldWeather1&2, oldWeather3, and UK DWR stations), and one small academic project group (Marine expeditions).
- By difficulty of source: contrast Royal Navy WW2 and oldWeather1&2 (more modern, easy-to-read source documents) with East India Company and oldWeather3 (older, hard-to-read source documents).
- By type of source data: contrast UK DWR stations (land station records) with all the others (marine observations).
Cost matrix
Project | Date run | Observations rescued | Financial cost (per ob.) | Effort (per ob.) | Elapsed time (per ob.) |
---|---|---|---|---|---|
Royal Navy WW2 | 2005 - 2008 | 8,086,529 | £1 million (£0.12) | 16 person years (0.19 person minutes) | 3.5 years (0.23 minutes) |
East India Company | 2008 - 2011 | 614,583 | £1.1 million (£1.80) | 14 person years (2.2 person minutes) | 3.5 years (3 minutes) |
oldWeather1&2 | 2010 - 2012 | 7,132,659 | £100,000 (£0.014) | 21 person years (0.3 person minutes) | 2 years (0.15 minutes) |
oldWeather3 | 2012 - 2018 | 5,200,000 | £110,000 (£0.02) | 62 person years (1.1 person minutes) | 6 years (0.6 minutes) |
UK DWR stations | 2017 - 2018 | 1,800,000 | £15,000 (£0.008) | 8 person years (0.5 person minutes) | 0.6 years (0.2 minutes) |
Marine expeditions | 2007 - present | 426,813 | N/A (N/A) | N/A (N/A) | N/A (N/A) |
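
The per-observation figures in the table can be re-derived from the project totals. The sketch below is illustrative only: the totals are copied from the table, but the length of a working person-year (`WORK_MINUTES_PER_PERSON_YEAR`, taken here as 1,650 hours) is an assumption that the source does not state, chosen because it roughly reproduces the effort column. Elapsed time is converted using calendar minutes. Marine expeditions is omitted because its costs are listed as N/A.

```python
# Totals copied from the cost matrix:
# (observations, cost in GBP, effort in person-years, elapsed years)
PROJECTS = {
    "Royal Navy WW2": (8_086_529, 1_000_000, 16, 3.5),
    "East India Company": (614_583, 1_100_000, 14, 3.5),
    "oldWeather1&2": (7_132_659, 100_000, 21, 2.0),
    "oldWeather3": (5_200_000, 110_000, 62, 6.0),
    "UK DWR stations": (1_800_000, 15_000, 8, 0.6),
}

# ASSUMPTION (not stated in the source): one person-year is about
# 1,650 working hours.
WORK_MINUTES_PER_PERSON_YEAR = 1650 * 60
# Elapsed time is wall-clock time, so use calendar minutes.
CALENDAR_MINUTES_PER_YEAR = 365.25 * 24 * 60


def per_observation(observations, cost_gbp, person_years, elapsed_years):
    """Return (GBP, person-minutes, elapsed minutes) per observation."""
    cost = cost_gbp / observations
    effort = person_years * WORK_MINUTES_PER_PERSON_YEAR / observations
    elapsed = elapsed_years * CALENDAR_MINUTES_PER_YEAR / observations
    return cost, effort, elapsed


for name, totals in PROJECTS.items():
    cost, effort, elapsed = per_observation(*totals)
    print(f"{name:20s} £{cost:.3f}  {effort:.2f} person-min  {elapsed:.2f} min")
```

Changing the assumed working-year length scales the effort column proportionally and leaves the cost and elapsed-time columns unchanged.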
Conclusions
For academic-scale projects, where transcription is a small part of the total work, including a transcriber in the project team works well.
Citizen science works about as well as large-scale commercial transcription and has a much lower financial cost (contrast Royal Navy WW2 with oldWeather1&2, and East India Company with oldWeather3). It’s also encouraging that citizen science has been successful not only with ships’ logbooks (plenty of human interest) but also with the intrinsically less appealing data tables used for the UK DWR stations. A lot of the credit for the success of citizen science in this field is due to Zooniverse.
Participation rates in citizen science projects are sensitive to the difficulty of the exact task requested of the volunteers. Making the requested unit of work smaller (perhaps by presenting only a fragment of a page to be transcribed, rather than the whole thing) can increase participation a lot.
The speed, cost and efficiency of transcription depend most on the difficulty of the task: observations in hard-to-read older documents took several times as much time and effort to read as those in easier, more modern documents (contrast oldWeather1&2 with oldWeather3, and Royal Navy WW2 with East India Company).
Transcription is fundamentally slow - speeds in these projects vary from 6 observations/minute to 1 observation every 3 minutes. It also consumes a lot of work - from 0.2 to 2.2 person-minutes for each observation. This is the major current limitation: the number of observations remaining to be transcribed is unknown, but 1 billion (1,000,000,000) is a reasonable planning number. At the typical rates shown above, this will take of the order of 100 years elapsed (and 500 person-years of effort). This is too long - we must go faster.
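An extrapolation of this kind can be sketched as below. This is an illustration, not the calculation behind the planning figures quoted above. The function names and the 1,650-hour working year are assumptions, and the totals are very sensitive to which per-observation rate is treated as typical, to how a person-year is defined, and to how many projects run in parallel. Treat the outputs as order-of-magnitude guides only.

```python
# Wall-clock minutes in a calendar year.
CALENDAR_MINUTES_PER_YEAR = 365.25 * 24 * 60
# ASSUMPTION (not stated in the source): one person-year is about
# 1,650 working hours.
WORK_MINUTES_PER_PERSON_YEAR = 1650 * 60


def elapsed_years(n_observations, minutes_per_observation):
    """Elapsed years for one project running at a fixed wall-clock rate."""
    return n_observations * minutes_per_observation / CALENDAR_MINUTES_PER_YEAR


def effort_person_years(n_observations, person_minutes_per_observation):
    """Total effort in person-years at a fixed per-observation effort."""
    total_minutes = n_observations * person_minutes_per_observation
    return total_minutes / WORK_MINUTES_PER_PERSON_YEAR


BACKLOG = 1_000_000_000  # planning number from the text

# Fastest and slowest per-observation rates from the cost matrix:
# (label, elapsed minutes per ob., person-minutes per ob.)
for label, minutes, person_minutes in [
    ("fastest", 0.15, 0.19),
    ("slowest", 3.0, 2.2),
]:
    years = elapsed_years(BACKLOG, minutes)
    effort = effort_person_years(BACKLOG, person_minutes)
    print(f"{label}: {years:,.0f} years elapsed, {effort:,.0f} person-years")
```

Running several projects side by side divides the elapsed time by the number of projects but leaves the total effort unchanged, so going faster means either many more volunteers or less effort per observation.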
Document transcription is not a climate-specific problem - it is valuable to many fields. It’s worth trying to get more people working on transcription in general, perhaps by emphasising its importance to climate research.
Recommendations
For the smallest transcription tasks, just do it - get the transcription done by the project team.
For larger tasks, a citizen science project has a good chance of success - current best practice is exemplified by the UK DWR stations project. This can be excellent value for money, but don’t underestimate the effort required in managing and marketing the project.
To get the job done, we need to come up with a much faster technique than anything that has been tried so far. In principle, we could run a much bigger citizen science project (or a family of projects), but it’s not clear how to make them much bigger (so far we have not managed it) - we should encourage research in this area. An alternative approach is to come up with an automated system to do some or all of the job - we should encourage research in this area as well.