Conclusions and recommendations

The case studies included here are of seven successful transcription projects. We can divide them into several different groupings:

Cost matrix

| Project | Date run | Observations rescued | Financial cost (per ob.) | Effort (per ob.) | Elapsed time (per ob.) |
|---|---|---|---|---|---|
| Royal Navy WW2 | 2005 - 2008 | 8,086,529 | £1 million (£0.12) | 16 person-years (0.19 person-minutes) | 3.5 years (0.23 minutes) |
| East India Company | 2008 - 2011 | 614,583 | £1.1 million (£1.80) | 14 person-years (2.2 person-minutes) | 3.5 years (3 minutes) |
| oldWeather1&2 | 2010 - 2012 | 7,132,659 | £100,000 (£0.014) | 21 person-years (0.3 person-minutes) | 2 years (0.15 minutes) |
| oldWeather3 | 2012 - 2018 | 5,200,000 | £110,000 (£0.02) | 62 person-years (1.1 person-minutes) | 6 years (0.6 minutes) |
| UK DWR stations | 2017 - 2018 | 1,800,000 | £15,000 (£0.008) | 8 person-years (0.5 person-minutes) | 0.6 years (0.2 minutes) |
| Marine expeditions | 2007 - present | 426,813 | N/A (N/A) | N/A (N/A) | N/A (N/A) |
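The per-observation figures in the cost matrix appear to follow from dividing each project's totals by its observation count. The sketch below reproduces the financial cost and elapsed time columns under that assumption; the constant and function names are illustrative only, and elapsed time is assumed to be measured in calendar minutes. The effort column is left out because the source does not state how many working minutes make up a person-year.

```python
# Assumption: elapsed time per observation is measured in calendar minutes.
MINUTES_PER_CALENDAR_YEAR = 365.25 * 24 * 60

# (project, observations rescued, total cost in GBP, elapsed years),
# copied from the cost matrix. Marine expeditions is omitted (all N/A).
PROJECTS = [
    ("Royal Navy WW2", 8_086_529, 1_000_000, 3.5),
    ("East India Company", 614_583, 1_100_000, 3.5),
    ("oldWeather1&2", 7_132_659, 100_000, 2.0),
    ("oldWeather3", 5_200_000, 110_000, 6.0),
    ("UK DWR stations", 1_800_000, 15_000, 0.6),
]


def per_observation(observations, cost_gbp, elapsed_years):
    """Return (cost in GBP, elapsed calendar minutes) per observation."""
    cost_per_ob = cost_gbp / observations
    elapsed_minutes_per_ob = (
        elapsed_years * MINUTES_PER_CALENDAR_YEAR / observations
    )
    return cost_per_ob, elapsed_minutes_per_ob


for name, observations, cost_gbp, elapsed_years in PROJECTS:
    cost, elapsed = per_observation(observations, cost_gbp, elapsed_years)
    print(f"{name}: £{cost:.3f} per ob., {elapsed:.2f} elapsed minutes per ob.")
```

Small differences from the published figures are expected, since the totals in the table are themselves rounded.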

Methodology.

Conclusions

For academic-scale projects, where transcription is a small part of the total work, including a transcriber in the project team works well.

Citizen science works about as well as large-scale commercial transcription and has a much lower financial cost (contrast Royal Navy WW2 with oldWeather1&2, and East India Company with oldWeather3). It’s also encouraging that citizen science has been successful not only with ships’ logbooks (plenty of human interest) but also with the intrinsically less appealing data tables used for the UK DWR stations. A lot of the credit for the success of citizen science in this field is due to Zooniverse.

Participation rates in citizen science projects are sensitive to the difficulty of the exact task requested of the volunteers. Making the requested unit of work smaller, perhaps by presenting only a fragment of a page to be transcribed rather than the whole thing, can increase participation a lot.

The speed, cost and efficiency of transcription depend most on the difficulty of the task: observations in hard-to-read older documents took several times as much time and effort to read as those in easier, more modern documents (contrast oldWeather1&2 with oldWeather3, and Royal Navy WW2 with East India Company).

Transcription is fundamentally slow: speeds in these projects vary from about 6 observations per minute to 1 observation every 3 minutes. It also consumes a lot of work, from 0.2 to 2.2 person-minutes for each observation. This is the major current limitation: the number of observations remaining to be transcribed is unknown, but 1 billion (1,000,000,000) is a reasonable planning number. At the typical rates shown above, this will take on the order of 100 years elapsed (and 500 person-years of effort). This is too long - we must go faster.
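As a rough cross-check on this kind of planning estimate, the sketch below scales a per-observation rate up to the 1 billion planning number. The function and constant names are hypothetical. Elapsed time is converted using calendar minutes; the source does not say how long a working person-year is, so `WORKING_MINUTES_PER_YEAR` is an assumption chosen purely for illustration.

```python
MINUTES_PER_CALENDAR_YEAR = 365.25 * 24 * 60

# Assumption (not from the source): a person-year of about 1,700 working hours.
WORKING_MINUTES_PER_YEAR = 1_700 * 60

TARGET_OBSERVATIONS = 1_000_000_000  # planning number from the text


def years_needed(target_observations, minutes_per_observation,
                 minutes_per_year=MINUTES_PER_CALENDAR_YEAR):
    """Scale a per-observation rate (in minutes) up to a total in years."""
    return target_observations * minutes_per_observation / minutes_per_year


# Elapsed time at the fastest and slowest elapsed rates in the cost matrix.
for label, rate in [("fastest, 0.15 min/ob.", 0.15), ("slowest, 3 min/ob.", 3.0)]:
    years = years_needed(TARGET_OBSERVATIONS, rate)
    print(f"Elapsed ({label}): {years:,.0f} years")

# Effort at the lowest and highest person-minute rates quoted in the text.
for label, rate in [("lowest, 0.2 min/ob.", 0.2), ("highest, 2.2 min/ob.", 2.2)]:
    person_years = years_needed(
        TARGET_OBSERVATIONS, rate, WORKING_MINUTES_PER_YEAR
    )
    print(f"Effort ({label}): {person_years:,.0f} person-years")
```

The results vary by more than a factor of ten between the easiest and hardest projects, and the effort figure is particularly sensitive to the assumed length of a person-year, so the outputs are order-of-magnitude planning figures only.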

Document transcription is not a climate-specific problem - it is valuable to many fields. It’s worth trying to get more people working on transcription in general, perhaps by emphasising its importance to climate research.

Recommendations

For the smallest transcription tasks, just do it - get the transcription done by the project team.

For larger tasks, a citizen science project has a good chance of success - current best practice is exemplified by the UK DWR stations project. This can be excellent value for money, but don’t underestimate the effort required in managing and marketing the project.

To get the job done, we need a much faster technique than anything that has been tried so far. In principle, we could run a much bigger citizen science project (or a family of projects), but it’s not clear how to make such projects much bigger (so far we have not managed it) - we should encourage research in this area. An alternative approach is to develop an automated system to do some or all of the job - we should encourage research in this area as well.