Transcription cost methodology

The perfect transcription methodology would be accurate, cheap, easy, and fast. Ideally we would measure each of these for each transcription project so we could compare their effectiveness. In practice, we don’t have precise measurements, but we can usually make a useful estimate.


Perhaps surprisingly, for most projects transcription accuracy is the least useful parameter to measure. It’s very difficult to measure precisely, because we don’t have a known true value to compare the project transcriptions against. Also, the manual transcription methods are very accurate: tests of the transcribed data do show quality problems, but these generally come not from poor transcription but from problems in the originally-recorded observations. So the transcription accuracy is more than good enough, and improving it would not much improve the quality of the resulting data.

So for the manual projects we don’t give accuracy numbers. Unless otherwise mentioned, the accuracy is sufficiently good that there is little to gain by improving it.

Machine transcription approaches still have low accuracy, so for those we are proposing benchmark assessments to measure accuracy precisely.

Counting the output

To intercompare transcription approaches, we need to count the project outputs consistently. Here we count the number of observations produced, where one observation is a single measurement of dry-bulb temperature, wet-bulb temperature, sea-surface temperature, air pressure, wind speed, wind direction, or precipitation. (Some projects also transcribed other valuable data; this is not included in the count.)
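As an illustration of the counting rule, here is a minimal sketch in Python. It assumes, hypothetically, that each transcribed record is a dict mapping variable names to values (None where nothing was transcribed); the variable names in `COUNTED_VARIABLES` and the function `count_observations` are illustrative, not part of any project's actual data format.

```python
# Hypothetical names for the seven quantities that count as observations.
COUNTED_VARIABLES = {
    "dry_bulb_temperature",
    "wet_bulb_temperature",
    "sea_surface_temperature",
    "air_pressure",
    "wind_speed",
    "wind_direction",
    "precipitation",
}


def count_observations(records):
    """Count transcribed values of the seven counted variables.

    Each record is assumed to be a dict of variable name -> transcribed value.
    Values that are None (not transcribed) and any other variables
    (e.g. cloud cover, weather codes) are not counted.
    """
    n_obs = 0
    for record in records:
        for name, value in record.items():
            if name in COUNTED_VARIABLES and value is not None:
                n_obs += 1
    return n_obs


records = [
    {"air_pressure": 1012.3, "wind_direction": "NE", "cloud_cover": 6},
    {"dry_bulb_temperature": 14.2, "wet_bulb_temperature": None, "weather_code": "b"},
]
print(count_observations(records))  # 3
```

Here the cloud cover and weather code are valuable but excluded, and the missing wet-bulb value contributes nothing, so the two records yield three observations.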


We estimate the total financial cost of each project to the sponsoring organisations: the total cost of staff time and contract expenses. This covers transcription only; the costs of finding and imaging the source documents, or of doing science with the output, are not included. Dividing this cost by the number of observations produced gives the cost per observation.

Costs are reported in 2018 pounds sterling (£).
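The cost calculation is a single division; the sketch below spells it out. It assumes the staff and contract costs have already been converted to 2018 pounds sterling, and the function name and the example figures are hypothetical, not taken from any real project.

```python
def cost_per_observation(staff_cost, contract_cost, n_obs):
    """Transcription cost per observation.

    staff_cost and contract_cost are assumed to be in 2018 pounds sterling;
    costs of finding/imaging documents or doing science are excluded.
    """
    if n_obs <= 0:
        raise ValueError("n_obs must be a positive number of observations")
    total_cost = staff_cost + contract_cost
    return total_cost / n_obs


# Hypothetical project: £50,000 of staff time, £10,000 of contracts,
# 1,200,000 observations -> £0.05 per observation.
print(cost_per_observation(50_000, 10_000, 1_200_000))
```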


Citizen science projects, in particular, incur a large non-financial cost in the effort put in by the participants, and it’s important to minimise this. So we also estimate the effort required by each project in full-time person-years (FTPY). One FTPY is 200 8-hour working days: 1,600 hours, or 96,000 minutes.

Dividing the total effort by the number of observations gives the effort per observation. We report this in working minutes, counting 96,000 working minutes per FTPY, so it is an estimate of the actual time needed to produce one observation: Effort-per-ob = n.FTPY*96000/n.obs.
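The FTPY definition and the effort formula above can be checked with a short sketch. The helper `ftpy_from_hours` is a hypothetical addition, assuming participant effort is first tallied as total hours worked; the example numbers are illustrative only.

```python
WORKING_DAYS_PER_FTPY = 200
HOURS_PER_WORKING_DAY = 8
HOURS_PER_FTPY = WORKING_DAYS_PER_FTPY * HOURS_PER_WORKING_DAY  # 1,600
MINUTES_PER_FTPY = HOURS_PER_FTPY * 60  # 96,000


def ftpy_from_hours(total_hours):
    """Hypothetical helper: convert total hours of effort to FTPY."""
    return total_hours / HOURS_PER_FTPY


def effort_per_observation(n_ftpy, n_obs):
    """Working minutes of effort per observation: n.FTPY*96000/n.obs."""
    if n_obs <= 0:
        raise ValueError("n_obs must be a positive number of observations")
    return n_ftpy * MINUTES_PER_FTPY / n_obs


# Illustrative: 16,000 volunteer hours is 10 FTPY; spread over
# 1,000,000 observations that is 0.96 working minutes per observation.
n_ftpy = ftpy_from_hours(16_000)
print(n_ftpy, effort_per_observation(n_ftpy, 1_000_000))
```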


A project that gets the job done in one year is probably better than one that takes 10 years, even if it is more expensive. So we also report the total duration of each project (in years), and divide this by the number of observations produced to get the elapsed time per observation (in minutes; here 1 year = 365*24*60 = 525,600 minutes).
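A minimal sketch of the elapsed-time calculation, assuming the project duration is already known in years; the function name and example figures are hypothetical. Note that this uses calendar minutes (525,600 per year), not the 96,000 working minutes used for effort.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 calendar minutes


def elapsed_time_per_observation(duration_years, n_obs):
    """Calendar minutes of project duration per observation produced."""
    if n_obs <= 0:
        raise ValueError("n_obs must be a positive number of observations")
    return duration_years * MINUTES_PER_YEAR / n_obs


# Illustrative: the same 1,051,200 observations produced in 2 years
# versus 10 years.
print(elapsed_time_per_observation(2, 1_051_200))   # 1.0
print(elapsed_time_per_observation(10, 1_051_200))  # 5.0
```

A lower value means the output arrived faster, which is the sense in which the one-year project above is preferred.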