OCR-weatherrescue benchmark comparison

The OCR-weatherrescue benchmark is a test dataset for document transcription systems. It contains 81 document images, each showing a table of numbers, along with a quality-controlled transcription of each image. Google Vision can be run on each image and scored on its ability to reproduce the known transcriptions.
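
As a rough illustration of the process, the sketch below runs Google Vision's document text detection on a single benchmark image using the google-cloud-vision Python client. The file path is illustrative, and valid credentials are assumed to be configured.

```python
# Sketch: run Google Vision document text detection on one benchmark image.
# Assumes the google-cloud-vision Python client and configured GCP credentials;
# the file path is illustrative.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

def transcribe_image(path):
    """Return the full text and per-word bounding boxes Google Vision finds."""
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.document_text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    words = []
    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    text = "".join(s.text for s in word.symbols)
                    box = [(v.x, v.y) for v in word.bounding_box.vertices]
                    words.append((text, box))
    return response.full_text_annotation.text, words

full_text, words = transcribe_image("images/1898-021.png")
```

Scoring is then a matter of comparing the extracted numbers, cell by cell, against the quality-controlled transcription.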

[Image: ../_images/1898-021.png]

Google Vision results for a sample month. Green blocks are entries successfully read by Google Vision. Filled red blocks are entries inaccurately read, and hatched red blocks are entries missed altogether.

Summary

Of 59,136 entries:
  • 53,086 (90%) were read successfully
  • 3,241 (5%) were read inaccurately
  • 2,803 (5%) were missed altogether

In pure character-reading accuracy, Google Vision is better than these figures suggest, but to extract the data successfully we need to link the characters into numbers and associate each number with its location in the table. Google Vision is weak at this: reaching even 90% accuracy took quite some post-processing to correct its clustering of the characters it read into words, and the overall accuracy is still not great.
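
The exact post-processing is not described here, but the general idea is to discard Google Vision's own word grouping and re-cluster the individual characters into table cells using their bounding boxes. The sketch below shows one way this could look; the grid-line coordinates are assumed to be known separately from the page layout.

```python
# Sketch of the kind of post-processing needed: ignore the OCR engine's own
# word grouping and re-cluster individual characters into table cells using
# their bounding-box centres. The row/column grid coordinates are assumed
# inputs (e.g. estimated from the page layout).
from collections import defaultdict

def regroup_into_cells(symbols, row_edges, col_edges):
    """symbols: list of (char, x_centre, y_centre) from the OCR output.
    row_edges / col_edges: sorted pixel coordinates of the table grid lines."""
    def bucket(value, edges):
        for i in range(len(edges) - 1):
            if edges[i] <= value < edges[i + 1]:
                return i
        return None  # character falls outside the table

    cells = defaultdict(list)
    for char, x, y in symbols:
        row, col = bucket(y, row_edges), bucket(x, col_edges)
        if row is not None and col is not None:
            cells[(row, col)].append((x, char))

    # Join the characters in each cell left-to-right to recover the number.
    return {cell: "".join(c for _, c in sorted(chars))
            for cell, chars in cells.items()}
```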

But its speed advantage over manual transcription is enormous. Transcribing this dataset took the citizen science project weatherrescue.org many days of human effort, spread over weeks of elapsed time. Google Vision took only a few minutes (and parallelising the calls could reduce this to seconds).
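
For example, a thread pool over the per-image calls is enough to parallelise the work. This sketch assumes the transcribe_image function from above; real throughput would be capped by API quota and rate limits.

```python
# Sketch: parallelise the per-image Google Vision calls with a thread pool.
# Assumes the transcribe_image() function sketched above; API quota and
# rate limits will cap the real speed-up.
from concurrent.futures import ThreadPoolExecutor
from glob import glob

image_files = sorted(glob("images/*.png"))  # the 81 benchmark images

with ThreadPoolExecutor(max_workers=16) as pool:
    results = dict(zip(image_files, pool.map(transcribe_image, image_files)))
```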