OCR-weatherrescue benchmark comparison
The OCR-weatherrescue benchmark is a test dataset for document transcription systems. It contains 81 document images, each of a table of numbers, and quality-controlled transcriptions for each. Google Vision can be run on each of the images, and scored on its ability to reproduce the known results.
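To give an idea of what such a run involves, the sketch below transcribes one benchmark image with the Google Cloud Vision Python client. This is not the benchmark's own tooling; the function name and file path are illustrative.

```python
# Minimal sketch: run Google Vision's document text detection on one image.
from google.cloud import vision


def transcribe_image(path):
    """Return Google Vision's raw transcription of a document image."""
    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        content = f.read()
    image = vision.Image(content=content)
    response = client.document_text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text


if __name__ == "__main__":
    # Hypothetical file name for one of the 81 benchmark images
    print(transcribe_image("images/1898_01.jpg"))
```

Scoring then amounts to comparing the transcribed entries, cell by cell, against the quality-controlled transcriptions.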
Comparisons by month
1898 | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
1899 | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
1900 | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
1901 | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
1902 | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
1903 | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
1904 | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep |
Summary
- Of 59,136 entries:
  - 53,086 (90%) were read successfully
  - 3,241 (5%) were read inaccurately
  - 2,803 (5%) were missed altogether
In pure character-reading accuracy, Google Vision is better than these figures suggest, but extracting the data successfully means linking the characters into numbers and associating those numbers with their positions in the table. Google Vision is poor at this: even reaching 90% accuracy took a fair amount of post-processing to correct its clustering of the characters it had read into words. So the overall accuracy is not great.
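To give a flavour of that kind of post-processing, the sketch below re-groups recognised words into table rows using their bounding-box positions. The data structures and the pixel tolerance are assumptions for illustration, not the code actually used here.

```python
def group_words_into_rows(words, row_tolerance=10):
    """Group (text, x, y) word tuples into table rows by vertical position.

    words: list of (text, x, y), where (x, y) is the word's box centre in pixels.
    Words whose y-centres lie within row_tolerance pixels of the row's first
    word are treated as one row. Returns rows as lists of texts, left to right.
    """
    rows = []  # each row: {"y": anchor y, "words": [(x, text), ...]}
    for text, x, y in sorted(words, key=lambda w: w[2]):
        if rows and abs(y - rows[-1]["y"]) <= row_tolerance:
            rows[-1]["words"].append((x, text))
        else:
            rows.append({"y": y, "words": [(x, text)]})
    return [[t for _, t in sorted(row["words"])] for row in rows]


# Example: two words on the same line, one on the line below
print(group_words_into_rows([("29.87", 40, 102), ("52", 120, 100), ("29.91", 40, 160)]))
# -> [['29.87', '52'], ['29.91']]
```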
But its speed advantage over manual transcription is enormous. Transcribing this dataset took the citizen-science project weatherrescue.org many days of human effort, spread over weeks of elapsed time. Google Vision took only a few minutes (and parallelising the calls could reduce this to seconds).
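A minimal sketch of that parallelisation, assuming the transcribe_image() helper from the earlier example and an illustrative list of image file names:

```python
# Transcribe several images concurrently; the API calls are I/O-bound,
# so a thread pool is enough to overlap them.
from concurrent.futures import ThreadPoolExecutor

image_paths = ["images/1898_%02d.jpg" % m for m in range(1, 13)]  # hypothetical names

with ThreadPoolExecutor(max_workers=8) as pool:
    transcriptions = dict(zip(image_paths, pool.map(transcribe_image, image_paths)))
```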