Auto-transcription benchmark 1: Fort William pressures

This is a benchmark dataset for document transcription tools. It contains a set of digital images, each showing a table of printed numbers from an old report, together with a transcription of all the numbers (from the weatherrescue volunteer transcription project).

The aim is to support the development and testing of automated tools for document transcription. Any such tool can be run on this set of images, and its results validated by comparison with the existing transcription.
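As a rough illustration of that validation step, the sketch below compares a tool's output with a reference transcription cell by cell. It assumes both have already been loaded as lists of rows of strings; the dataset's actual file layout is not described here, so loading is left out. The function name `score_transcription` and the sample values are hypothetical and are not taken from the dataset.

```python
def score_transcription(reference, candidate):
    """Compare two tables (lists of rows of strings) cell by cell.

    Returns the fraction of reference cells reproduced exactly, plus a
    list of (row, column, expected, found) tuples for every mismatch.
    Cells missing from the candidate are counted as mismatches.
    """
    total = 0
    matched = 0
    mismatches = []
    for r, ref_row in enumerate(reference):
        cand_row = candidate[r] if r < len(candidate) else []
        for c, ref_cell in enumerate(ref_row):
            cand_cell = cand_row[c] if c < len(cand_row) else None
            total += 1
            if cand_cell is not None and cand_cell.strip() == ref_cell.strip():
                matched += 1
            else:
                mismatches.append((r, c, ref_cell, cand_cell))
    accuracy = matched / total if total else 0.0
    return accuracy, mismatches


# Made-up placeholder values, NOT real data from the benchmark.
reference = [["29.874", "29.902"], ["30.011", "29.950"]]
candidate = [["29.874", "29.602"], ["30.011"]]

accuracy, mismatches = score_transcription(reference, candidate)
print(f"accuracy: {accuracy:.2f}")
for row, col, expected, found in mismatches:
    print(f"row {row}, column {col}: expected {expected!r}, found {found!r}")
```

Exact string matching is the strictest possible comparison. A tool that drops trailing zeros or reformats the numbers might be better scored by parsing the values and comparing them numerically.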

This dataset is distributed under the terms of the Open Government Licence.