Washington Database

Data Set

The Washington database was created from the George Washington Papers at the Library of Congress and has the following characteristics:

18th century
English language
two writers
longhand script
ink on paper

The original manuscript images [4] have already been used, for example, by Rath and Manmatha in [3]. The Washington database contains our own text line and word images along with their transcriptions. The manuscript data consists of:

binarized and normalized text line images
binarized and normalized word images

The ground truth contains:

transcription at line-level
transcription at word-level

Statistics

The Washington database includes:

20 pages
656 text lines
4,894 word instances
1,471 word classes
82 letters
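
For a rough sense of scale, the counts above imply about 32.8 text lines per page, about 7.5 word instances per line, and about 3.3 instances per word class. The minimal sketch below only re-derives these averages from the listed totals. The constant and function names are illustrative and are not part of the database.

```python
# Totals copied from the Statistics list above.
PAGES = 20
TEXT_LINES = 656
WORD_INSTANCES = 4894
WORD_CLASSES = 1471


def derived_statistics():
    """Return simple averages computed from the published totals."""
    return {
        "lines_per_page": TEXT_LINES / PAGES,
        "words_per_line": WORD_INSTANCES / TEXT_LINES,
        "instances_per_class": WORD_INSTANCES / WORD_CLASSES,
    }


if __name__ == "__main__":
    for name, value in derived_statistics().items():
        print(f"{name}: {value:.2f}")
```

The low average number of instances per word class suggests that many word classes occur only a few times, which matters when choosing keywords for word spotting experiments.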

Download

If you have not already done so, please register before downloading the database. Once registered, you can download the Washington database here:

washingtondb-v1.0.zip

The archive contains a README file with detailed information about the data formats used. We also provide the training, validation, and test set IDs that were used, for example, in [1] and [2].
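
When working with the provided set IDs, it can be useful to confirm that the training, validation, and test sets do not overlap before running experiments. The sketch below assumes each ID list is plain text with one ID per line. That layout is an assumption, since the README in the archive defines the actual format. The function names and the sample IDs are made up for illustration.

```python
def parse_ids(text):
    """Return the set of non-empty, whitespace-stripped lines in text.

    Assumes one ID per line (hypothetical layout; see the archive README).
    """
    return {line.strip() for line in text.splitlines() if line.strip()}


def check_splits(train, valid, test):
    """Raise ValueError if an ID occurs in more than one set; return set sizes."""
    splits = {"train": train, "valid": valid, "test": test}
    names = list(splits)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            overlap = splits[first] & splits[second]
            if overlap:
                raise ValueError(
                    f"{first} and {second} share IDs: {sorted(overlap)}"
                )
    return {name: len(ids) for name, ids in splits.items()}


# Made-up IDs for illustration only; real IDs come from the archive.
train_ids = parse_ids("line-001\nline-002\n")
valid_ids = parse_ids("line-003\n")
test_ids = parse_ids("line-004\n\n")
sizes = check_splits(train_ids, valid_ids, test_ids)
```

In practice the three strings would be read from the ID files in the archive, for example with `pathlib.Path(...).read_text()`.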

Terms of Use

The Washington database may be used for non-commercial research and teaching purposes only. If you publish scientific work based on the Washington database, we ask that you include a reference to our paper [1]: A. Fischer, A. Keller, V. Frinken, and H. Bunke: "Lexicon-Free Handwritten Word Spotting Using Character HMMs," in Pattern Recognition Letters, Volume 33(7), pages 934-942, 2012.

References

Printed versions of the papers are linked by DOI. Additionally, we provide accepted preprint versions as PDFs. The preprints are intended for convenient online browsing only.

[1] A. Fischer, A. Keller, V. Frinken, and H. Bunke: "Lexicon-Free Handwritten Word Spotting Using Character HMMs," in Pattern Recognition Letters, Volume 33(7), pages 934-942, 2012. [doi] [pdf]

[2] V. Frinken, A. Fischer, R. Manmatha, and H. Bunke: "A Novel Word Spotting Method Based on Recurrent Neural Networks," in IEEE Trans. PAMI, Volume 34(2), pages 211-224, 2012. [doi] [pdf]

[3] T. M. Rath and R. Manmatha: "Word Spotting for Historical Documents," in Int. Journal on Document Analysis and Recognition, Volume 9, pages 139-152, 2007.

[4] George Washington Papers at the Library of Congress from 1741-1799, Series 2, Letterbook 1, pages 270-279 and 300-309