Washington Database
Data Set
The Washington database was created from the George Washington Papers at the Library of Congress and has the following characteristics:
- 18th century
- English language
- two writers
- longhand script
- ink on paper
The original manuscript images [4] have been used before, for example by Rath and Manmatha [3]. The Washington database contains our own text line and word images along with their transcriptions. Altogether, the manuscript data consists of:
- binarized and normalized text line images
- binarized and normalized word images
The ground truth contains:
- transcription at line-level
- transcription at word-level
Statistics
The Washington database includes:
- 20 pages
- 656 text lines
- 4,894 word instances
- 1,471 word classes
- 82 letters
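As a rough illustration of how figures such as the number of text lines, word instances, and word classes can be recomputed from a line-level transcription, here is a minimal Python sketch. The input format is an assumption made for this example only (one line per text line, consisting of a line ID, a space, and the words separated by "|"); the function name and the sample data are likewise hypothetical. The README in the archive describes the actual data formats.

```python
from collections import Counter


def transcription_stats(lines, word_sep="|"):
    """Count text lines, word instances, and word classes.

    ASSUMPTION: each entry looks like "<line_id> <word><sep><word>...".
    This layout is hypothetical; consult the README for the real format.
    """
    counts = Counter()
    n_lines = 0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        _line_id, _, text = raw.partition(" ")
        n_lines += 1
        # Every non-empty token is one word instance.
        counts.update(w for w in text.split(word_sep) if w)
    return {
        "text_lines": n_lines,
        "word_instances": sum(counts.values()),
        "word_classes": len(counts),  # distinct words
    }


# Made-up sample data, not taken from the database.
sample = [
    "270-01-01 Letters|Orders|and|Instructions",
    "270-01-02 and|Orders",
]
stats = transcription_stats(sample)
print(stats)
```

A word class here is simply a distinct word string, so case and punctuation handling would affect the count; whether the published figures were derived this way is not stated.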
Download
If you have not already done so, please register before downloading the database. Once registered, you can download the Washington database here:
The archive contains a README file with detailed information about the data formats used. We also provide the training, validation, and test set IDs that were used, for example, in [1] and [2].
Terms of Use
The Washington database may be used for non-commercial research and teaching purposes only. If you publish scientific work based on the Washington database, we request that you include a reference to our paper [1]: A. Fischer, A. Keller, V. Frinken, and H. Bunke: "Lexicon-Free Handwritten Word Spotting Using Character HMMs," in Pattern Recognition Letters, Volume 33(7), pages 934-942, 2012.
References
Printed versions of the papers are linked by DOI. Additionally, we provide accepted preprint versions as PDFs. The preprints are intended for convenient online browsing only.
[1] A. Fischer, A. Keller, V. Frinken, and H. Bunke: "Lexicon-Free Handwritten Word Spotting Using Character HMMs," in Pattern Recognition Letters, Volume 33(7), pages 934-942, 2012. [doi] [pdf]
[2] V. Frinken, A. Fischer, R. Manmatha, and H. Bunke: "A Novel Word Spotting Method Based on Recurrent Neural Networks," in IEEE Trans. PAMI, Volume 34(2), pages 211-224, 2012. [doi] [pdf]
[3] T. M. Rath and R. Manmatha: "Word Spotting for Historical Documents," in Int. Journal on Document Analysis and Recognition, Volume 9, pages 139-152, 2007.
[4] George Washington Papers at the Library of Congress from 1741-1799, Series 2, Letterbook 1, pages 270-279 and 300-309