Parzival Database

Data Set

The Parzival database described in [4] contains a handwritten historical manuscript with the following characteristics:

  • 13th century
  • Medieval German language
  • three writers
  • Gothic script
  • ink on parchment

The original manuscript [5] is housed at the Abbey Library of Saint Gall, Switzerland. The manuscript images and the transcriptions were made available on CD-ROM by the Parzival project of the German Language Institute of the University of Bern, Switzerland. For handwriting recognition, we extracted text line and word images and aligned them with the transcription. The manuscript data comprises:

  • page images (JPEG, 300dpi)
  • binarized and normalized text line images
  • binarized and normalized word images

The ground truth contains:

  • transcription at line-level
  • transcription at word-level

Statistics

The Parzival database includes:

  • 47 pages
  • 4,477 text lines
  • 23,478 word instances
  • 4,934 word classes
  • 93 letters

Download

If you have not already done so, please register before downloading the database. Once registered, you can download the Parzival database here:

The archive contains a README file with detailed information about the data formats used. We also provide training, validation, and test set IDs for line recognition (used, for example, in [1] and [2]) as well as word recognition (used, for example, in [3] and [4]).
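As a minimal sketch of how the provided set IDs might be used, the helpers below read an ID list and check that the three sets do not overlap. The file layout (plain text, one ID per line) and the function names are assumptions made for illustration only; the README in the archive is the authoritative description of the actual formats.

```python
from pathlib import Path


def read_ids(path):
    """Read sample IDs from a text file.

    Assumes one ID per line (hypothetical layout, check the README);
    surrounding whitespace and blank lines are ignored.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def check_disjoint(train_ids, valid_ids, test_ids):
    """Return True if no ID appears in more than one of the three sets."""
    sets = [set(train_ids), set(valid_ids), set(test_ids)]
    total = sum(len(s) for s in sets)
    return len(set().union(*sets)) == total
```

Running the disjointness check before training is a cheap guard against accidentally evaluating on samples that were also used for training or validation.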

Terms of Use

The Parzival database may be used for non-commercial research and teaching purposes only. If you publish scientific work based on the Parzival database, please include a reference to [1] A. Fischer, A. Keller, V. Frinken, and H. Bunke: "Lexicon-Free Handwritten Word Spotting Using Character HMMs," in Pattern Recognition Letters, Volume 33(7), pages 934-942, 2012.

With kind permission of Prof. Ernst Tremp from the Abbey Library of Saint Gall, the original manuscript images may be used for non-commercial research and teaching purposes, explicitly in the following ways:

  • Show and print sample manuscript images in scientific publications
  • Show sample manuscript images during talks
  • Show sample manuscript images online

For any purpose other than non-commercial research and teaching, the Abbey Library of Saint Gall must be contacted first.

References

Printed versions of the papers are linked by DOI. Additionally, we provide accepted preprint versions as PDFs. The preprints are intended for convenient online browsing only.

[1] A. Fischer, A. Keller, V. Frinken, and H. Bunke: "Lexicon-Free Handwritten Word Spotting Using Character HMMs," in Pattern Recognition Letters, Volume 33(7), pages 934-942, 2012. [doi] [pdf]

[2] A. Fischer, E. Indermühle, V. Frinken, and H. Bunke: "HMM-Based Alignment of Inaccurate Transcriptions for Historical Documents," in Proc. 11th Int. Conf. on Document Analysis and Recognition, pages 53-57, 2011. [doi] [pdf]

[3] A. Fischer, K. Riesen, and H. Bunke: "Graph Similarity Features for HMM-Based Handwriting Recognition in Historical Documents," in Proc. 12th Int. Conf. on Frontiers in Handwriting Recognition, pages 253-258, 2010. [doi] [pdf]

[4] A. Fischer, M. Wüthrich, M. Liwicki, V. Frinken, H. Bunke, G. Viehhauser, and M. Stolz: "Automatic Transcription of Handwritten Medieval Documents," in Proc. 15th Int. Conf. on Virtual Systems and Multimedia, pages 137-142, 2009. [doi] [pdf]

[5] Cod. 857, Abbey Library of Saint Gall, Switzerland

 
