IAM Online Document Database (IAMonDo-database)
Overview
The IAMonDo-database (short for IAM online document database) contains 941 online handwritten documents acquired with a digital pen (Logitech IO2). The documents consist of text blocks, lists, tables, formulas, diagrams, drawings, and markings. These pieces of content have been placed in arbitrary positions on each document. Ground truth information is provided down to the word level.
The digital ink, the annotations, and the collected metadata of the writers are stored in XML files complying with the W3C standard InkML.
The database can be used to train and test methods for handwritten text recognition, document layout analysis, document content identification, table recognition, marking detection, and more.
The database was first published in [InLiBu10] at DAS 2010. We use the database in our own research to investigate the distinction between text and non-text.
In the acquisition phase, a total of 189 writers were asked to copy the content of a template onto a sheet of A4 paper. The writing was recorded online by a digital pen. A total of 1000 templates were composed of text from the Brown corpus and of drawings, diagrams, and formulas taken from Wikimedia Commons and Wikipedia. In total, 200 different diagrams, 200 different drawings, and 200 different formulas are used. The text for the text blocks, lists, and tables was selected randomly from 11 categories of the Brown corpus.
After a document had been written, the writer marked it with a small set of marking elements:
- underlining of text
- marking of text on one or multiple lines by an angle on the top left as a start mark and an angle on the bottom right as an end mark
- marking of text by enclosing it in square brackets
- marking of entire text lines by a vertical stroke on the right or left side of the text block
- encircling
- annotation of these markings by small text labels
- lines connecting these text labels with the markings
A subset of 839 documents contains such marking elements, 10 per document on average.
Some of the documents have landscape orientation, others have portrait orientation. On some documents, separate text parts have a different orientation.
Statistics:
The database contains:
- 941 documents
- 68841 words
- 7616 text lines in text blocks
- 1478 text blocks
- 2068 list items
- 536 lists
- 2550 table cells
- 450 tables
- 5698 labels in diagrams
- 917 drawings in diagrams
- 910 diagrams
- 546 drawings not part of diagrams
- 489 formulas
- 355097 strokes
Download:
Before you download the IAMonDo-database, we ask you to register so that we are aware of who is using our data. Once registered, you can access the IAMonDo-database here.
You can also download the database from the website of the IAPR-TC11.
Contact:
If you have any questions or suggestions, please use the contact form.
Software:
The database's documents are stored using the InkML standard. The InkML standard is quite complex and can therefore be a barrier to accessing the documents' content. To resolve this issue, a software library (libinkml) has been released which implements the portion of InkML required to read the documents. This software can also be used outside the context of this database.
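For readers who only want a first look at the ink data, the following is a minimal sketch in Python that does not use libinkml. It reads the stroke coordinates from InkML trace elements with the standard library. The sample content and the function name read_traces are made up for illustration, and the sketch assumes that each trace lists its points as comma-separated groups of plain numeric values. Actual IAMonDo files may carry additional channels and annotation elements that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C InkML recommendation.
INKML_NS = "http://www.w3.org/2003/InkML"

# Hypothetical, hand-written sample; not taken from the database.
SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
  <trace xml:id="t1">10 20, 11 22, 13 25</trace>
  <trace xml:id="t2">40 40, 42 41</trace>
</ink>"""


def read_traces(inkml_text):
    """Return one list of points per <trace>; each point is a tuple of floats.

    Assumption: points are separated by commas and channel values by
    whitespace, with explicit (not difference-encoded) values.
    """
    root = ET.fromstring(inkml_text)
    traces = []
    for trace in root.iter(f"{{{INKML_NS}}}trace"):
        points = []
        for chunk in (trace.text or "").split(","):
            values = chunk.split()
            if values:  # skip empty fragments such as a trailing comma
                points.append(tuple(float(v) for v in values))
        traces.append(points)
    return traces


traces = read_traces(SAMPLE)
print(len(traces), "traces;", sum(len(t) for t in traces), "points")
```

To try this on a downloaded document, pass the content of the file as a string to read_traces. Anything beyond raw coordinates, such as the ground truth annotation, is better handled by libinkml or InkAnno.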
The software InkAnno, built on libinkml, was developed to simplify the handling of IAMonDo documents even further. It implements the following functionality: displaying the documents, adding and editing annotations, exporting to PDF, images, or feature vectors, and rotating and mirroring the documents. It may serve as a basis for further functionality.
References
[InLiBu10] E. Indermühle, M. Liwicki, and H. Bunke: IAMonDo-database: an Online Handwritten Document Database with Non-uniform Contents. Proc. 9th Int. Workshop on Document Analysis Systems, 2010.