Download IAMonDo-database

Register

If not already done, we ask you to register before downloading the database.

Download

The documents as well as the predefined subsets are packed into a zip file. Each file contains the digital ink data, the metadata of the writer, and the ground truth of one document. The file name indicates the id of the document. Ids range from 001 to 999, but not all of these numbers are used. Some documents are copies of the same template; to distinguish them, lowercase Latin letters are appended to the id, as for example in 001a.inkml.
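The naming scheme above can be checked with a few lines of Python. This is only a sketch: the pattern (three digits, optional lowercase letters, the extension .inkml) is inferred from the example 001a.inkml, and the function name parse_document_name is hypothetical.

```python
import re

# Assumed pattern, inferred from the example "001a.inkml":
# three digits, optional lowercase Latin letters, then ".inkml".
_DOC_NAME = re.compile(r"(\d{3})([a-z]*)\.inkml")


def parse_document_name(file_name):
    """Split a name such as '001a.inkml' into (id_number, copy_suffix).

    Hypothetical helper, not part of the database distribution.
    """
    match = _DOC_NAME.fullmatch(file_name)
    if match is None:
        raise ValueError(f"unexpected document name: {file_name!r}")
    number, suffix = match.groups()
    return int(number), suffix
```

Documents that share the same number but differ in the suffix are copies of the same template.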

As of now, the database is available in version 1.0. In the future we intend to release further editions, which might include better, more, or different ground truth. The content, however, will not change.

Version 1.0

IAMonDo-db-1.0.tar.gz
IAMonDo-db-1.0.zip

 

Example Task 

Text and Non-Text Distinction in Online Handwritten Documents

A system is to be developed, which may have to be trained on a set of documents referred to as the training set. For every stroke of a document, this system decides whether the stroke is part of text or part of a non-text content element.

The fraction of strokes correctly classified by the system is the stroke accuracy of this system. If the online documents are converted to images, the fraction of pixels corresponding to the correctly classified strokes is the pixel accuracy of this system. In this task, a stroke is considered to be a text stroke if, at any hierarchical level, it is annotated as formula or text line. All other strokes are labelled as non-text strokes. Marking elements are ignored in this task.
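The two measures can be sketched in Python as follows. Several things here are assumptions made for illustration: the label strings "formula" and "text line" may be spelled differently in the ground truth files, strokes belonging to marking elements are assumed to have been dropped beforehand, and the number of pixels each stroke covers is assumed to be given, with overlaps between strokes ignored.

```python
# Hypothetical label strings; the spelling in the ground truth may differ.
TEXT_ANNOTATIONS = {"formula", "text line"}


def is_text_stroke(annotation_labels):
    """True if any annotation level of the stroke is a formula or text line."""
    return any(label in TEXT_ANNOTATIONS for label in annotation_labels)


def stroke_accuracy(truth, predicted):
    """Fraction of strokes whose predicted label equals the true label."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("at least one stroke is required")
    correct = sum(t == p for t, p in zip(truth, predicted))
    return correct / len(truth)


def pixel_accuracy(truth, predicted, pixel_counts):
    """Fraction of stroke pixels that belong to correctly classified strokes.

    pixel_counts[k] is the assumed number of pixels covered by stroke k.
    """
    if not (len(truth) == len(predicted) == len(pixel_counts)):
        raise ValueError("all inputs must have the same length")
    total = sum(pixel_counts)
    if total == 0:
        raise ValueError("at least one pixel is required")
    correct = sum(n for t, p, n in zip(truth, predicted, pixel_counts) if t == p)
    return correct / total
```

Here truth and predicted are lists of booleans, one per stroke, with True meaning text. The pixel accuracy weights long strokes more heavily than short ones, so the two values generally differ.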

Evaluation Protocol

The dataset is split into 5 disjoint sets, each consisting of approximately 200 documents. No two documents from different sets were created by the same writer. The sets are indexed from 0 to 4. They are defined by 5 files listing the names of the contained documents. The set files are named 0.set, 1.set, 2.set, 3.set, and 4.set.
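A minimal sketch for reading the set files in Python is given below. The file layout is an assumption (one document name per line, blank lines ignored), since the exact format is not described here, and the helper names are hypothetical.

```python
from pathlib import Path


def read_set_file(path):
    """Return the document names listed in one .set file.

    Assumes one name per line; blank lines are skipped.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def load_sets(directory):
    """Load 0.set to 4.set from a directory into a dict keyed by set index."""
    return {i: read_set_file(Path(directory) / f"{i}.set") for i in range(5)}


def check_disjoint(sets):
    """Raise ValueError if a document name appears in more than one set."""
    seen = {}
    for index, names in sets.items():
        for name in names:
            if name in seen and seen[name] != index:
                raise ValueError(
                    f"{name} is listed in sets {seen[name]} and {index}"
                )
            seen[name] = index
```

Calling check_disjoint(load_sets(directory)) after unpacking the archive is a cheap way to confirm that the five sets were read as intended.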

Two different approaches to conducting experiments have been defined for this dataset:

  1. Sets 0 and 1 are used for training, set 2 is used to validate system parameters, and set 3 is the test set.
  2. A 4-fold cross validation where sets (0 + i) mod 4 and (1 + i) mod 4 are used for training, set (2 + i) mod 4 for validation, and set (3 + i) mod 4 for testing, for i = 0, ..., 3.

Set 4 serves as an independent test set, which should be used only once for each system.
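The set assignment of the 4-fold cross validation can be enumerated with a short Python sketch. It assumes that every set index is taken modulo 4, so that only sets 0 to 3 rotate and set 4 is never touched; the function name cross_validation_folds is hypothetical.

```python
def cross_validation_folds(num_folds=4):
    """List the training, validation, and test set indices for each fold.

    Assumes all indices are reduced modulo num_folds, so set 4 stays
    reserved as the independent test set.
    """
    folds = []
    for i in range(num_folds):
        folds.append(
            {
                "train": [(0 + i) % num_folds, (1 + i) % num_folds],
                "validation": (2 + i) % num_folds,
                "test": (3 + i) % num_folds,
            }
        )
    return folds


if __name__ == "__main__":
    for i, fold in enumerate(cross_validation_folds()):
        print(i, fold)
```

The fold for i = 0 coincides with the first approach: sets 0 and 1 for training, set 2 for validation, and set 3 for testing.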
