XML File
All form, line, and word images are provided as PNG files. The corresponding form label files, which contain segmentation information and a variety of estimated parameters (from the preprocessing steps described in [2]), are included in the image files as meta-information in XML format. An example of such an XML file is given here. Both the PNG and the XML file formats are supported by the Java 2 Standard Edition v1.4.
With such an XML file and the corresponding image file, text lines or isolated words can be extracted. The provided information can also be used to create images that verify the automatic segmentation, as shown below.
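As a rough illustration of how word regions might be recovered from the segmentation information, the following Python sketch computes one bounding box per word as the union of its connected-component boxes. The element names (form, line, word, cmp), the attribute names (id, text, x, y, width, height), and the sample values are assumptions made for this example; the actual label files may use a different layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: element names, attribute names, and values
# are assumptions for illustration, not the database's documented schema.
SAMPLE_XML = """
<form id="form-01">
  <line id="form-01-00" text="A MOVE">
    <word id="form-01-00-00" text="A">
      <cmp x="408" y="768" width="27" height="51"/>
    </word>
    <word id="form-01-00-01" text="MOVE">
      <cmp x="507" y="766" width="75" height="46"/>
      <cmp x="592" y="770" width="60" height="40"/>
    </word>
  </line>
</form>
"""


def word_boxes(xml_text):
    """Map each word id to (left, top, right, bottom), the union of its component boxes."""
    root = ET.fromstring(xml_text)
    boxes = {}
    for word in root.iter("word"):
        comps = [
            (int(c.get("x")), int(c.get("y")), int(c.get("width")), int(c.get("height")))
            for c in word.iter("cmp")
        ]
        if not comps:
            continue  # a word without components has no box to report
        left = min(x for x, _, _, _ in comps)
        top = min(y for _, y, _, _ in comps)
        right = max(x + w for x, _, w, _ in comps)
        bottom = max(y + h for _, y, _, h in comps)
        boxes[word.get("id")] = (left, top, right, bottom)
    return boxes


boxes = word_boxes(SAMPLE_XML)
for word_id, box in boxes.items():
    print(word_id, box)
```

The (left, top, right, bottom) tuple follows the box convention accepted by Pillow's Image.crop, so a word image could in principle be cut from the form image with a box like this, assuming the coordinates refer to the form image.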
In addition to the line and word segmentation information, the above-mentioned parameters are available in the XML file as well. For illustration, the image below shows the bounding boxes of the connected components, the estimated slant angle, and the detected reference lines.
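A minimal sketch of reading such per-line parameters is given next. The attribute names used here (slant, upper-baseline, lower-baseline) and all values are hypothetical placeholders chosen for the example; the real label files may name, nest, or scale these parameters differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the attribute names "slant", "upper-baseline",
# and "lower-baseline" are assumptions, not the database's documented schema.
SAMPLE_XML = """
<form id="form-01">
  <line id="form-01-00" slant="12.5" upper-baseline="780" lower-baseline="815">
    <cmp x="408" y="768" width="27" height="51"/>
    <cmp x="507" y="766" width="75" height="46"/>
  </line>
</form>
"""


def line_parameters(xml_text):
    """Collect slant angle, reference lines, and component boxes for each text line."""
    root = ET.fromstring(xml_text)
    result = {}
    for line in root.iter("line"):
        result[line.get("id")] = {
            # Estimated slant angle (unit assumed to be degrees)
            "slant_deg": float(line.get("slant")),
            # Vertical positions of the two assumed reference lines, in pixels
            "reference_lines": (
                int(line.get("upper-baseline")),
                int(line.get("lower-baseline")),
            ),
            # Connected-component boxes as (x, y, width, height)
            "components": [
                tuple(int(c.get(k)) for k in ("x", "y", "width", "height"))
                for c in line.iter("cmp")
            ],
        }
    return result


params = line_parameters(SAMPLE_XML)
print(params["form-01-00"])
```

Values collected this way could be drawn over the line image to reproduce an overlay like the one described above.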