This tool (eynollah) performs document layout analysis (segmentation) from document image data, optionally combined with image enhancement and scale detection, and returns the results as [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML).
It can currently detect the following layout classes:
* Border
* Textregion
* Image
* Textline
* Separator
* Marginalia
* Initial
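
To produce such a PAGE-XML file, the tool is run on a document image together with the directory containing its trained models. The invocation below is only a minimal sketch with placeholder paths; the exact option names can differ between versions, so check `eynollah --help` for the current set of flags.

```sh
# Minimal sketch: analyze a single page image and write the PAGE-XML result.
# File and directory names are placeholders.
eynollah \
  -i path/to/page.png \
  -o path/to/output_dir \
  -m path/to/models_dir
```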
The final goal is to feed the output to an OCR model.
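
For that OCR step, the region and text line polygons can be read directly from the PAGE-XML output. The following is a minimal, hypothetical sketch using only Python's standard library; the output file name and the PAGE schema version in the namespace are assumptions and may need to be adapted to the files eynollah actually writes.

```python
# Minimal sketch for consuming eynollah's PAGE-XML output, e.g. to crop
# text line images for a downstream OCR model. The file name and the PAGE
# namespace version below are assumptions; adapt them to your output.
import xml.etree.ElementTree as ET

NS = {"pc": "http://schema.primaresearch.com/PAGE/gts/pagecontent/2019-07-15"}

root = ET.parse("example.xml").getroot()   # hypothetical output file

for region in root.iterfind(".//pc:TextRegion", NS):
    for line in region.iterfind("pc:TextLine", NS):
        coords = line.find("pc:Coords", NS)
        # "points" is a space-separated list of "x,y" pairs describing the
        # text line polygon; these coordinates can be used to crop the line
        # from the original image before passing it to an OCR model.
        polygon = [tuple(map(int, p.split(",")))
                   for p in coords.attrib["points"].split()]
        print(region.get("id"), line.get("id"), len(polygon), "points")
```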
The tool uses a combination of various models and heuristics: