mirror of https://github.com/qurator-spk/eynollah.git synced 2025-08-02 14:49:54 +02:00

Clemens Neudecker 571dc84c3f README.md cleanup / restructuring		2022-03-28 13:15:35 +02:00

Eynollah

Perform document layout analysis (segmentation) from image data and return the results as PAGE-XML.

Installation

pip install . or

pip install -e . for editable installation

Alternatively, you can also use make with these targets:

make install or

make install-dev for editable installation

Models

In order to run this tool you need trained models. You can download our pretrained models from qurator-data.de.

Alternatively, running make models will download and extract models to $(PWD)/models_eynollah.

Training

If you want to train your own model for use with Eynollah, have a look at sbb_pixelwise_segmentation.

Usage

The command-line interface can be called like this:

eynollah \
-i <image file name> \
-o <directory to write output xml or enhanced image> \
-m <directory of models> \
-fl <if true, the tool will perform full layout analysis> \
-ae <if true, the tool will resize and enhance the image and save the resulting image in the output directory> \
-as <if true, the tool will check whether the document needs rescaling or not> \
-cl <if true, the tool will extract the contours of curved textlines instead of rectangular bounding boxes> \
-si <if a directory is given here, the tool will save the image regions found in the document there> \
-sd <if a directory is given, the deskewed image will be saved there> \
-sa <if a directory is given, all plots needed for documentation will be saved there> \
-tab <if true, this tool will try to detect tables> \
-ib <in general, eynollah uses RGB as input, but if the input document is very dark or very bright, or for any other reason, you can turn binarized input on. This option does not mean that you have to provide a binary image; rather, it means that the tool itself will binarize the RGB input document> \
-ho <if true, the tool will ignore the role of headers in reading order detection> \
-sl <if a directory is given, a plot of the layout will be saved there> \
-ep <if true, the tool is allowed to save the requested plots. This should be set together with the -sl, -sd, -sa, -si or -ae options>
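
As a concrete illustration of the options above, here is a minimal sketch of one invocation that requests full layout analysis, table detection and a layout plot. The file and directory names (page.tif, output, models_eynollah, output/plots) are hypothetical, and the sketch assumes that the "if true" options are passed as bare flags without a value; check eynollah --help for the exact form in your installed version. By default the script only prints the command (dry run); set DRY_RUN=0 to execute it.

```shell
#!/bin/sh
set -eu

# Hypothetical paths; replace them with your own data.
image="page.tif"
out_dir="output"
models_dir="models_eynollah"

# Input image, output directory and model directory, plus full layout (-fl)
# and table detection (-tab).
# Assumption: boolean options are bare flags that take no value.
set -- eynollah -i "$image" -o "$out_dir" -m "$models_dir" -fl -tab

# Save a plot of the layout as well; -sl is combined with -ep as described above.
set -- "$@" -ep -sl "$out_dir/plots"

if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run: show the command instead of executing it.
    echo "$*"
else
    mkdir -p "$out_dir/plots"
    "$@"
fi
```

Building the argument list step by step keeps the -ep and -sl pair together, so dropping the plot output means removing a single line.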

The tool performs better with RGB images than with greyscale or binarized images.

Additional documentation can be found in the wiki.