diff --git a/docs/models.md b/docs/models.md index c6f7340..ac563b0 100644 --- a/docs/models.md +++ b/docs/models.md @@ -16,8 +16,10 @@ Two Arabic/Persian terms form the name of the model suite: عين الله, whic "eynollah"; it translates into English as "God's Eye" -- it sees (nearly) everything on the document image. See the flowchart below for the different stages and how they interact: + ![](https://user-images.githubusercontent.com/952378/100619946-1936f680-331e-11eb-9297-6e8b4cab3c16.png) + ## Models ### Image enhancement diff --git a/docs/usage.md b/docs/usage.md new file mode 100644 index 0000000..22443a2 --- /dev/null +++ b/docs/usage.md @@ -0,0 +1,97 @@ +# Usage documentation +The command-line interface can be called like this: + +```sh +eynollah \ + -i | -di \ + -o \ + -m \ + [OPTIONS] +``` + +The following options can be used to further configure the processing: + +| option | description | +|-------------------|:-------------------------------------------------------------------------------| +| `-fl` | full layout analysis including all steps and segmentation classes | +| `-light` | lighter and faster but simpler method for main region detection and deskewing | +| `-tab` | apply table detection | +| `-ae` | apply enhancement (the resulting image is saved to the output directory) | +| `-as` | apply scaling | +| `-cl` | apply contour detection for curved text lines instead of bounding boxes | +| `-ib` | apply binarization (the resulting image is saved to the output directory) | +| `-ep` | enable plotting (MUST always be used with `-sl`, `-sd`, `-sa`, `-si` or `-ae`) | +| `-eoi` | extract only images to output directory (other processing will not be done) | +| `-ho` | ignore headers for reading order dectection | +| `-si ` | save image regions detected to this directory | +| `-sd ` | save deskewed image to this directory | +| `-sl ` | save layout prediction as plot to this directory | +| `-sp ` | save cropped page image to this directory | +| `-sa ` | save all (plot, enhanced/binary image, layout) to this directory | + +If no option is set, the tool performs layout detection of main regions (background, text, images, separators and marginals). + +The best output quality is produced when RGB images are used as input rather than greyscale or binarized images. + +### `--full-layout` vs `--no-full-layout` + +Here are the difference in elements detected depending on the `--full-layout`/`--no-full-layout` command line flags: + +| | `--full-layout` | `--no-full-layout` | +|--------------------------|-----------------|--------------------| +| reading order | x | x | +| header regions | x | - | +| text regions | x | x | +| text regions / text line | x | x | +| drop-capitals | x | - | +| marginals | x | x | +| marginals / text line | x | x | +| image region | x | x | + +## Use as OCR-D processor +Eynollah ships with a CLI interface to be used as [OCR-D](https://ocr-d.de) processor that is described in [`ocrd-tool.json`](https://github.com/qurator-spk/eynollah/tree/main/src/eynollah/ocrd-tool.json). + +The source image file group with (preferably) RGB images should be used as input for Eynollah like this: + +``` +ocrd-eynollah-segment -I OCR-D-IMG -O SEG-LINE -P models +``` + +Any image referenced by `@imageFilename` in PAGE-XML is passed on directly to Eynollah as a processor, so that e.g. + +``` +ocrd-eynollah-segment -I OCR-D-IMG-BIN -O SEG-LINE -P models +``` + +uses the original (RGB) image despite any binarization that may have occured in previous OCR-D processing steps. + +## Use with Docker +TODO + +## Hints +* If none of the parameters is set to `true`, the tool will perform a layout detection of main regions (background, +text, images, separators and marginals). An advantage of this tool is that it tries to extract main text regions +separately as much as possible. + +* If you set `-ae` (**a**llow image **e**nhancement) parameter to `true`, the tool will first check the ppi +(pixel-per-inch) of the image and when it is less than 300, the tool will resize it and only then image enhancement will +occur. Image enhancement can also take place without this option, but by setting this option to `true`, the layout xml +data (e.g. coordinates) will be based on the resized and enhanced image instead of the original image. + +* For some documents, while the quality is good, their scale is very large, and the performance of tool decreases. In +such cases you can set `-as` (**a**llow **s**caling) to `true`. With this option enabled, the tool will try to rescale +the image and only then the layout detection process will begin. + +* If you care about drop capitals (initials) and headings, you can set `-fl` (**f**ull **l**ayout) to `true`. With this +setting, the tool can currently distinguish 7 document layout classes/elements. + +* In cases where the document includes curved headers or curved lines, rectangular bounding boxes for textlines will not +be a great option. In such cases it is strongly recommended setting the flag `-cl` (**c**urved **l**ines) to `true` to +find contours of curved lines instead of rectangular bounding boxes. Be advised that enabling this option increases the +processing time of the tool. + +* To crop and save image regions inside the document, set the parameter `-si` (**s**ave **i**mages) to true and provide +a directory path to store the extracted images. + +* To extract only images from a document, set the parameter `-eoi` (**e**xtract **o**nly **i**mages). Choosing this +option disables any other processing. To save the cropped images add `-ep` and `-si`.