cneud 230e7cc705 integrate ocrd docs

2025-10-20 22:52:54 +02:00

Use as OCR-D processor

Eynollah ships with a CLI for use as an OCR-D processor, formally described in `ocrd-tool.json`.

When using Eynollah in OCR-D, use the source image file group (preferably with RGB images) as input, like this:

    ocrd-eynollah-segment -I OCR-D-IMG -O OCR-D-SEG -P models eynollah_layout_v0_5_0

If the input file group is PAGE-XML (from a previous OCR-D workflow step), Eynollah behaves as follows:

- existing regions are kept and ignored (i.e. in effect, they might overlap segments from Eynollah's results)
- existing annotations (and their respective `AlternativeImage`s) are partially ignored:
  - previous page frame detection (`cropped` images)
  - previous derotation (`deskewed` images)
  - previous thresholding (`binarized` images)
- if the page-level image nevertheless deviates from the original (`@imageFilename`) because some other preprocessing step was in effect (such as `denoised`), then the output PAGE-XML will be based on that image as the new top-level `@imageFilename`

```
ocrd-eynollah-segment -I OCR-D-XYZ -O OCR-D-SEG -P models eynollah_layout_v0_5_0
```
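
To see which of these cases applies to an existing PAGE-XML file, it can help to list the features recorded on its derived images. The following is a minimal sketch, not part of Eynollah: the function names `list_alt_image_features` and `classify_features` are hypothetical, and it assumes the usual OCR-D convention that each `AlternativeImage` carries its features as a comma-separated list in `@comments`, with each element written on a single line. It scans `AlternativeImage` elements at all levels, not only at page level.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct feature names found in the
# @comments attribute of AlternativeImage elements in a PAGE-XML file.
# Assumption: each AlternativeImage element sits on a single line.
list_alt_image_features() {
    grep -o '<[^<>]*AlternativeImage[^<>]*>' "$1" \
        | grep -o 'comments="[^"]*"' \
        | sed 's/comments="//; s/"$//' \
        | tr ',' '\n' \
        | sort -u
}

# Hypothetical helper: read feature names on stdin and label them
# according to the list above (cropped, deskewed, binarized are ignored).
classify_features() {
    while IFS= read -r feature; do
        case "$feature" in
            cropped|deskewed|binarized)
                echo "$feature: ignored by Eynollah" ;;
            "")
                ;;  # skip empty lines
            *)
                echo "$feature: not among the ignored features" ;;
        esac
    done
}

# Usage (the path is a placeholder):
#   list_alt_image_features path/to/page.xml | classify_features
```

A text-based scan like this is only a quick check; a proper XML parser would be more robust against elements that span several lines.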

In general, it makes more sense to add other workflow steps after Eynollah rather than before it.

There is also an OCR-D processor for binarization:

    ocrd-sbb-binarize -I OCR-D-IMG -O OCR-D-BIN -P models default-2021-03-09
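
Since segmentation ignores earlier binarization results, both processors can read from the same source image group. The sketch below wraps the two commands from this page in a small dry-run helper, so the exact invocations can be previewed before touching a workspace. The names `run_step`, `ocrd_steps`, and the `DRY_RUN` variable are hypothetical conveniences, not part of Eynollah or OCR-D; the file group and model names are simply the ones used in the examples above.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: binarize and segment from the same image group.
# $1 is the input file group (defaults to OCR-D-IMG, as in the examples).
ocrd_steps() {
    img_grp="${1:-OCR-D-IMG}"
    run_step ocrd-sbb-binarize -I "$img_grp" -O OCR-D-BIN -P models default-2021-03-09
    run_step ocrd-eynollah-segment -I "$img_grp" -O OCR-D-SEG -P models eynollah_layout_v0_5_0
}
```

Running `DRY_RUN=1 ocrd_steps OCR-D-IMG` inside a workspace directory prints both command lines without executing them; dropping `DRY_RUN` runs them in order.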