Mirror of https://github.com/qurator-spk/eynollah.git, synced 2025-06-10 04:39:54 +02:00
update readme (OCR-D section)
parent dcf2ed5e22, commit 903c87aca0
1 changed file with 16 additions and 12 deletions
README.md
```diff
@@ -83,23 +83,27 @@ If no option is set, the tool performs layout detection of main regions (backgro
 The best output quality is produced when RGB images are used as input rather than greyscale or binarized images.
 
 #### Use as OCR-D processor
-🚧 **Work in progress**
 
-Eynollah ships with a CLI interface to be used as [OCR-D](https://ocr-d.de) processor.
+Eynollah ships with a CLI interface to be used as [OCR-D](https://ocr-d.de) [processor](https://ocr-d.de/en/spec/cli).
 
 In this case, the source image file group with (preferably) RGB images should be used as input like this:
 
-```
-ocrd-eynollah-segment -I OCR-D-IMG -O SEG-LINE -P models
-```
+    ocrd-eynollah-segment -I OCR-D-IMG -O OCR-D-SEG -P models 2022-04-05
 
-Any image referenced by `@imageFilename` in PAGE-XML is passed on directly to Eynollah as a processor, so that e.g.
 
-```
-ocrd-eynollah-segment -I OCR-D-IMG-BIN -O SEG-LINE -P models
-```
+If the input file group is PAGE-XML (from a previous OCR-D workflow step), Eynollah behaves as follows:
+- existing regions are kept and ignored (i.e. in effect they might overlap segments from Eynollah results)
+- existing annotation (and respective `AlternativeImage`s) are partially _ignored_:
+  - previous page frame detection (`cropped` images)
+  - previous derotation (`deskewed` images)
+  - previous thresholding (`binarized` images)
+- if the page-level image nevertheless deviates from the original (`@imageFilename`)
+  (because some other preprocessing step was in effect like `denoised`), then
+  the output PAGE-XML will be based on that as new top-level (`@imageFilename`)
 
-uses the original (RGB) image despite any binarization that may have occured in previous OCR-D processing steps
+    ocrd-eynollah-segment -I OCR-D-XYZ -O OCR-D-SEG -P models 2022-04-05
 
+Still, in general, it makes more sense to add other workflow steps **after** Eynollah.
 
 #### Additional documentation
 Please check the [wiki](https://github.com/qurator-spk/eynollah/wiki).
```
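The `ocrd-eynollah-segment` invocations in this change follow the OCR-D CLI convention: `-I` names the input file group, `-O` the output file group, and `-P KEY VALUE` overrides a single parameter (here the `models` parameter is given the value `2022-04-05`). As a minimal sketch, assuming you want to drive the processor from Python, the argument vector can be assembled as below. The helper `eynollah_segment_argv` is hypothetical and not part of Eynollah or OCR-D; it only builds the command line and runs nothing.

```python
import shlex
from typing import List


def eynollah_segment_argv(input_grp: str, output_grp: str, models: str) -> List[str]:
    """Build the argument vector for an `ocrd-eynollah-segment` call.

    Hypothetical helper: it mirrors the README invocation
    (-I input file group, -O output file group, -P key/value override)
    and does not execute anything.
    """
    return [
        "ocrd-eynollah-segment",
        "-I", input_grp,          # input file group, preferably RGB images
        "-O", output_grp,         # output file group for the PAGE-XML results
        "-P", "models", models,   # single-parameter override of `models`
    ]


if __name__ == "__main__":
    argv = eynollah_segment_argv("OCR-D-IMG", "OCR-D-SEG", "2022-04-05")
    # Shell-quoted form of the command; prints:
    # ocrd-eynollah-segment -I OCR-D-IMG -O OCR-D-SEG -P models 2022-04-05
    print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` would execute it, provided the processor is installed in the current environment. Building a list rather than a single string avoids shell-quoting issues with file group names.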
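The new PAGE-XML rules describe which page image ends up being used. `AlternativeImage`s produced by cropping, deskewing or binarization are skipped, but an image derived by some other step (e.g. `denoised`) is picked up as the new basis. The following is a minimal sketch of that selection rule, not Eynollah's own code. It assumes the OCR-D convention that `AlternativeImage/@comments` lists the applied features as a comma-separated string, and that the last matching page-level image wins. The function name `select_page_image` and the sample filenames are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Features whose page-level AlternativeImages are ignored, per the README text.
IGNORED_FEATURES = {"cropped", "deskewed", "binarized"}


def _local(tag: str) -> str:
    """Strip an XML namespace, e.g. '{ns}Page' -> 'Page'."""
    return tag.rsplit("}", 1)[-1]


def select_page_image(page_xml: str) -> Optional[str]:
    """Return the filename of the page image segmentation would start from.

    Hypothetical sketch: take the last page-level AlternativeImage whose
    features include none of IGNORED_FEATURES; otherwise fall back to the
    original Page/@imageFilename.
    """
    root = ET.fromstring(page_xml)
    page = next((el for el in root.iter() if _local(el.tag) == "Page"), None)
    if page is None:
        raise ValueError("no Page element found")
    chosen: Optional[str] = None
    for child in page:  # direct children only, i.e. page-level images
        if _local(child.tag) != "AlternativeImage":
            continue
        comments = child.get("comments", "")
        features = {f.strip() for f in comments.split(",") if f.strip()}
        if not features & IGNORED_FEATURES:
            chosen = child.get("filename")
    return chosen if chosen is not None else page.get("imageFilename")


if __name__ == "__main__":
    # Hypothetical PAGE document: denoising first, then binarization on top.
    sample = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="OCR-D-IMG/page1.tif" imageWidth="1000" imageHeight="1500">
    <AlternativeImage filename="OCR-D-DEN/page1.png" comments="denoised"/>
    <AlternativeImage filename="OCR-D-BIN/page1.png" comments="denoised,binarized"/>
  </Page>
</PcGts>"""
    print(select_page_image(sample))  # OCR-D-DEN/page1.png
```

In the sample, the binarized derivative is filtered out, so the denoised image is returned; with only a `binarized` image present, the sketch falls back to the original `@imageFilename`. Because derived images in OCR-D usually carry the features of their source as well, checking for any overlap with the ignored set also skips images that were, say, denoised after binarization.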