update readme (OCR-D section)

2025-08-02 14:49:54 +02:00 · 2025-04-01 23:26:38 +02:00 · 903c87aca0
commit 903c87aca0
parent dcf2ed5e22
1 changed file with 16 additions and 12 deletions
--- a/README.md
+++ b/README.md
@@ -83,23 +83,27 @@ If no option is set, the tool performs layout detection of main regions (backgro
 The best output quality is produced when RGB images are used as input rather than greyscale or binarized images.

 #### Use as OCR-D processor
-🚧 **Work in progress** 

-Eynollah ships with a CLI interface to be used as [OCR-D](https://ocr-d.de) processor. 
+Eynollah ships with a CLI interface to be used as [OCR-D](https://ocr-d.de) [processor](https://ocr-d.de/en/spec/cli).

 In this case, the source image file group with (preferably) RGB images should be used as input like this:

-```
-ocrd-eynollah-segment -I OCR-D-IMG -O SEG-LINE -P models
-```
-    
-Any image referenced by `@imageFilename` in PAGE-XML is passed on directly to Eynollah as a processor, so that e.g.
+    ocrd-eynollah-segment -I OCR-D-IMG -O OCR-D-SEG -P models 2022-04-05

-```
-ocrd-eynollah-segment -I OCR-D-IMG-BIN -O SEG-LINE -P models
-```
-    
-uses the original (RGB) image despite any binarization that may have occured in previous OCR-D processing steps
+
+If the input file group is PAGE-XML (from a previous OCR-D workflow step), Eynollah behaves as follows:
+- existing regions are kept and ignored (i.e. in effect they might overlap segments from Eynollah results)
+- existing annotation (and respective `AlternativeImage`s) are partially _ignored_:
+  - previous page frame detection (`cropped` images)
+  - previous derotation (`deskewed` images)
+  - previous thresholding (`binarized` images)
+- if the page-level image nevertheless deviates from the original (`@imageFilename`)
+  (because some other preprocessing step was in effect like `denoised`), then
+  the output PAGE-XML will be based on that as new top-level (`@imageFilename`)
+
+    ocrd-eynollah-segment -I OCR-D-XYZ -O OCR-D-SEG -P models 2022-04-05
+
+Still, in general, it makes more sense to add other workflow steps **after** Eynollah.

 #### Additional documentation
 Please check the [wiki](https://github.com/qurator-spk/eynollah/wiki).
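
The PAGE-XML behaviour described in the added lines can be illustrated with a small sketch. It is not Eynollah's implementation. It only assumes the OCR-D convention that page-level `AlternativeImage` elements list their processing features in a comma-separated `@comments` attribute, and it reports which features fall outside the three the README says are ignored. The function names, the sample file names, and the `denoised` example are hypothetical.

```python
import xml.etree.ElementTree as ET

# Features the README text says are ignored on the page-level image.
IGNORED_FEATURES = {"cropped", "deskewed", "binarized"}


def page_image_features(page_xml):
    """Collect features from @comments of page-level AlternativeImage elements."""
    root = ET.fromstring(page_xml)
    # Take the namespace from the root tag so any PAGE schema version works.
    prefix = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    page = root.find(prefix + "Page")
    features = set()
    if page is None:
        return features
    # findall() with a bare tag matches direct children only (page level).
    for alt in page.findall(prefix + "AlternativeImage"):
        for feature in alt.get("comments", "").split(","):
            if feature.strip():
                features.add(feature.strip())
    return features


def remaining_features(page_xml):
    """Features other than the ignored ones, i.e. a deviation from the original image."""
    return page_image_features(page_xml) - IGNORED_FEATURES


# Hypothetical PAGE-XML from an earlier workflow step (file names are made up).
SAMPLE = """<pc:PcGts xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <pc:Page imageFilename="OCR-D-IMG/page_0001.tif" imageWidth="2000" imageHeight="3000">
    <pc:AlternativeImage filename="OCR-D-XYZ/page_0001.png" comments="cropped,binarized,denoised"/>
  </pc:Page>
</pc:PcGts>"""

if __name__ == "__main__":
    print(sorted(page_image_features(SAMPLE)))
    print(sorted(remaining_features(SAMPLE)))
```

A non-empty result from `remaining_features` corresponds to the case in the README where the page-level image still deviates from the original `@imageFilename`, so the output PAGE-XML would be based on that derived image.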