diff --git a/README.md b/README.md
index 5d5d5a8..8353005 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@
 ![](https://user-images.githubusercontent.com/952378/102350683-8a74db80-3fa5-11eb-8c7e-f743f7d6eae2.jpg)

 ## Features
-* Document layout analysis using pixelwise segmentation models with support for 10 distinct segmentation classes:
+* Document layout analysis using pixelwise segmentation models with support for 10 segmentation classes:
   * background, [page border](https://ocr-d.de/en/gt-guidelines/trans/lyRand.html), [text region](https://ocr-d.de/en/gt-guidelines/trans/lytextregion.html#textregionen__textregion_), [text line](https://ocr-d.de/en/gt-guidelines/pagexml/pagecontent_xsd_Complex_Type_pc_TextLineType.html), [header](https://ocr-d.de/en/gt-guidelines/trans/lyUeberschrift.html), [image](https://ocr-d.de/en/gt-guidelines/trans/lyBildbereiche.html), [separator](https://ocr-d.de/en/gt-guidelines/trans/lySeparatoren.html), [marginalia](https://ocr-d.de/en/gt-guidelines/trans/lyMarginalie.html), [initial](https://ocr-d.de/en/gt-guidelines/trans/lyInitiale.html), [table](https://ocr-d.de/en/gt-guidelines/trans/lyTabellen.html)
 * Textline segmentation to bounding boxes or polygons (contours) including for curved lines and vertical text
 * Document image binarization with pixelwise segmentation or hybrid CNN-Transformer models
@@ -81,6 +81,8 @@ Eynollah supports five use cases:
 4. [text recognition (OCR)](#ocr), and
 5. [reading order detection](#reading-order-detection).

+Some example outputs can be found in [`examples.md`](https://github.com/qurator-spk/eynollah/tree/main/docs/examples.md).
+
 ### Layout Analysis

 The layout analysis module is responsible for detecting layout elements, identifying text lines, and determining reading
@@ -152,16 +154,6 @@
 TODO

 ### OCR

-<table>
-  <tr><td>Input Image <img src="..."/></td>
-      <td>Output Image <img src="..."/></td></tr>
-</table>
-
-<table>
-  <tr><td>Input Image <img src="..."/></td>
-      <td>Output Image <img src="..."/></td></tr>
-</table>
-
 The OCR module performs text recognition using either a CNN-RNN model or a Transformer model.
 The command-line interface for OCR can be called like this:
@@ -176,17 +168,17 @@ eynollah ocr \

 The following options can be used to further configure the ocr processing:

-| option            | description                                                                                |
-|-------------------|:-------------------------------------------------------------------------------|
-| `-dib`            | directory of bins(files type must be '.png'). Prediction with both RGB and bins.          |
-| `-doit`           | Directory containing output images rendered with the predicted text                       |
-| `--model_name`    | Specific model file path to use for OCR                                                   |
-| `-trocr`          | transformer ocr will be applied, otherwise cnn_rnn model                                  |
-| `-etit`           | textlines images and text in xml will be exported into output dir (OCR training data)     |
-| `-nmtc`           | cropped textline images will not be masked with textline contour                          |
-| `-bs`             | ocr inference batch size. Default bs for trocr and cnn_rnn models are 2 and 8 respectively |
-| `-ds_pref`        | add an abbrevation of dataset name to generated training data                             |
-| `-min_conf`       | minimum OCR confidence value. OCRs with textline conf lower than this will be ignored     |
+| option            | description                                                                                 |
+|-------------------|:--------------------------------------------------------------------------------------------|
+| `-dib`            | directory of binarized images (file type must be '.png'); prediction uses both RGB and bin   |
+| `-doit`           | directory for output images rendered with the predicted text                                 |
+| `--model_name`    | file path of a specific model to use for OCR                                                 |
+| `-trocr`          | use transformer ocr model (otherwise the cnn_rnn model is used)                              |
+| `-etit`           | export textline images and text in xml to output dir (OCR training data)                     |
+| `-nmtc`           | cropped textline images will not be masked with textline contour                             |
+| `-bs`             | ocr inference batch size. Default batch size is 2 for trocr and 8 for cnn_rnn models         |
+| `-ds_pref`        | add an abbreviation of the dataset name to generated training data                           |
+| `-min_conf`       | minimum OCR confidence value. Textlines with confidence below this value are ignored         |

 ### Reading Order Detection
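Below is a minimal sketch of how the flags documented in the updated options table combine into a full `eynollah ocr` call. The input/output flags (`-di`, `-o`, `-m`) are assumptions made by analogy with the other eynollah subcommands and do not appear in this diff (a directory of PAGE-XML layout files may also be required); only `-trocr`, `-bs`, and `-min_conf` are taken from the table above, so treat this as illustrative rather than the canonical invocation.

```sh
# Hypothetical example, not taken from the diff: -di, -o and -m are assumed
# input/output flags; -trocr, -bs and -min_conf come from the options table.
eynollah ocr \
  -di ./images \
  -o ./output \
  -m ./models \
  -trocr \
  -bs 2 \
  -min_conf 0.3
```

Passing `-bs 2` is redundant here, since 2 is the documented default batch size for the trocr model, but spelling it out keeps the run reproducible if the default ever changes.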