From 2fc723d292093cdfb263e2d6681e478d7018b953 Mon Sep 17 00:00:00 2001
From: vahidrezanezhad <vahid631983@gmail.com>
Date: Wed, 22 Oct 2025 18:29:14 +0200
Subject: [PATCH] extend README

---
 README.md | 66 ++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 46 insertions(+), 20 deletions(-)
diff --git a/README.md b/README.md
index 3ba5086..c6ba0e4 100644
--- a/README.md
+++ b/README.md
@@ -91,24 +91,35 @@ eynollah layout \
 
 The following options can be used to further configure the processing:
 
-| option            | description                                                                    |
-|-------------------|:-------------------------------------------------------------------------------|
-| `-fl`             | full layout analysis including all steps and segmentation classes              |
-| `-light`          | lighter and faster but simpler method for main region detection and deskewing  |
-| `-tll`            | this indicates the light textline and should be passed with light version      |
-| `-tab`            | apply table detection                                                          |
-| `-ae`             | apply enhancement (the resulting image is saved to the output directory)       |
-| `-as`             | apply scaling                                                                  |
-| `-cl`             | apply contour detection for curved text lines instead of bounding boxes        |
-| `-ib`             | apply binarization (the resulting image is saved to the output directory)      |
-| `-ep`             | enable plotting (MUST always be used with `-sl`, `-sd`, `-sa`, `-si` or `-ae`) |
-| `-eoi`            | extract only images to output directory (other processing will not be done)    |
-| `-ho`             | ignore headers for reading order dectection                                    |
-| `-si <directory>` | save image regions detected to this directory                                  |
-| `-sd <directory>` | save deskewed image to this directory                                          |
-| `-sl <directory>` | save layout prediction as plot to this directory                               |
-| `-sp <directory>` | save cropped page image to this directory                                      |
-| `-sa <directory>` | save all (plot, enhanced/binary image, layout) to this directory               |
+| option            | description                                                                                 |
+|-------------------|:--------------------------------------------------------------------------------------------|
+| `-fl`             | full layout analysis including all steps and segmentation classes (recommended)             |
+| `-light`          | lighter and faster but simpler method for main region detection and deskewing (recommended) |
+| `-tll`            | light text line detection; should be passed together with `-light` (recommended)            |
+| `-tab`            | apply table detection                                                                       |
+| `-ae`             | apply enhancement (the resulting image is saved to the output directory)                    |
+| `-as`             | apply scaling                                                                               |
+| `-cl`             | apply contour detection for curved text lines instead of bounding boxes                     |
+| `-ib`             | apply binarization (the resulting image is saved to the output directory)                   |
+| `-ep`             | enable plotting (MUST always be used with `-sl`, `-sd`, `-sa`, `-si` or `-ae`)              |
+| `-eoi`            | extract only images to output directory (other processing will not be done)                 |
+| `-ho`             | ignore headers for reading order detection                                                  |
+| `-si <directory>` | save detected image regions to this directory                                               |
+| `-sd <directory>` | save deskewed image to this directory                                                       |
+| `-sl <directory>` | save layout prediction as plot to this directory                                            |
+| `-sp <directory>` | save cropped page image to this directory                                                   |
+| `-sa <directory>` | save all (plot, enhanced/binary image, layout) to this directory                            |
+| `-thart`          | threshold of the artificial class for text line detection (default: 0.1)                    |
+| `-tharl`          | threshold of the artificial class for layout detection (default: 0.1)                       |
+| `-ocr`            | perform OCR                                                                                 |
+| `-tr`             | apply transformer OCR (trocr) instead of the default CNN-RNN model                          |
+| `-bs_ocr`         | OCR inference batch size (default: 2 for trocr, 8 for cnn_rnn)                              |
+| `-ncu`            | upper limit on the number of columns in the document image                                  |
+| `-ncl`            | lower limit on the number of columns in the document image                                  |
+| `-slro`           | skip layout detection and reading order                                                     |
+| `-romb`           | apply machine-based reading order detection                                                 |
+| `-ipe`            | skip page extraction                                                                        |
+
 
 If no further option is set, the tool performs layout detection of main regions (background, text, images, separators 
 and marginals).
@@ -124,7 +135,7 @@ The command-line interface for binarization can be called like this:
 eynollah binarization \
   -i <single image file> | -di <directory containing image files> \
   -o <output directory> \
-  -m <directory containing model files> \
+  -m <directory containing model files>
 ```
 
 ### OCR
@@ -138,9 +149,24 @@ eynollah ocr \
   -i <single image file> | -di <directory containing image files> \
   -dx <directory of xmls> \
   -o <output directory> \
-  -m <directory containing model files> | --model_name <path to specific model> \
+  -m <directory containing model files> | --model_name <path to specific model>
 ```
 
+The following options can be used to further configure the OCR processing:
+
+| option            | description                                                                                 |
+|-------------------|:--------------------------------------------------------------------------------------------|
+| `-dib`            | directory of binarized images (`.png` only); prediction uses both RGB and binarized images  |
+| `-doit`           | directory for output images rendered with the predicted text                                |
+| `--model_name`    | path to a specific model file to use for OCR                                                |
+| `-trocr`          | apply transformer OCR (trocr); otherwise the cnn_rnn model is used                          |
+| `-etit`           | export text line images and their XML text to the output directory (OCR training data)      |
+| `-nmtc`           | do not mask cropped text line images with the text line contour                             |
+| `-bs`             | OCR inference batch size (default: 2 for trocr, 8 for cnn_rnn)                              |
+| `-ds_pref`        | add an abbreviation of the dataset name to the generated training data                      |
+| `-min_conf`       | minimum OCR confidence; text lines with a lower confidence are ignored                      |
+
+
 ### Machine-based-reading-order
 
 The machine-based reading-order module employs a pretrained model to identify the reading order from layouts represented in PAGE-XML files.