Mirror of https://github.com/qurator-spk/eynollah.git, synced 2025-07-01 15:09:54 +02:00
Update README.md
parent 0d47f28655
commit cb7460e241
1 changed file with 16 additions and 9 deletions
README.md (25 changes)
|
@@ -1,6 +1,3 @@
-**WARNING! This tool is currently**
-**For any planned changes, please have a look at the [Pull Requests](https://github.com/qurator-spk/eynollah/pulls).**
-
 # Eynollah
 > Document Layout Analysis
 
@@ -55,12 +52,22 @@ Some heuristic methods are also employed to further improve the model prediction
 * Finally, using the derived coordinates, bounding boxes are determined for each textline.
 
 ## Installation
-`pip install .`
+`pip install .` or
+
+`pip install -e .` for editable installation
+
+Alternatively, you can use `make` with these targets:
+
+`make install` or
+
+`make install-dev` for editable installation
 
 ### Models
 
 In order to run this tool you also need trained models. You can download our pretrained models from [qurator-data.de](https://qurator-data.de/eynollah/).
+
+Alternatively, running `make models` will download and extract models to `$(PWD)/models_eynollah`.
 
 ## Usage
 
 The basic command-line interface can be called like this:
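The hunk above says `make models` downloads and extracts the models to `$(PWD)/models_eynollah`, i.e. a `models_eynollah` folder in the directory from which `make` was run. A quick existence check before a run can save a failed invocation. The following is a minimal Python sketch under that assumption; `find_models_dir` is a hypothetical helper, not part of eynollah.

```python
from pathlib import Path
from typing import Optional


def find_models_dir(base: Optional[Path] = None) -> Optional[Path]:
    """Return <base>/models_eynollah if it exists, else None.

    Hypothetical helper: assumes the default location that `make models`
    is described as using, a `models_eynollah` folder inside the directory
    where `make` was run (the current working directory by default).
    """
    root = Path.cwd() if base is None else Path(base)
    candidate = root / "models_eynollah"
    return candidate if candidate.is_dir() else None


if __name__ == "__main__":
    models = find_models_dir()
    if models is None:
        print("No models_eynollah directory found; run `make models` or "
              "download the models from https://qurator-data.de/eynollah/")
    else:
        print(f"Using models from {models}")
```

If the models were downloaded manually from qurator-data.de into a different folder, the check simply does not apply; pass that folder to the tool instead.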
@@ -72,7 +79,7 @@ The basic command-line interface can be called like this:
 -fl <if true, the tool will perform full layout analysis> \
 -ae <if true, the tool will resize and enhance the image and produce the resulting image as output> \
 -as <if true, the tool will check whether the document needs rescaling or not> \
--cl <if true, the tool will try to extract the contours of texlines instead of rectangle bounding boxes> \
+-cl <if true, the tool will extract the contours of curved textlines instead of rectangular bounding boxes> \
 -si <if a directory is given here, the tool will output image regions inside documents there>
 
 The tool accepts both original (RGB) and binarized images, but works better on original images.
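The options in this hunk either take `true` or, in the case of `-si`, a directory path. A minimal Python sketch that assembles them into an argument list, e.g. for `subprocess.run`. `eynollah_option_args` is a hypothetical helper; it assumes that an enabled boolean option is passed as the literal string `true` (as the README bullets describe) and that a disabled option can simply be left out. The command name and its input, output and model arguments lie outside this hunk and are not covered here.

```python
from typing import List, Optional


def eynollah_option_args(
    full_layout: bool = False,
    enhance_image: bool = False,
    allow_scaling: bool = False,
    curved_lines: bool = False,
    save_images_dir: Optional[str] = None,
) -> List[str]:
    """Build the optional flags shown in the usage text (hypothetical helper).

    Assumes boolean options take the literal value "true" and that leaving
    a flag out keeps the option disabled.
    """
    args: List[str] = []
    boolean_flags = (
        ("-fl", full_layout),     # full layout analysis
        ("-ae", enhance_image),   # resize/enhance the image and output it
        ("-as", allow_scaling),   # check whether rescaling is needed
        ("-cl", curved_lines),    # contours instead of rectangular boxes
    )
    for flag, enabled in boolean_flags:
        if enabled:
            args += [flag, "true"]
    if save_images_dir is not None:
        # -si takes a directory path rather than a boolean
        args += ["-si", save_images_dir]
    return args
```

For example, `eynollah_option_args(full_layout=True, curved_lines=True)` returns `["-fl", "true", "-cl", "true"]`, which would be appended after the command and its required arguments. Keeping `-si` separate from the boolean flags mirrors the usage text, where it is the only option that expects a path.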
@@ -87,10 +94,10 @@ First of all, this model makes use of up to 9 trained models which are responsib
 
 * For some documents, while the quality is good, their scale is extremely large and the performance of the tool decreases. In such cases you can set `-as` (**a**llow **s**caling) to `true`. With this option enabled, the tool will try to rescale the image before the layout detection process begins.
 
-* If you care about drop capitals (initials) and headings, you can set `-fl` (**f**ull **l**ayout) to `true`. As we can see in the case of full layout, we can currently distinguish 7 document layout classes/elements.
+* If you care about drop capitals (initials) and headings, you can set `-fl` (**f**ull **l**ayout) to `true`. With this setting, the tool can currently distinguish 7 document layout classes/elements.
 
-* In cases where the documents include curved headers or curved lines it is obvious that rectangular bounding boxes for textlines will not be a great option. For this, we have developed an option which tries to find contours of the curvy textlines. You can set `-cl` (**c**urved **l**ines) to `true` to enable this option. Be advised that this will increase the time needed for the tool to process the document.
+* If the document includes curved headers or curved lines, rectangular bounding boxes are a poor fit for its textlines. In such cases it is strongly recommended to set the flag `-cl` (**c**urved **l**ines) to `true` to find contours of curved lines instead of rectangular bounding boxes. Be advised that enabling this option increases the processing time of the tool.
 
-* If you want to crop and save image regions inside the document, just provide a directory with the parameter, `-si` (**s**ave **i**mages).
+* To crop and save image regions inside the document, pass a directory path to the parameter `-si` (**s**ave **i**mages); the extracted images are stored there.
 
-* This tool is actively being developed. If any problems occur or the performance does not meet your expectations, we welcome your feedback.
+* This tool is actively being developed. If problems occur or the performance does not meet your expectations, we welcome your feedback via [issues](https://github.com/qurator-spk/eynollah/issues).
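The `-cl` bullet above argues that rectangular boxes fit curved textlines poorly. A small Python illustration of why, using an invented V-shaped band of text as the contour (the coordinates are made up for this example, and this is not how eynollah itself computes contours): the shoelace formula gives the area the line actually covers, while its axis-aligned bounding box covers five times as much, so most of the box is background that can overlap neighbouring lines.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon, computed with the shoelace formula."""
    twice_area = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def bounding_box_area(points: List[Point]) -> float:
    """Area of the axis-aligned rectangle enclosing all points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return float((max(xs) - min(xs)) * (max(ys) - min(ys)))


# Invented contour of a curved textline: a 10-pixel-thick band that bends
# down towards the middle of the line and back up (image coordinates, y down).
curved_line = [(0, 0), (50, 40), (100, 0), (100, 10), (50, 50), (0, 10)]

contour = polygon_area(curved_line)
box = bounding_box_area(curved_line)
print(f"contour: {contour}, box: {box}, background in box: {1 - contour / box:.0%}")
# prints "contour: 1000.0, box: 5000.0, background in box: 80%"
```

The stronger the curvature, the larger this gap becomes, which is the trade-off the bullet describes against the extra processing time of `-cl`.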