Mirror of https://github.com/qurator-spk/eynollah.git (synced 2025-10-06 14:39:55 +02:00)

Commit 3123add815 (parent 830cc2c30a): 📝 update README

4 changed files with 141 additions and 62 deletions

Changed file: README.md (55 changed lines)

@@ -1,5 +1,6 @@
# Eynollah

-> Document Layout Analysis with Deep Learning and Heuristics
+> Document Layout Analysis, Binarization and OCR with Deep Learning and Heuristics

[](https://pypi.org/project/eynollah/)
[](https://github.com/qurator-spk/eynollah/actions/workflows/test-eynollah.yml)
@@ -23,6 +24,7 @@
historical documents and therefore processing can be very slow. We aim to improve this, but contributions are welcome.

## Installation

Python `3.8-3.11` with Tensorflow `<2.13` on Linux are currently supported.

For (limited) GPU support the CUDA toolkit needs to be installed.
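Not part of the README change itself, but as a quick sanity check after installing the CUDA toolkit, a generic TensorFlow snippet can confirm whether a GPU is actually visible:

```python
# Generic TensorFlow check (not an eynollah command): list the GPUs TensorFlow can see.
import tensorflow as tf

print(tf.config.list_physical_devices("GPU"))  # an empty list means CPU-only execution
```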
@@ -42,19 +44,30 @@ cd eynollah; pip install -e .
Alternatively, you can run `make install` or `make install-dev` for editable installation.

+To also install the dependencies for the OCR engines:
+
+```
+pip install "eynollah[OCR]"
+# or
+make install EXTRAS=OCR
+```
## Models

-Pretrained models can be downloaded from [qurator-data.de](https://qurator-data.de/eynollah/) or [huggingface](https://huggingface.co/SBB?search_models=eynollah).
+Pretrained models can be downloaded from [zenodo](https://zenodo.org/records/17194824) or [huggingface](https://huggingface.co/SBB?search_models=eynollah).

For documentation on methods and models, have a look at [`models.md`](https://github.com/qurator-spk/eynollah/tree/main/docs/models.md).
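As an aside (not part of the commit): individual model repositories from the Hugging Face organisation linked above can be fetched with `huggingface_hub`; the repository id below is one of the model cards listed in `models.md`, and the local directory name is an arbitrary example:

```python
# Illustrative download of a single pretrained model via huggingface_hub.
# Repo id taken from docs/models.md; the local directory name is an example only.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="SBB/eynollah-binarization",
                  local_dir="models/eynollah-binarization")
```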
## Train

In case you want to train your own model with Eynollah, have a look at [`train.md`](https://github.com/qurator-spk/eynollah/tree/main/docs/train.md).

## Usage

-Eynollah supports four use cases: layout analysis (segmentation), binarization, text recognition (OCR),
-and (trainable) reading order detection.
+Eynollah supports five use cases: layout analysis (segmentation), binarization,
+image enhancement, text recognition (OCR), and (trainable) reading order detection.

### Layout Analysis

The layout analysis module is responsible for detecting layouts, identifying text lines, and determining reading order
using either heuristic methods or a machine-based reading order detection model.
@@ -97,58 +110,54 @@ and marginals).
The best output quality is produced when RGB images are used as input rather than greyscale or binarized images.

### Binarization

The binarization module performs document image binarization using pretrained pixelwise segmentation models.

The command-line interface for binarization of a single image can be called like this:

```sh
eynollah binarization \
+  -i <single image file> | -di <directory containing image files> \
+  -o <output directory> \
  -m <directory containing model files> \
-  <single image file> \
-  <output image>
-```
-and for flowing from a directory like this:
-```sh
-eynollah binarization \
-  -m <path to directory containing model files> \
-  -di <directory containing image files> \
-  -do <output directory>
```
### OCR

The OCR module performs text recognition from images using two main families of pretrained models: CNN-RNN–based OCR and Transformer-based OCR.

The command-line interface for OCR can be called like this:

```sh
eynollah ocr \
-  -m <path to directory containing model files> | --model_name <path to specific model> \
  -i <single image file> | -di <directory containing image files> \
  -dx <directory of xmls> \
-  -o <output directory>
+  -o <output directory> \
+  -m <path to directory containing model files> | --model_name <path to specific model> \
```
### Machine-based-reading-order

The machine-based reading-order module employs a pretrained model to identify the reading order from layouts represented in PAGE-XML files.

The command-line interface for machine-based reading order can be called like this:

```sh
eynollah machine-based-reading-order \
-  -m <path to directory containing model files> \
+  -i <single image file> | -di <directory containing image files> \
  -xml <xml file name> | -dx <directory containing xml files> \
+  -m <path to directory containing model files> \
  -o <output directory>
```
#### Use as OCR-D processor

Eynollah ships with a CLI interface to be used as an [OCR-D](https://ocr-d.de) [processor](https://ocr-d.de/en/spec/cli),
formally described in [`ocrd-tool.json`](https://github.com/qurator-spk/eynollah/tree/main/src/eynollah/ocrd-tool.json).

In this case, the source image file group with (preferably) RGB images should be used as input like this:

-    ocrd-eynollah-segment -I OCR-D-IMG -O OCR-D-SEG -P models 2022-04-05
+    ocrd-eynollah-segment -I OCR-D-IMG -O OCR-D-SEG -P models eynollah_layout_v0_5_0

If the input file group is PAGE-XML (from a previous OCR-D workflow step), Eynollah behaves as follows:
- existing regions are kept and ignored (i.e. in effect they might overlap segments from Eynollah results)
@@ -160,14 +169,20 @@ If the input file group is PAGE-XML (from a previous OCR-D workflow step), Eynol
(because some other preprocessing step was in effect like `denoised`), then
the output PAGE-XML will be based on that as new top-level (`@imageFilename`)

-    ocrd-eynollah-segment -I OCR-D-XYZ -O OCR-D-SEG -P models 2022-04-05
+    ocrd-eynollah-segment -I OCR-D-XYZ -O OCR-D-SEG -P models eynollah_layout_v0_5_0

Still, in general, it makes more sense to add other workflow steps **after** Eynollah.

+There is also an OCR-D processor for the binarization:
+
+    ocrd-sbb-binarize -I OCR-D-IMG -O OCR-D-BIN -P models default-2021-03-09

#### Additional documentation

Please check the [wiki](https://github.com/qurator-spk/eynollah/wiki).

## How to cite

If you find this tool useful in your work, please consider citing our paper:

```bibtex
Changed file: docs/models.md

@@ -1,5 +1,6 @@
# Models documentation

-This suite of 14 models presents a document layout analysis (DLA) system for historical documents implemented by
+This suite of 15 models presents a document layout analysis (DLA) system for historical documents implemented by
pixel-wise segmentation using a combination of a ResNet50 encoder with various U-Net decoders. In addition, heuristic
methods are applied to detect marginals and to determine the reading order of text regions.
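For orientation only, the following is a schematic Keras sketch of the ResNet50-encoder / U-Net-decoder family referred to above. It is not the eynollah implementation; the input size, decoder widths and number of classes are illustrative assumptions:

```python
# Schematic ResNet50-encoder / U-Net-decoder sketch (illustrative, not the eynollah model).
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import ResNet50

def build_unet_resnet50(input_shape=(448, 448, 3), n_classes=4):
    encoder = ResNet50(include_top=False, weights="imagenet", input_shape=input_shape)
    # Standard ResNet50 block outputs, from shallow to deep, used as skip connections.
    skip_names = ("conv1_relu", "conv2_block3_out", "conv3_block4_out", "conv4_block6_out")
    skips = [encoder.get_layer(name).output for name in skip_names]
    x = encoder.get_layer("conv5_block3_out").output  # deepest feature map

    # U-Net style decoder: upsample, concatenate with the matching skip, convolve.
    for skip in reversed(skips):
        x = layers.Conv2DTranspose(skip.shape[-1], 2, strides=2, padding="same")(x)
        x = layers.concatenate([x, skip])
        x = layers.Conv2D(skip.shape[-1], 3, padding="same", activation="relu")(x)

    x = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(x)  # back to input resolution
    outputs = layers.Conv2D(n_classes, 1, activation="softmax")(x)   # per-pixel class scores
    return Model(encoder.input, outputs)

model = build_unet_resnet50()
model.summary()
```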
@@ -23,6 +24,7 @@ See the flowchart below for the different stages and how they interact:
## Models

### Image enhancement

Model card: [Image Enhancement](https://huggingface.co/SBB/eynollah-enhancement)

This model addresses image resolution, specifically targeting documents with suboptimal resolution. In instances where

@@ -30,12 +32,14 @@ the detection of document layout exhibits inadequate performance, the proposed e
the quality and clarity of the images, thus facilitating enhanced visual interpretation and analysis.

### Page extraction / border detection

Model card: [Page Extraction/Border Detection](https://huggingface.co/SBB/eynollah-page-extraction)

A problem that can negatively affect OCR is black margins around a page caused by document scanning. A deep learning
model helps to crop to the page borders by using a pixel-wise segmentation method.

### Column classification

Model card: [Column Classification](https://huggingface.co/SBB/eynollah-column-classifier)

This model is a trained classifier that recognizes the number of columns in a document by use of a training set with

@@ -43,6 +47,7 @@ manual classification of all documents into six classes with either one, two, th
respectively.

### Binarization

Model card: [Binarization](https://huggingface.co/SBB/eynollah-binarization)

This model is designed to tackle the intricate task of document image binarization, which involves segmentation of the
@@ -52,6 +57,7 @@ capability of the model enables improved accuracy and reliability in subsequent
enhanced document understanding and interpretation.

### Main region detection

Model card: [Main Region Detection](https://huggingface.co/SBB/eynollah-main-regions)

This model has employed a different set of labels, including an artificial class specifically designed to encompass the

@@ -61,6 +67,7 @@ during the inference phase. By incorporating this methodology, improved efficien
model's ability to accurately identify and classify text regions within documents.

### Main region detection (with scaling augmentation)

Model card: [Main Region Detection (with scaling augmentation)](https://huggingface.co/SBB/eynollah-main-regions-aug-scaling)

Utilizing scaling augmentation, this model leverages the capability to effectively segment elements of extremely high or

@@ -69,12 +76,14 @@ categorizing and isolating such elements, thereby enhancing its overall performa
documents with varying scale characteristics.

### Main region detection (with rotation augmentation)

Model card: [Main Region Detection (with rotation augmentation)](https://huggingface.co/SBB/eynollah-main-regions-aug-rotation)

This model takes advantage of rotation augmentation. This helps the tool to segment the vertical text regions in a
robust way.

### Main region detection (ensembled)

Model card: [Main Region Detection (ensembled)](https://huggingface.co/SBB/eynollah-main-regions-ensembled)

The robustness of this model is attained through an ensembling technique that combines the weights from various epochs.

@@ -82,16 +91,19 @@ By employing this approach, the model achieves a high level of resilience and st
strengths of multiple epochs to enhance its overall performance and deliver consistent and reliable results.

### Full region detection (1,2-column documents)

Model card: [Full Region Detection (1,2-column documents)](https://huggingface.co/SBB/eynollah-full-regions-1column)

This model deals with documents comprising one or two columns.

### Full region detection (3,n-column documents)

Model card: [Full Region Detection (3,n-column documents)](https://huggingface.co/SBB/eynollah-full-regions-3pluscolumn)

This model is responsible for detecting headers and drop capitals in documents with three or more columns.

### Textline detection

Model card: [Textline Detection](https://huggingface.co/SBB/eynollah-textline)

The method for textline detection combines deep learning and heuristics. In the deep learning part, an image-to-image

@@ -106,6 +118,7 @@ segmentation is first deskewed and then the textlines are separated with the sam
textline bounding boxes. Later, the strap is rotated back into its original orientation.

### Textline detection (light)

Model card: [Textline Detection Light (simpler but faster method)](https://huggingface.co/SBB/eynollah-textline_light)

The method for textline detection combines deep learning and heuristics. In the deep learning part, an image-to-image

@@ -119,6 +132,7 @@ enhancing the model's ability to accurately identify and delineate individual te
eliminates the need for additional heuristics in extracting textline contours.

### Table detection

Model card: [Table Detection](https://huggingface.co/SBB/eynollah-tables)

The objective of this model is to perform table segmentation in historical document images. Due to the pixel-wise
@@ -128,17 +142,21 @@ effectively identify and delineate tables within the historical document images,
enabling subsequent analysis and interpretation.

### Image detection

Model card: [Image Detection](https://huggingface.co/SBB/eynollah-image-extraction)

This model is used for the task of illustration detection only.

### Reading order detection

Model card: [Reading Order Detection]()

TODO

## Heuristic methods

Additionally, some heuristic methods are employed to further improve the model predictions:

* After border detection, the largest contour is determined by a bounding box, and the image is cropped to these coordinates.
* For text region detection, the image is scaled up to make it easier for the model to detect background space between text regions.
* A minimum area is defined for text regions in relation to the overall image dimensions, so that very small regions that are noise can be filtered out.
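To make the first and third heuristics more concrete, here is a minimal OpenCV/NumPy sketch (illustrative only, not code from the repository); the area-ratio threshold is an arbitrary example value:

```python
# Illustrative sketch of two of the heuristics described above (not eynollah code).
import cv2
import numpy as np

def crop_to_largest_contour(image, border_mask):
    """Crop the image to the bounding box of the largest contour in a binary border mask."""
    contours, _ = cv2.findContours(border_mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    largest = max(contours, key=cv2.contourArea)
    x, y, w, h = cv2.boundingRect(largest)
    return image[y:y + h, x:x + w]

def filter_small_regions(contours, image_shape, min_area_ratio=1e-5):
    """Drop region contours whose area is tiny relative to the whole image (likely noise)."""
    image_area = image_shape[0] * image_shape[1]
    return [c for c in contours if cv2.contourArea(c) / image_area >= min_area_ratio]
```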
Changed file: docs/train.md

@@ -1,4 +1,5 @@
# Training documentation

This document aims to assist users in preparing training datasets, training models, and performing inference with trained models.
We cover various use cases including pixel-wise segmentation, image classification, image enhancement, and machine-based
reading order detection. For each use case, we provide guidance on how to generate the corresponding training dataset.

@@ -11,6 +12,7 @@ The following three tasks can all be accomplished using the code in the
* inference with the trained model

## Generate training dataset

The script `generate_gt_for_training.py` is used for generating training datasets. As the output of the following
command demonstrates, the dataset generator provides three different commands:
@@ -23,14 +25,19 @@ These three commands are:
* pagexml2label

### image-enhancement

Generating a training dataset for image enhancement is quite straightforward. All that is needed is a set of
high-resolution images. The training dataset can then be generated using the following command:

-`python generate_gt_for_training.py image-enhancement -dis "dir of high resolution images" -dois "dir where degraded
-images will be written" -dols "dir where the corresponding high resolution image will be written as label" -scs
-"degrading scales json file"`
+```sh
+python generate_gt_for_training.py image-enhancement \
+  -dis "dir of high resolution images" \
+  -dois "dir where degraded images will be written" \
+  -dols "dir where the corresponding high resolution image will be written as label" \
+  -scs "degrading scales json file"
+```

-The scales JSON file is a dictionary with a key named 'scales' and values representing scales smaller than 1. Images are
+The scales JSON file is a dictionary with a key named `scales` and values representing scales smaller than 1. Images are
downscaled based on these scales and then upscaled again to their original size. This process causes the images to lose
resolution at different scales. The degraded images are used as input images, and the original high-resolution images
serve as labels. The enhancement model can be trained with this generated dataset. The scales JSON file looks like this:
@@ -42,6 +49,7 @@ serve as labels. The enhancement model can be trained with this generated datase
```

### machine-based-reading-order

For machine-based reading order, we aim to determine the reading priority between two sets of text regions. The model's
input is a three-channel image: the first and last channels contain information about each of the two text regions,
while the middle channel encodes prominent layout elements necessary for reading order, such as separators and headers.
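To make the input encoding above concrete, here is a small NumPy sketch (illustrative only, not the repository's ground-truth generator) of how such a three-channel image could be assembled from two region masks and a layout-element mask:

```python
# Illustrative sketch of the three-channel reading-order input described above
# (not the actual generate_gt_for_training.py implementation).
import numpy as np

def build_reading_order_input(region_a_mask, region_b_mask, layout_elements_mask):
    """Stack two text-region masks and a layout-element mask into one H x W x 3 input.

    All masks are binary arrays of the same height and width:
    channel 0 -> first text region, channel 1 -> separators/headers, channel 2 -> second text region.
    """
    return np.stack(
        [region_a_mask, layout_elements_mask, region_b_mask],
        axis=-1,
    ).astype(np.float32)

# Example with dummy 448 x 448 masks (the size here is arbitrary).
h = w = 448
example = build_reading_order_input(
    np.zeros((h, w), dtype=np.uint8),
    np.zeros((h, w), dtype=np.uint8),
    np.zeros((h, w), dtype=np.uint8),
)
print(example.shape)  # (448, 448, 3)
```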
@@ -52,10 +60,18 @@ For output images, it is necessary to specify the width and height. Additionally
to filter out regions smaller than this minimum size. This minimum size is defined as the ratio of the text region area
to the image area, with a default value of zero. To run the dataset generator, use the following command:

-`python generate_gt_for_training.py machine-based-reading-order -dx "dir of GT xml files" -domi "dir where output images
-will be written" -docl "dir where the labels will be written" -ih "height" -iw "width" -min "min area ratio"`
+```shell
+python generate_gt_for_training.py machine-based-reading-order \
+  -dx "dir of GT xml files" \
+  -domi "dir where output images will be written" \
+  -docl "dir where the labels will be written" \
+  -ih "height" \
+  -iw "width" \
+  -min "min area ratio"
+```

### pagexml2label

pagexml2label is designed to generate labels from GT page XML files for various pixel-wise segmentation use cases,
including 'layout,' 'textline,' 'printspace,' 'glyph,' and 'word' segmentation.
To train a pixel-wise segmentation model, we require images along with their corresponding labels. Our training script
@@ -119,9 +135,13 @@ graphic region, "stamp" has its own class, while all other types are classified
region" are also present in the label. However, other regions like "noise region" and "table region" will not be
included in the label PNG file, even if they have information in the page XML files, as we chose not to include them.

-`python generate_gt_for_training.py pagexml2label -dx "dir of GT xml files" -do "dir where output label png files will
-be written" -cfg "custom config json file" -to "output type which has 2d and 3d. 2d is used for training and 3d is just
-to visualise the labels" "`
+```sh
+python generate_gt_for_training.py pagexml2label \
+  -dx "dir of GT xml files" \
+  -do "dir where output label png files will be written" \
+  -cfg "custom config json file" \
+  -to "output type which has 2d and 3d. 2d is used for training and 3d is just to visualise the labels"
+```

We have also defined an artificial class that can be added to the boundary of text region types or text lines. This key
is called "artificial_class_on_boundary." If users want to apply this to certain text regions in the layout use case,
@@ -169,12 +189,19 @@ in this scenario, since cropping will be applied to the label files, the directo
provided to ensure that they are cropped in sync with the labels. This ensures that the correct images and labels
required for training are obtained. The command should resemble the following:

-`python generate_gt_for_training.py pagexml2label -dx "dir of GT xml files" -do "dir where output label png files will
-be written" -cfg "custom config json file" -to "output type which has 2d and 3d. 2d is used for training and 3d is just
-to visualise the labels" -ps -di "dir where the org images are located" -doi "dir where the cropped output images will
-be written" `
+```sh
+python generate_gt_for_training.py pagexml2label \
+  -dx "dir of GT xml files" \
+  -do "dir where output label png files will be written" \
+  -cfg "custom config json file" \
+  -to "output type which has 2d and 3d. 2d is used for training and 3d is just to visualise the labels" \
+  -ps \
+  -di "dir where the org images are located" \
+  -doi "dir where the cropped output images will be written"
+```

## Train a model

### classification

For the classification use case, we haven't provided a ground truth generator, as it's unnecessary. For classification,
@@ -225,7 +252,9 @@ And the "dir_eval" the same structure as train directory:

The classification model can be trained using the following command line:

-`python train.py with config_classification.json`
+```sh
+python train.py with config_classification.json
+```

As evident in the example JSON file above, for classification, we utilize an "f1_threshold_classification" parameter.
This parameter is employed to gather all models with an evaluation f1 score surpassing this threshold. Subsequently,
@@ -276,6 +305,7 @@ The classification model can be trained like the classification case command lin
### Segmentation (Textline, Binarization, Page extraction and layout) and enhancement

#### Parameter configuration for segmentation or enhancement use cases

The following parameter configuration can be applied to all segmentation use cases and enhancements. The augmentation,
its sub-parameters, and continued training are defined only for segmentation use cases and enhancements, not for
classification and machine-based reading order, as you can see in their example config files.

@@ -355,6 +385,7 @@ command, similar to the process for classification and reading order:
`python train.py with config_classification.json`

#### Binarization

An example config JSON file for binarization can look like this:

```yaml

@@ -550,6 +581,7 @@ For page segmentation (or printspace or border segmentation), the model needs to
hence the patches parameter should be set to false.

#### layout segmentation

An example config JSON file for layout segmentation with 5 classes (including background) can look like this:

```yaml
@@ -605,26 +637,41 @@ An example config json file for layout segmentation with 5 classes (including ba
## Inference with the trained model

### classification

For conducting inference with a trained model, you simply need to execute the following command line, specifying the
directory of the model and the image on which to perform inference:

-`python inference.py -m "model dir" -i "image" `
+```sh
+python inference.py -m "model dir" -i "image"
+```

This will straightforwardly return the class of the image.

### machine based reading order

To infer the reading order using a reading order model, we need a page XML file containing layout information but
without the reading order. We simply need to provide the model directory, the XML file, and the output directory.
The new XML file with the added reading order will be written to the output directory with the same name.
We need to run:

-`python inference.py -m "model dir" -xml "page xml file" -o "output dir to write new xml with reading order" `
+```sh
+python inference.py \
+  -m "model dir" \
+  -xml "page xml file" \
+  -o "output dir to write new xml with reading order"
+```

### Segmentation (Textline, Binarization, Page extraction and layout) and enhancement

For conducting inference with a trained model for segmentation and enhancement, you need to run the following command
line:

-`python inference.py -m "model dir" -i "image" -p -s "output image" `
+```sh
+python inference.py \
+  -m "model dir" \
+  -i "image" \
+  -p \
+  -s "output image"
+```

Note that in the case of page extraction the -p flag is not needed.
Changed file: OCR CLI tests

@@ -289,27 +289,26 @@ def test_run_eynollah_ocr_filename(tmp_path, subtests, pytestconfig, caplog):
    assert len(out_texts) >= 2, ("result is inaccurate", out_texts)
    assert sum(map(len, out_texts)) > 100, ("result is inaccurate", out_texts)

-# kba Fri Sep 26 12:53:49 CEST 2025
-# Disabled until NHWC/NCHW error in https://github.com/qurator-spk/eynollah/actions/runs/18019655200/job/51273541895 debugged
-# def test_run_eynollah_ocr_directory(tmp_path, subtests, pytestconfig, caplog):
-#     indir = testdir.joinpath('resources')
-#     outdir = tmp_path
-#     args = [
-#         '-m', MODELS_OCR,
-#         '-di', str(indir),
-#         '-dx', str(indir),
-#         '-o', str(outdir),
-#     ]
-#     if pytestconfig.getoption('verbose') > 0:
-#         args.extend(['-l', 'DEBUG'])
-#     caplog.set_level(logging.INFO)
-#     def only_eynollah(logrec):
-#         return logrec.name == 'eynollah'
-#     runner = CliRunner()
-#     with caplog.filtering(only_eynollah):
-#         result = runner.invoke(ocr_cli, args, catch_exceptions=False)
-#     assert result.exit_code == 0, result.stdout
-#     logmsgs = [logrec.message for logrec in caplog.records]
-#     # FIXME: ocr has no logging!
-#     #assert any(True for logmsg in logmsgs if logmsg.startswith('???')), logmsgs
-#     assert len(list(outdir.iterdir())) == 2
+@pytest.mark.skip("Disabled until NHWC/NCHW error in https://github.com/qurator-spk/eynollah/actions/runs/18019655200/job/51273541895 debugged")
+def test_run_eynollah_ocr_directory(tmp_path, subtests, pytestconfig, caplog):
+    indir = testdir.joinpath('resources')
+    outdir = tmp_path
+    args = [
+        '-m', MODELS_OCR,
+        '-di', str(indir),
+        '-dx', str(indir),
+        '-o', str(outdir),
+    ]
+    if pytestconfig.getoption('verbose') > 0:
+        args.extend(['-l', 'DEBUG'])
+    caplog.set_level(logging.INFO)
+    def only_eynollah(logrec):
+        return logrec.name == 'eynollah'
+    runner = CliRunner()
+    with caplog.filtering(only_eynollah):
+        result = runner.invoke(ocr_cli, args, catch_exceptions=False)
+    assert result.exit_code == 0, result.stdout
+    logmsgs = [logrec.message for logrec in caplog.records]
+    # FIXME: ocr has no logging!
+    #assert any(True for logmsg in logmsgs if logmsg.startswith('???')), logmsgs
+    assert len(list(outdir.iterdir())) == 2