📝 update README

2026-06-28 07:49:21 +02:00 · 2025-09-26 15:07:32 +02:00 · 2025-09-26 15:07:32 +02:00 · 3123add815
commit 3123add815
parent 830cc2c30a
4 changed files with 141 additions and 62 deletions
--- a/docs/models.md
+++ b/docs/models.md
@@ -1,5 +1,6 @@
 # Models documentation
-This suite of 14 models presents a document layout analysis (DLA) system for historical documents implemented by 
+
+This suite of 15 models presents a document layout analysis (DLA) system for historical documents implemented by 
 pixel-wise segmentation using a combination of a ResNet50 encoder with various U-Net decoders. In addition, heuristic 
 methods are applied to detect marginals and to determine the reading order of text regions.

@@ -23,6 +24,7 @@ See the flowchart below for the different stages and how they interact:
 ## Models

 ### Image enhancement
+
 Model card: [Image Enhancement](https://huggingface.co/SBB/eynollah-enhancement)

 This model addresses image resolution, specifically targeting documents with suboptimal resolution. In instances where 
@@ -30,12 +32,14 @@ the detection of document layout exhibits inadequate performance, the proposed e
 the quality and clarity of the images, thus facilitating enhanced visual interpretation and analysis.

 ### Page extraction / border detection
+
 Model card: [Page Extraction/Border Detection](https://huggingface.co/SBB/eynollah-page-extraction)

 Black margins around a page caused by document scanning are a problem that can negatively affect OCR. A deep learning 
 model helps to crop to the page borders by using a pixel-wise segmentation method.

 ### Column classification
+
 Model card: [Column Classification](https://huggingface.co/SBB/eynollah-column-classifier)

 This model is a trained classifier that recognizes the number of columns in a document by use of a training set with 
@@ -43,6 +47,7 @@ manual classification of all documents into six classes with either one, two, th
 respectively.

 ### Binarization
+
 Model card: [Binarization](https://huggingface.co/SBB/eynollah-binarization)

 This model is designed to tackle the intricate task of document image binarization, which involves segmentation of the 
@@ -52,6 +57,7 @@ capability of the model enables improved accuracy and reliability in subsequent
 enhanced document understanding and interpretation.

 ### Main region detection
+
 Model card: [Main Region Detection](https://huggingface.co/SBB/eynollah-main-regions)

 This model employs a different set of labels, including an artificial class specifically designed to encompass the
@@ -61,6 +67,7 @@ during the inference phase. By incorporating this methodology, improved efficien
 model's ability to accurately identify and classify text regions within documents.

 ### Main region detection (with scaling augmentation)
+
 Model card: [Main Region Detection (with scaling augmentation)](https://huggingface.co/SBB/eynollah-main-regions-aug-scaling)

 Utilizing scaling augmentation, this model leverages the capability to effectively segment elements of extremely high or 
@@ -69,12 +76,14 @@ categorizing and isolating such elements, thereby enhancing its overall performa
 documents with varying scale characteristics.

 ### Main region detection (with rotation augmentation)
+
 Model card: [Main Region Detection (with rotation augmentation)](https://huggingface.co/SBB/eynollah-main-regions-aug-rotation)

 This model takes advantage of rotation augmentation, which helps the tool segment vertical text regions
 robustly.

 ### Main region detection (ensembled)
+
 Model card: [Main Region Detection (ensembled)](https://huggingface.co/SBB/eynollah-main-regions-ensembled)

 The robustness of this model is attained through an ensembling technique that combines the weights from various epochs. 
@@ -82,16 +91,19 @@ By employing this approach, the model achieves a high level of resilience and st
 strengths of multiple epochs to enhance its overall performance and deliver consistent and reliable results.

 ### Full region detection (1,2-column documents)
+
 Model card: [Full Region Detection (1,2-column documents)](https://huggingface.co/SBB/eynollah-full-regions-1column)

 This model deals with documents comprising one or two columns.

 ### Full region detection (3,n-column documents)
+
 Model card: [Full Region Detection (3,n-column documents)](https://huggingface.co/SBB/eynollah-full-regions-3pluscolumn)

 This model is responsible for detecting headers and drop capitals in documents with three or more columns.

 ### Textline detection
+
 Model card: [Textline Detection](https://huggingface.co/SBB/eynollah-textline)

 The method for textline detection combines deep learning and heuristics. In the deep learning part, an image-to-image 
@@ -106,6 +118,7 @@ segmentation is first deskewed and then the textlines are separated with the sam
 textline bounding boxes. Later, the strap is rotated back into its original orientation.

 ### Textline detection (light)
+
 Model card: [Textline Detection Light (simpler but faster method)](https://huggingface.co/SBB/eynollah-textline_light)

 The method for textline detection combines deep learning and heuristics. In the deep learning part, an image-to-image 
@@ -119,6 +132,7 @@ enhancing the model's ability to accurately identify and delineate individual te
 eliminates the need for additional heuristics in extracting textline contours. 

 ### Table detection
+
 Model card: [Table Detection](https://huggingface.co/SBB/eynollah-tables)

 The objective of this model is to perform table segmentation in historical document images. Due to the pixel-wise 
@@ -128,17 +142,21 @@ effectively identify and delineate tables within the historical document images,
 enabling subsequent analysis and interpretation.

 ### Image detection
+
 Model card: [Image Detection](https://huggingface.co/SBB/eynollah-image-extraction)

 This model is used only for illustration detection.

 ### Reading order detection
+
 Model card: [Reading Order Detection]()

 TODO

 ## Heuristic methods
+
 Additionally, some heuristic methods are employed to further improve the model predictions: 
+
 * After border detection, the bounding box of the largest contour is determined, and the image is cropped to these coordinates.
 * For text region detection, the image is scaled up to make it easier for the model to detect background space between text regions.
 * A minimum area is defined for text regions in relation to the overall image dimensions, so that very small regions that are noise can be filtered out.
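The three heuristics listed above can be sketched in plain NumPy as follows. This is a minimal illustration under stated assumptions, not the project's own implementation: 4-connected components stand in for contours, the upscaling is assumed to be nearest-neighbour by an integer factor, and `min_area_ratio` is a hypothetical threshold parameter. All function names (`label_components`, `crop_to_largest_region`, `upscale`, `filter_small_regions`) are hypothetical.

```python
from collections import deque

import numpy as np


def label_components(mask: np.ndarray) -> tuple[np.ndarray, int]:
    """Label 4-connected foreground regions of a boolean mask with a simple BFS.

    Assumed stand-in for contour extraction; returns (labels, count), 0 = background.
    """
    labels = np.zeros(mask.shape, dtype=np.int32)
    height, width = mask.shape
    count = 0
    for y in range(height):
        for x in range(width):
            if mask[y, x] and labels[y, x] == 0:
                count += 1
                labels[y, x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < height and 0 <= nx < width and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels, count


def crop_to_largest_region(image: np.ndarray, page_mask: np.ndarray) -> np.ndarray:
    """Heuristic 1: crop the image to the bounding box of the largest page region."""
    labels, count = label_components(page_mask.astype(bool))
    if count == 0:
        return image  # nothing detected: leave the image uncropped
    sizes = np.bincount(labels.ravel())[1:]  # pixel count per region, background skipped
    largest = int(np.argmax(sizes)) + 1
    ys, xs = np.nonzero(labels == largest)
    return image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]


def upscale(image: np.ndarray, factor: int) -> np.ndarray:
    """Heuristic 2: nearest-neighbour upscaling by an integer factor (assumed scheme)."""
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)


def filter_small_regions(region_mask: np.ndarray, min_area_ratio: float) -> np.ndarray:
    """Heuristic 3: drop regions smaller than min_area_ratio of the whole image area."""
    labels, count = label_components(region_mask.astype(bool))
    sizes = np.bincount(labels.ravel(), minlength=count + 1)
    keep = sizes >= min_area_ratio * region_mask.size
    keep[0] = False  # background is never kept as a region
    return keep[labels]


if __name__ == "__main__":
    mask = np.zeros((6, 8), dtype=bool)
    mask[1:5, 2:7] = True  # the page
    mask[0, 0] = True      # a one-pixel speck of noise
    print(crop_to_largest_region(np.arange(48).reshape(6, 8), mask).shape)  # (4, 5)
    print(int(filter_small_regions(mask, 0.05).sum()))                      # 20
```

Pure NumPy is used here only to keep the sketch dependency-free; a production pipeline would more likely rely on OpenCV (for example `cv2.findContours` and `cv2.boundingRect`) and an interpolating resize.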