providing ocr model evaluation metrics

vahidrezanezhad 2025-10-22 21:30:06 +02:00
parent 59eb4fd3be
commit b56bb44284

@@ -157,6 +157,38 @@ The model extracts the reading order of text regions from the layout by classify
### OCR
We have trained three OCR models: two CNN-RNN-based models and one transformer-based TrOCR model. The CNN-RNN models are generally faster and provide better results in most cases, though their performance decreases on heavily degraded images. The TrOCR model, on the other hand, is computationally expensive and slower during inference, but it may produce better results on strongly degraded images.
#### CNN-RNN model: model_eynollah_ocr_cnnrnn_20250805
This model is trained on data in which most samples are in German Fraktur script.
| Dataset | Input | CER | WER |
|-----------------------|:-------|:-----------|:----------|
| OCR-D-GT-Archiveform | BIN | 0.02147 | 0.05685 |
| OCR-D-GT-Archiveform | RGB | | |
#### CNN-RNN model: model_eynollah_ocr_cnnrnn_20250904 (Default)
Compared to the model_eynollah_ocr_cnnrnn_20250805 model, this model is trained on a larger proportion of Antiqua data and achieves a lower CER and WER on OCR-D-GT-Archiveform with BIN input.
| Dataset | Input | CER | WER |
|-----------------------|:------------|:-----------|:----------|
| OCR-D-GT-Archiveform | BIN | 0.01635 | 0.05410 |
| OCR-D-GT-Archiveform | RGB | 0.01471 | 0.05813 |
| BLN600 | RGB | 0.04409 | 0.08879 |
| BLN600 | Enhanced | 0.03599 | 0.06244 |
#### Transformer OCR model: model_eynollah_ocr_trocr_20250919
This transformer OCR model is trained on the same data as model_eynollah_ocr_cnnrnn_20250904.
| Dataset | Input | CER | WER |
|-----------------------|:------------|:-----------|:----------|
| OCR-D-GT-Archiveform | BIN | 0.01841 | 0.05589 |
| OCR-D-GT-Archiveform | RGB | | |
| BLN600 | RGB | 0.06347 | 0.13853 |
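
The CER and WER columns in the tables above can be read as edit-distance ratios. The following is a minimal sketch assuming the standard definitions (Levenshtein distance divided by the length of the reference, counted in characters for CER and in whitespace-separated words for WER). The function names are illustrative and are not part of eynollah, and the evaluation that produced the reported numbers may differ in details such as text normalization or aggregation over lines.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits per reference character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits per reference word (assumed whitespace tokenization)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Under these definitions, a CER of 0.01635 corresponds to roughly 1.6 character errors per 100 reference characters. Because insertions count as errors, both ratios can exceed 1 on very poor predictions.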
## Heuristic methods
Additionally, some heuristic methods are employed to further improve the model predictions: