diff --git a/docs/models.md b/docs/models.md
index 7f83b33..a42cdb2 100644
--- a/docs/models.md
+++ b/docs/models.md
@@ -157,6 +157,38 @@ The model extracts the reading order of text regions from the layout by classify
 ### OCR
 
 We have trained three OCR models: two CNN-RNN–based models and one transformer-based TrOCR model. The CNN-RNN models are generally faster and provide better results in most cases, though their performance decreases with heavily degraded images. The TrOCR model, on the other hand, is computationally expensive and slower during inference, but it can possibly produce better results on strongly degraded images.
+
+#### CNN-RNN model: model_eynollah_ocr_cnnrnn_20250805
+
+This model is trained on data where most of the samples are in German Fraktur script.
+
+| Dataset               | Input  | CER        | WER       |
+|-----------------------|:-------|:-----------|:----------|
+| OCR-D-GT-Archiveform  | BIN    | 0.02147    | 0.05685   |
+| OCR-D-GT-Archiveform  | RGB    |            |           |
+
+#### CNN-RNN model: model_eynollah_ocr_cnnrnn_20250904 (Default)
+
+Compared to the model_eynollah_ocr_cnnrnn_20250805 model, this model is trained on a larger proportion of Antiqua data and achieves superior performance.
+
+| Dataset               | Input       | CER        | WER       |
+|-----------------------|:------------|:-----------|:----------|
+| OCR-D-GT-Archiveform  | BIN         | 0.01635    | 0.05410   |
+| OCR-D-GT-Archiveform  | RGB         | 0.01471    | 0.05813   |
+| BLN600                | RGB         | 0.04409    | 0.08879   |
+| BLN600                | Enhanced    | 0.03599    | 0.06244   |
+
+
+#### Transformer OCR model: model_eynollah_ocr_trocr_20250919
+
+This transformer OCR model is trained on the same data as model_eynollah_ocr_trocr_20250919.
+
+| Dataset               | Input       | CER        | WER       |
+|-----------------------|:------------|:-----------|:----------|
+| OCR-D-GT-Archiveform  | BIN         | 0.01841    | 0.05589   |
+| OCR-D-GT-Archiveform  | RGB         |            |           |
+| BLN600                | RGB         | 0.06347    | 0.13853   |
+
 ## Heuristic methods
 
 Additionally, some heuristic methods are employed to further improve the model predictions: