Commit graph

1566 commits

Author SHA1 Message Date
Robert Sachunsky
8ffc4ed8d3 Eynollah_ocr: adapt to inference model, improve and simplify…
- drop `end_character` mechanics and `characters` model type
  for decoding output probability (not needed)
- drop `decode_batch_predictions()` and `num_to_char` model type
  (part of inference model)
- drop rough confidence estimation
  (returned precisely by inference model)
- adapt model prediction to inference model: just omit zeros,
  map to bytes, filter OOV tokens and decode UTF-8 to str
- if no binarization input was provided, then compute it on the fly
  using `binarization` model
- also apply `min_conf_value_of_textline_text` (as for TrOCR)
- batching over entire page instead of region-wise
  (which underfilled batches)
- simplify and avoid copied redundant code
- rename `extracted_conf_value_merged` → `extracted_confs_merged`
- move `batched()` from `utils.utils_ocr` to `utils`
- drop `utils_ocr.distortion_free_resize()` (not needed)
- simplify `utils_ocr.break_curved_line_into_small_pieces_and_then_merge()`
- drop `utils_ocr.return_textline_contour_with_added_box_coordinate()`
  and `utils_ocr.return_rnn_cnn_ocr_of_given_textlines()` (not needed)
2026-06-02 21:20:06 +02:00
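
The post-processing named in the commit above (omit zeros, map to bytes, filter OOV tokens, decode UTF-8) might look roughly like the following. This is a minimal sketch, not the code from the commit: the function name `decode_row` is hypothetical, zeros are assumed to be padding, and the OOV marker is assumed to be the Keras `StringLookup` default `[UNK]` appearing literally in the byte stream.

```python
def decode_row(row, oov_token=b"[UNK]"):
    """Turn one zero-padded row of uint8 values into a Python str (illustrative only)."""
    # omit zeros (assumed to be padding) and map the remaining values to bytes
    raw = bytes(int(v) for v in row if int(v) != 0)
    # filter out-of-vocabulary markers (assumption: they occur literally as `[UNK]`)
    raw = raw.replace(oov_token, b"")
    # decode UTF-8 to str; errors="replace" is a choice made for this sketch
    return raw.decode("utf-8", errors="replace")


# one predicted text line, padded with zeros to a fixed width
padded = list("Wörter".encode("utf-8")) + list(b"[UNK]") + [0, 0, 0]
print(decode_row(padded))
```

Since `row` is only iterated, the same function would also accept a row of a NumPy `uint8` array.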
Robert Sachunsky
a391ee24e6 Predictor: handle multi-input and/or multi-output cases 2026-06-02 21:18:22 +02:00
Robert Sachunsky
c79b73dcc8 cnn-rnn-ocr: move CTC decoder and string decoder to inference model…
- ModelZoo: drop `num_to_char` and `characters` model types,
  also drop `_load_characters()` and `_load_num_to_char()` loaders
- `ModelZoo.load_models()`: use Predictor for `ocr` models, too
- `ModelZoo.load_model()`: delegate runtime/inference conversion of
  OCR models to `eynollah.training.models.cnn_rnn_ocr_model4inference`
- `training.models`: add (purely functional) Keras layer `CTCDecoder`
  for inference on top of softmax output, but using TF backend
  function instead of (broken) `Keras.backend.ctc_decode()`, while
  switching to beam search (instead of greedy) and also returning
  decoded path probability
- `training.models.cnn_rnn_ocr_model()` w/ `inference=True`:
  * add kwarg `characters_txt_file` for file path of character set
  * configure secondary tensor path on OCR graph for binarized input
    (additional input `image_bin`, averaging softmax outputs)
  * use new `CTCDecoder` layer and inverse `StringLookup` layer to
    decode from softmax output to tf.string; so inference models
    now have 2 inputs (RGB, binarized) and 2 outputs (text, prob)
  * since `np.dtype=object` cannot be handled by SharedMemory (as
    needed by Predictor queues), also replace tf.string by tf.uint8
    arrays
  * use this for `training convert` for OCR models w/ `--rebuild`
- `training.models.cnn_rnn_ocr_model4inference`:
  * new function which does the same but loads an existing OCR model
    in training configuration (i.e. without prior `inference=True`)
  * use this for `training convert` for OCR models w/o `--rebuild`
2026-06-02 20:26:42 +02:00
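
To illustrate what "beam search (instead of greedy) and also returning decoded path probability" means for CTC output, here is a pure-Python sketch of CTC prefix beam search. It is not the TF backend function used by the `CTCDecoder` layer in the commit above. The function name `ctc_beam_search`, the blank index 0 and the beam width are assumptions made for this example.

```python
from collections import defaultdict


def ctc_beam_search(probs, beam_width=4, blank=0):
    """probs: T rows of per-class softmax probabilities.

    Returns (labels, probability) of the most probable collapsed label sequence.
    """
    # each prefix maps to (prob. of paths ending in blank, prob. of paths ending in non-blank)
    beams = {(): (1.0, 0.0)}
    for row in probs:
        scores = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for c, p in enumerate(row):
                if c == blank:
                    # a blank keeps the prefix unchanged
                    scores[prefix][0] += (p_b + p_nb) * p
                elif prefix and prefix[-1] == c:
                    # repeated label: merges into the same prefix unless a blank intervened
                    scores[prefix][1] += p_nb * p
                    scores[prefix + (c,)][1] += p_b * p
                else:
                    # a new label extends the prefix
                    scores[prefix + (c,)][1] += (p_b + p_nb) * p
        # keep only the most probable prefixes
        best = sorted(scores.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(s) for prefix, s in best[:beam_width]}
    prefix, score = max(beams.items(), key=lambda kv: sum(kv[1]))
    return list(prefix), sum(score)


# two time steps, two classes (0 = blank, 1 = some character)
labels, prob = ctc_beam_search([[0.1, 0.9], [0.1, 0.9]])
print(labels, prob)
```

Unlike greedy decoding, which scores only the single best path, the returned probability sums over all alignments that collapse to the same label sequence (within the beam).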
Robert Sachunsky
13f2f81c45 ModelZoo: support inference with ONNX/TensorRT…
- comment out ad-hoc conversion/loading of autosized models
- refactor predictor backends for model types into separate functions
- only attempt inference conversion of cnn-rnn-ocr model
  if applicable (`ctc_loss` layer still present)
- apply VRAM limits across model types
  (Keras, TF-Serving, ONNX)
- apply TF device selection across model types
  (Keras, TF-Serving)
- implement predictor backend for ONNX models:
  - using onnxruntime
  - covering CUDA and TensorRT providers
  - trying to support manual device selection
  - hiding session management details
  - converting float32 to float16
2026-05-28 18:08:08 +02:00
Robert Sachunsky
f833a516e7 training: add CLI command convert
- move `train_cli` from cli.py to train.py,
  add docstring
- add `convert_cli`:
  - load any (supported) model format
    (i.e. not exported TF-Serving or ONNX)
  - if SavedModel format with `config.json` present,
    and `--rebuild` is requested, create new model
    from `models.get_model()` for this configuration,
    and load weights
  - if model type is `cnn-rnn-ocr` and configuration
    is still for training (`ctc_loss`), then extract
    inference model
  - apply requested `--format` conversion:
    HDF5, Keras native, Keras SavedModel, TF-Serving SavedModel
    or ONNX
  - if output format is directory (i.e. SavedModel),
    then copy over `config.json`, too
- reload-models-v0.8.mk:
  - adapt recipe for converter CLI (i.e. `--format tf-serving`
    w/ `--rebuild` if possible)
  - add targets for other useful data formats
  - extend list of model names to all current models
    (as all benefit from TF-Serving export)
  - cancel ONNX conversion for vision transformer models
    (as these do not work, yet)
2026-05-28 17:48:21 +02:00
Robert Sachunsky
62b55a3809 train params: drop reload_weights, re-use dir_of_start_model
- drop ad-hoc configuration parameter `reload_weights`
  (used for conversion/export of models for inference,
   to be replaced by extra CLI)
- re-interpret `dir_of_start_model` to also load weights
  if not `continue_training`
2026-05-28 17:42:55 +02:00
Robert Sachunsky
093030f503 train/models: move all model builders to models.get_model()
- models: add new `get_model()`, passing in Sacred config
  to capture builder function arguments
- train: fewer imports
- train: no need to pass `custom_objects` if loading with
  `compile=False` (and we custom-compile later, anyway)
2026-05-28 17:37:45 +02:00
Robert Sachunsky
faef1967f8 models.cnn_rnn_ocr_model: add inference option, drop model name 2026-05-28 17:33:57 +02:00
Robert Sachunsky
c4a7eec5b3 models: cosmetics
- using `Reshape`, do not pass `target_shape` as kwarg
- add a default `name` for `Patches` and `PatchEncoder`
2026-05-27 01:58:21 +02:00
Robert Sachunsky
9801129aa6 estimate_skew_contours: ensure retval is always float 2026-05-22 12:37:07 +02:00
Robert Sachunsky
26afc5ddab ModelZoo: ensure exported TensorShape is converted to plain tuple 2026-05-22 12:35:44 +02:00
Robert Sachunsky
0836230c6b utils_ocr: avoid module-level import of TF 2026-05-21 22:50:53 +02:00
Robert Sachunsky
f3a93983c0 ModelZoo: add ocr key for memory_limit 2026-05-21 22:50:13 +02:00
Robert Sachunsky
ea41dcae1d trocr: use beam search instead of greedy decoding 2026-05-21 17:52:27 +02:00
Robert Sachunsky
074753a98e ModelZoo: fix Torch device selection 2026-05-21 17:25:53 +02:00
Robert Sachunsky
000e4ac8d8 trocr: extract confidence, too 2026-05-21 17:25:39 +02:00
Robert Sachunsky
f3649adbf2 trocr: apply do_not_mask_with_textline_contour here, too 2026-05-21 17:23:11 +02:00
Robert Sachunsky
1d67e65f11 trocr: simplify, batch over entire page…
- batching over entire page instead of region-wise
  (underfilling batches)
- avoid copied redundant code
2026-05-21 15:48:21 +02:00
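
Why region-wise batching underfills batches, compared to batching over the entire page, can be seen in a small sketch. The `batched()` helper below is a generic stand-in written for this example; the signature of the project's own `batched()` utility is not shown in this log and may differ.

```python
from itertools import islice


def batched(iterable, n):
    """Yield successive lists of at most n items (generic helper, assumed for this sketch)."""
    it = iter(iterable)
    while batch := list(islice(it, n)):
        yield batch


# text lines grouped by region (placeholder data)
regions = [["line1", "line2", "line3"], ["line4"], ["line5", "line6"]]
batch_size = 4

# region-wise: every region starts a new batch, so most batches are not full
region_wise = [b for region in regions for b in batched(region, batch_size)]

# page-wise: flatten all lines of the page first, then batch
page_wise = list(batched((line for region in regions for line in region), batch_size))

print(len(region_wise), len(page_wise))  # prints "3 2"
```

With six lines and a batch size of four, the region-wise variant needs three model calls, the page-wise variant only two.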
Robert Sachunsky
d50bd7c650 trocr: avoid warnings by passing clean_up_tokenization_spaces=False 2026-05-21 14:20:51 +02:00
Robert Sachunsky
f9f9130dbb do_order_of_regions: remove redundant+overcautious assertion 2026-05-21 03:21:36 +02:00
Robert Sachunsky
bf7ec0233d ModelZoo.load_model: use memory_limit instead of memory_growth
- growth strategy is more flexible, but uses much more VRAM
- limit strategy needs to be calibrated to models (currently fixed),
  and batch size, but needs much less VRAM and is faster
2026-05-21 02:43:34 +02:00
Robert Sachunsky
94a5e9da14 ModelZoo.load_model: avoid attempting to load exported models as Keras
models (which causes a warning), but switch to TF-Serving import right away
2026-05-21 02:41:19 +02:00
Robert Sachunsky
7f2bf715df ModelZoo.load_model: fix loading exported vs saved models 2026-05-21 02:39:59 +02:00
Robert Sachunsky
3de1407d18 drop unnecessary TF / Torch imports 2026-05-21 02:38:20 +02:00
Robert Sachunsky
bdfebd2c70 reload_weights: save()export() w/ serve() inference 2026-05-19 03:40:18 +02:00
Robert Sachunsky
86adaf299a training.models.transformer_block: tf.reshape → Keras Reshape layer 2026-05-19 03:40:16 +02:00
Robert Sachunsky
9efce5e9f2 Predictor.shutdown: use join() instead of terminate() 2026-05-19 03:40:07 +02:00
Robert Sachunsky
ffe5cdc519 ModelZoo.shutdown: drop extra del (already done by shutdown()) 2026-05-19 03:40:05 +02:00
Robert Sachunsky
481c286da9 ModelZoo.load_model: no XLA compilation 2026-05-19 03:40:05 +02:00
Robert Sachunsky
f329e10a80 test_layout: rm ignored --allow_scaling option 2026-05-19 03:40:04 +02:00
Robert Sachunsky
17b311441a model_zoo: also parse comma/colon syntax for device in Torch case 2026-05-19 03:40:03 +02:00
Robert Sachunsky
be4fe8c263 contour: drop unused functions depending on rotation_image_new() 2026-05-19 03:40:02 +02:00
Robert Sachunsky
87cce6c963 CLI tests: add opt-in envvar EYNOLLAH_OPTIONS for device selection,
model directory etc.
2026-05-19 03:40:01 +02:00
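
An opt-in environment variable for extra CLI options could be consumed in a test helper along these lines. This is only a sketch under assumptions: the helper name `extra_cli_options` and the device value are hypothetical, and the way the project's tests actually read `EYNOLLAH_OPTIONS` is not shown in this log.

```python
import os
import shlex


def extra_cli_options(environ=os.environ):
    """Split EYNOLLAH_OPTIONS into an argument list; empty if the variable is unset."""
    return shlex.split(environ.get("EYNOLLAH_OPTIONS", ""))


# placeholder value: the accepted syntax for --device is not specified here
env = {"EYNOLLAH_OPTIONS": "--device some-device"}
print(extra_cli_options(env))
```

Because the variable is opt-in, an unset `EYNOLLAH_OPTIONS` yields an empty list and leaves the test invocation unchanged.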
Robert Sachunsky
1ed633bc25 test_model_zoo: adapt (load_models instead of load_model) 2026-05-19 03:40:00 +02:00
Robert Sachunsky
21ecb043f7 CLIs: move --device option to group level 2026-05-19 03:39:59 +02:00
Robert Sachunsky
7ed1a1ebac CLIs: allow -h and show defaults uniformly, harmonise help, drop
remaining redundant negative options
2026-05-19 03:39:56 +02:00
Robert Sachunsky
cd62f13872 eynollah_ocr: make work again, re-use Eynollah base class…
- re-use Eynollah base class
- use `ModelZoo.load_models()` instead of `load_model()`
- pass in `device` init kwarg, delegate to `ModelZoo.load_models()`
- `device`: return Torch device at loaded model tensors
  instead of ad-hoc selection
- make numeric init kwargs non-optional (only numeric)
2026-05-19 03:39:55 +02:00
Robert Sachunsky
ded668a256 model_zoo: fix clash between Predictor and direct (OCR) use-cases…
- `load_models()`: uniformly handle arg types
- `load_model()`: move handling of non-model categories
  to `load_models()`
- `load_model()`: move SavedModel preference over HDF5 to `model_path()`
- `_load_ocr_model()`: add user-selected device handling and reporting
  for Torch (as for TF)
- `_load_ocr_model()`: move (TF-based) CNN-RNN case to `load_model()`
  (including Keras layer mapping)
- `shutdown()`: only apply `shutdown()` to Predictor model types
2026-05-19 03:39:53 +02:00
Robert Sachunsky
98e6fbbcbb mbreorder: make work again, re-use Eynollah base class 2026-05-19 03:39:52 +02:00
Robert Sachunsky
7e8b9311d3 Revert "test_model_zoo: fix calls"
This reverts commit 5a98f55be3.
2026-05-19 03:32:37 +02:00
Robert Sachunsky
a1449da1d1 Revert "fix model loading in mb_ro and ocr"
This reverts commit 218a95e6a0.
2026-05-19 03:32:19 +02:00
kba
1df32eba87 CD: base docker image: typo {,v}3.13.0 2026-05-11 13:41:30 +02:00
kba
d7337a3080 CD: base docker image on versioned ocrd/core-cuda-tf2:v3.13.0 2026-05-11 13:38:36 +02:00
kba
e612db2bb1 📦 v0.8.0 2026-05-11 13:16:30 +02:00
kba
6cfbd93ac7 📝 changelog 2026-05-11 13:14:56 +02:00
kba
c7104c2852 Merge branch 'prepare-release-v0.8.0' 2026-05-11 13:12:19 +02:00
kba
5a98f55be3 test_model_zoo: fix calls 2026-05-11 12:22:24 +02:00
kba
218a95e6a0 fix model loading in mb_ro and ocr 2026-05-11 12:19:20 +02:00
kba
2035b07b55 Merge remote-tracking branch 'bertsky/ro-fixes-final' into prepare-release-v0.8.0
# Conflicts:
#	requirements-ocr.txt
2026-05-11 09:46:17 +02:00
Robert Sachunsky
db87aa995d reqs for OCR: relax ad5f2272 (depending on Python version) 2026-05-11 03:15:54 +02:00