Commit graph

1576 commits

Author SHA1 Message Date
Robert Sachunsky
08946067ac ModelZoo ONNX backend: handle multiple inputs, too 2026-06-12 14:54:51 +02:00
Robert Sachunsky
9d2412080f training.models for cnn-rnn-ocr: fix config names for height/width…
- rename `image_height` → `input_height`
- rename `image_width` → `input_width`
2026-06-12 14:52:23 +02:00
Robert Sachunsky
4181e03bc9 training convert --rebuild for cnn-rnn-ocr: override charset file…
when rebuilding the inference model for cnn-rnn-ocr,
- open the old `characters_org.txt` file for the charset
- use it to pass the actual `n_classes` (overriding the config)
- use its path to pass the `characters_txt_file` (overriding the config)
2026-06-12 14:48:47 +02:00
Robert Sachunsky
348ac95ad3 Eynollah_ocr: drop fixed input sizes…
- tr-ocr: no need to resize images in advance (done by model, anyway)
- cnn-rnn-ocr: get model size from model's input shape
2026-06-03 20:59:00 +02:00
Robert Sachunsky
24c7d4c277 update trocr smoke test, add cnnrnn ocr smoke test 2026-06-03 20:58:05 +02:00
Robert Sachunsky
27ca9733db ModelZoo ONNX backend for inference: support multi-input or -output 2026-06-03 20:57:02 +02:00
Robert Sachunsky
38fe4d33ad Predictor for multi-input models: present as list instead of tuple…
(because TF-Serving expects that and cannot cast)
2026-06-03 20:56:00 +02:00
Robert Sachunsky
4e7e1c06b9 trocr variant for Predictor runtime: no model size for input_shape…
Because the image preprocessor API differs between transformers v4 and v5,
and the model-internal image input sizes are irrelevant anyway
(the preprocessor resizes the images itself), and there is no
batch dimension (the input images have different shapes),
do not advertise this information in `.input_shape`.
2026-06-03 20:51:56 +02:00
Robert Sachunsky
f447a9f248 trocr: move preprocessor and decoder into model object, too…
- ModelZoo: drop `trocr_processor` model type
- `ModelZoo.load_models()`: use Predictor for `ocr_tr` models, too
- `ModelZoo.load_model()`: for `ocr_tr`, load processor and model,
  then define a function object as stand-in for the common model
  interface based on Keras (w/ `.predict_on_batch()`)
- Predictor: allow multi-input without actual batch dimension
  for `ocr_tr` models (because the model takes a list of original
  image arrays and resizes them to model shape internally)
- Eynollah_ocr: adapt (replacing preprocessing, prediction and
  decoding steps by a single `.predict()` call)
2026-06-03 03:41:44 +02:00
Robert Sachunsky
d2f2a1e06b Eynollah_ocr: correctly handle min_conf, improve writer…
- `min_conf_value_of_textline_text`: apply by skipping
  lines below threshold (instead of writing empty text),
  and delete their TextEquiv (if existing)
- `write_ocr()`: simplify, and ensure consistency between
  line-level and region-level text
2026-06-03 00:43:46 +02:00
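A minimal sketch of the threshold behaviour described in this commit, assuming a plain dict as a stand-in for the per-line TextEquiv entries. The function name `apply_min_conf` and the data layout are hypothetical and not taken from the repository.

```python
from typing import Dict, Iterable, Tuple


def apply_min_conf(
    existing: Dict[str, str],
    predictions: Iterable[Tuple[str, str, float]],
    min_conf: float,
) -> Dict[str, str]:
    """Return updated line texts, skipping low-confidence predictions.

    `existing` maps line IDs to previously stored text (hypothetical
    stand-in for TextEquiv); `predictions` yields (line_id, text, conf).
    """
    result = dict(existing)
    for line_id, text, conf in predictions:
        if conf < min_conf:
            # Skip the line entirely and delete stale text if present,
            # instead of writing an empty string.
            result.pop(line_id, None)
            continue
        result[line_id] = text
    return result


old = {"l1": "old text", "l2": "stale text"}
new = apply_min_conf(
    old,
    [("l1", "fresh text", 0.9), ("l2", "noise", 0.2), ("l3", "junk", 0.1)],
    min_conf=0.5,
)
```

Here `l2` loses its previous text and `l3` is never written, so downstream region-level text can only be built from lines that passed the threshold.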
Robert Sachunsky
8ffc4ed8d3 Eynollah_ocr: adapt to inference model, improve and simplify…
- drop `end_character` mechanics and `characters` model type
  for decoding output probability (not needed)
- drop `decode_batch_predictions()` and `num_to_char` model type
  (part of inference model)
- drop rough confidence estimation calculation
  (returned precisely by inference model)
- adapt model prediction to inference model: just omit zeros,
  map to bytes, filter OOV tokens and decode UTF-8 to str
- if no binarization input was provided, then compute it on the fly
  using `binarization` model
- also apply `min_conf_value_of_textline_text` (as for TrOCR)
- batching over entire page instead of region-wise
  (which underfilled batches)
- simplify and avoid copied redundant code
- rename `extracted_conf_value_merged` → `extracted_confs_merged`
- move `batched()` from `utils.utils_ocr` to `utils`
- drop `utils_ocr.distortion_free_resize()` (not needed)
- simplify `utils_ocr.break_curved_line_into_small_pieces_and_then_merge()`
- drop `utils_ocr.return_textline_contour_with_added_box_coordinate()`
  and `utils_ocr.return_rnn_cnn_ocr_of_given_textlines()` (not needed)
2026-06-02 21:20:06 +02:00
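The decoding step named above (omit zeros, map to bytes, filter OOV tokens, decode UTF-8) could look roughly like the following sketch. It assumes each output row is a zero-padded sequence of UTF-8 byte values and that out-of-vocabulary symbols appear as the literal `[UNK]` (the Keras `StringLookup` default); both the function name `decode_row` and that OOV spelling are assumptions.

```python
from typing import Iterable

# Assumption: OOV symbols are rendered as the StringLookup default token.
OOV_TOKEN = b"[UNK]"


def decode_row(row: Iterable[int], oov_token: bytes = OOV_TOKEN) -> str:
    """Turn one zero-padded row of byte values into a Python string."""
    # Omit zeros (padding) and map the remaining values to bytes.
    data = bytes(value for value in row if value != 0)
    # Filter OOV tokens before decoding.
    data = data.replace(oov_token, b"")
    # Decode UTF-8; replace undecodable sequences rather than raising.
    return data.decode("utf-8", errors="replace")


padded = list("Grüße".encode("utf-8")) + [0, 0, 0]
text = decode_row(padded)
```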
Robert Sachunsky
a391ee24e6 Predictor: handle multi-input and/or multi-output cases 2026-06-02 21:18:22 +02:00
Robert Sachunsky
c79b73dcc8 cnn-rnn-ocr: move CTC decoder and string decoder to inference model…
- ModelZoo: drop `num_to_char` and `characters` model types,
  also drop `_load_characters()` and `_load_num_to_char()` loaders
- `ModelZoo.load_models()`: use Predictor for `ocr` models, too
- `ModelZoo.load_model()`: delegate runtime/inference conversion of
  OCR models to `eynollah.training.models.cnn_rnn_ocr_model4inference`
- `training.models`: add (purely functional) Keras layer `CTCDecoder`
  for inference on top of softmax output, but using TF backend
  function instead of (broken) `Keras.backend.ctc_decode()`, while
  switching to beam search (instead of greedy) and also returning
  decoded path probability
- `training.models.cnn_rnn_ocr_model()` w/ `inference=True`:
  * add kwarg `characters_txt_file` for file path of character set
  * configure secondary tensor path on OCR graph for binarized input
    (additional input `image_bin`, averaging softmax outputs)
  * use new `CTCDecoder` layer and inverse `StringLookup` layer to
    decode from softmax output to tf.string; so inference models
    now have 2 inputs (RGB, binarized) and 2 outputs (text, prob)
  * since `np.dtype=object` cannot be handled by SharedMemory (as
    needed by Predictor queues), also replace tf.string by tf.uint8
    arrays
  * use this for `training convert` for OCR models w/ `--rebuild`
- `training.models.cnn_rnn_ocr_model4inference`:
  * new function which does the same but loads an existing OCR model
    in training configuration (i.e. without prior `inference=True`)
  * use this for `training convert` for OCR models w/o `--rebuild`
2026-06-02 20:26:42 +02:00
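To illustrate why fixed-width uint8 arrays are easier to pass through shared memory than object-typed string arrays, here is a small sketch of the encoding direction in plain Python. The helper `encode_rows` and the choice of zero as padding value are assumptions for illustration; the actual model does this inside the TF graph.

```python
from typing import List


def encode_rows(texts: List[str], width: int) -> List[List[int]]:
    """Encode strings as zero-padded rows of UTF-8 byte values.

    Every row has exactly `width` entries, so the result has a fixed
    shape and a plain integer dtype (unlike an array of Python objects).
    """
    rows = []
    for text in texts:
        data = text.encode("utf-8")
        if len(data) > width:
            # Refuse to truncate, which could split a multi-byte character.
            raise ValueError(f"text needs {len(data)} bytes, width is {width}")
        rows.append(list(data) + [0] * (width - len(data)))
    return rows


rows = encode_rows(["ab", "ü"], width=4)
```

Because zero never occurs inside valid UTF-8 text other than as NUL, the receiver can strip the padding without a separate length field.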
Robert Sachunsky
13f2f81c45 ModelZoo: support inference with ONNX/TensorRT…
- comment out ad-hoc conversion/loading of autosized models
- refactor predictor backends for model types into separate functions
- only attempt inference conversion of cnn-rnn-ocr model
  if applicable (`ctc_loss` layer still present)
- apply VRAM limits across model types
  (Keras, TF-Serving, ONNX)
- apply TF device selection across model types
  (Keras, TF-Serving)
- implement predictor backend for ONNX models:
  - using onnxruntime
  - covering CUDA and TensorRT providers
  - trying to support manual device selection
  - hiding session management details
  - converting float32 to float16
2026-05-28 18:08:08 +02:00
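As a rough sketch of how an onnxruntime provider list covering TensorRT and CUDA with manual device selection might be assembled: the helper name `onnx_providers` and its parameters are hypothetical, while the provider names and the `device_id` and `gpu_mem_limit` options are those documented by onnxruntime. The sketch only builds the list and does not import onnxruntime itself.

```python
from typing import Any, Dict, List, Optional, Tuple, Union

Provider = Union[str, Tuple[str, Dict[str, Any]]]


def onnx_providers(
    device_id: int = 0,
    use_tensorrt: bool = True,
    gpu_mem_limit: Optional[int] = None,
) -> List[Provider]:
    """Build a provider list in order of decreasing precedence."""
    cuda_options: Dict[str, Any] = {"device_id": device_id}
    if gpu_mem_limit is not None:
        # VRAM limit in bytes for the CUDA provider's memory arena.
        cuda_options["gpu_mem_limit"] = gpu_mem_limit
    providers: List[Provider] = []
    if use_tensorrt:
        providers.append(("TensorrtExecutionProvider", {"device_id": device_id}))
    providers.append(("CUDAExecutionProvider", cuda_options))
    # CPU as last resort if no GPU provider is available.
    providers.append("CPUExecutionProvider")
    return providers


providers = onnx_providers(device_id=1, gpu_mem_limit=2 * 1024**3)
```

Such a list would typically be passed as the `providers` argument of `onnxruntime.InferenceSession`, which falls back along the list when a provider cannot be initialised.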
Robert Sachunsky
f833a516e7 training: add CLI command convert
- move `train_cli` from cli.py to train.py,
  add docstring
- add `convert_cli`:
  - load any (supported) model format
    (i.e. not exported TF-Serving or ONNX)
  - if SavedModel format with `config.json` present,
    and `--rebuild` is requested, create new model
    from `models.get_model()` for this configuration,
    and load weights
  - if model type is `cnn-rnn-ocr` and configuration
    is still for training (`ctc_loss`), then extract
    inference model
  - apply requested `--format` conversion:
    HDF5, Keras native, Keras SavedModel, TF-Serving SavedModel
    or ONNX
  - if output format is directory (i.e. SavedModel),
    then copy over `config.json`, too
- reload-models-v0.8.mk:
  - adapt recipe for converter CLI (i.e. `--format tf-serving`
    w/ `--rebuild` if possible)
  - add targets for other useful data formats
  - extend list of model names to all current models
    (as all benefit from TF-Serving export)
  - cancel ONNX conversion for vision transformer models
    (as these do not work, yet)
2026-05-28 17:48:21 +02:00
Robert Sachunsky
62b55a3809 train params: drop reload_weights, re-use dir_of_start_model
- drop ad-hoc configuration parameter `reload_weights`
  (used for conversion/export of models for inference,
   to be replaced by extra CLI)
- reinterpret `dir_of_start_model` to also load weights
  if not `continue_training`
2026-05-28 17:42:55 +02:00
Robert Sachunsky
093030f503 train/models: move all model builders to models.get_model()
- models: add new `get_model()`, passing in Sacred config
  to capture builder function arguments
- train: fewer imports
- train: no need to pass `custom_objects` if loading with
  `compile=False` (and we custom-compile later, anyway)
2026-05-28 17:37:45 +02:00
Robert Sachunsky
faef1967f8 models.cnn_rnn_ocr_model: add inference option, drop model name 2026-05-28 17:33:57 +02:00
Robert Sachunsky
c4a7eec5b3 models: cosmetics
- using `Reshape`, do not pass `target_shape` as kwarg
- add a default `name` for `Patches` and `PatchEncoder`
2026-05-27 01:58:21 +02:00
Robert Sachunsky
9801129aa6 estimate_skew_contours: ensure retval is always float 2026-05-22 12:37:07 +02:00
Robert Sachunsky
26afc5ddab ModelZoo: ensure exported TensorShape is converted to plain tuple 2026-05-22 12:35:44 +02:00
Robert Sachunsky
0836230c6b utils_ocr: avoid module-level import of TF 2026-05-21 22:50:53 +02:00
Robert Sachunsky
f3a93983c0 ModelZoo: add ocr key for memory_limit 2026-05-21 22:50:13 +02:00
Robert Sachunsky
ea41dcae1d trocr: use beam search instead of greedy decoding 2026-05-21 17:52:27 +02:00
Robert Sachunsky
074753a98e ModelZoo: fix Torch device selection 2026-05-21 17:25:53 +02:00
Robert Sachunsky
000e4ac8d8 trocr: extract confidence, too 2026-05-21 17:25:39 +02:00
Robert Sachunsky
f3649adbf2 trocr: apply do_not_mask_with_textline_contour here, too 2026-05-21 17:23:11 +02:00
Robert Sachunsky
1d67e65f11 trocr: simplify, batch over entire page…
- batching over entire page instead of region-wise
  (underfilling batches)
- avoid copied redundant code
2026-05-21 15:48:21 +02:00
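The effect of batching over the whole page rather than per region can be sketched as follows. The `batched()` helper here is a generic stand-in written for this example and is only assumed to behave like the repository's function of the same name; the region and line names are made up.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical page with three regions of 3, 1 and 2 text lines.
regions = [["r1l1", "r1l2", "r1l3"], ["r2l1"], ["r3l1", "r3l2"]]

# Region-wise batching: every region starts a new, often underfilled batch.
per_region = [batch for region in regions for batch in batched(region, 4)]

# Page-wise batching: flatten all lines first, then batch.
per_page = list(batched((line for region in regions for line in region), 4))
```

With a batch size of 4, the region-wise variant runs three underfilled batches, whereas the page-wise variant needs only two, the first of which is full.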
Robert Sachunsky
d50bd7c650 trocr: avoid warnings by passing clean_up_tokenization_spaces=False 2026-05-21 14:20:51 +02:00
Robert Sachunsky
f9f9130dbb do_order_of_regions: remove redundant+overcautious assertion 2026-05-21 03:21:36 +02:00
Robert Sachunsky
bf7ec0233d ModelZoo.load_model: use memory_limit instead of memory_growth
- growth strategy is more flexible, but uses much more VRAM
- limit strategy needs to be calibrated to models (currently fixed),
  and batch size, but needs much less VRAM and is faster
2026-05-21 02:43:34 +02:00
Robert Sachunsky
94a5e9da14 ModelZoo.load_model: avoid attempting to load exported models as Keras
models (which causes a warning), but switch to TF-Serving import right away
2026-05-21 02:41:19 +02:00
Robert Sachunsky
7f2bf715df ModelZoo.load_model: fix loading exported vs saved models 2026-05-21 02:39:59 +02:00
Robert Sachunsky
3de1407d18 drop unnecessary TF / Torch imports 2026-05-21 02:38:20 +02:00
Robert Sachunsky
bdfebd2c70 reload_weights: save()export() w/ serve() inference 2026-05-19 03:40:18 +02:00
Robert Sachunsky
86adaf299a training.models.transformer_block: tf.reshape → Keras Reshape layer 2026-05-19 03:40:16 +02:00
Robert Sachunsky
9efce5e9f2 Predictor.shutdown: use join() instead of terminate() 2026-05-19 03:40:07 +02:00
Robert Sachunsky
ffe5cdc519 ModelZoo.shutdown: drop extra del (already done by shutdown()) 2026-05-19 03:40:05 +02:00
Robert Sachunsky
481c286da9 ModelZoo.load_model: no XLA compilation 2026-05-19 03:40:05 +02:00
Robert Sachunsky
f329e10a80 test_layout: rm ignored --allow_scaling option 2026-05-19 03:40:04 +02:00
Robert Sachunsky
17b311441a model_zoo: also parse comma/colon syntax for device in Torch case 2026-05-19 03:40:03 +02:00
Robert Sachunsky
be4fe8c263 contour: drop unused functions depending on rotation_image_new() 2026-05-19 03:40:02 +02:00
Robert Sachunsky
87cce6c963 CLI tests: add opt-in envvar EYNOLLAH_OPTIONS for device selection,
model directory etc.
2026-05-19 03:40:01 +02:00
Robert Sachunsky
1ed633bc25 test_model_zoo: adapt (load_models instead of load_model) 2026-05-19 03:40:00 +02:00
Robert Sachunsky
21ecb043f7 CLIs: move --device option to group level 2026-05-19 03:39:59 +02:00
Robert Sachunsky
7ed1a1ebac CLIs: allow -h and show defaults uniformly, harmonise help, drop
remaining redundant negative options
2026-05-19 03:39:56 +02:00
Robert Sachunsky
cd62f13872 eynollah_ocr: make work again, re-use Eynollah base class…
- re-use Eynollah base class
- use `ModelZoo.load_models()` instead of `load_model()`
- pass in `device` init kwarg, delegate to `ModelZoo.load_models()`
- `device`: return Torch device at loaded model tensors
  instead of ad-hoc selection
- make numeric init kwargs non-optional (only numeric)
2026-05-19 03:39:55 +02:00
Robert Sachunsky
ded668a256 model_zoo: fix clash between Predictor and direct (OCR) use-cases…
- `load_models()`: uniformly handle arg types
- `load_model()`: move handling of non-model categories
  to `load_models()`
- `load_model()`: move SavedModel preference over HDF5 to `model_path()`
- `_load_ocr_model()`: add user-selected device handling and reporting
  for Torch (as for TF)
- `_load_ocr_model()`: move (TF-based) CNN-RNN case to `load_model()`
  (including Keras layer mapping)
- `shutdown()`: only apply `shutdown()` to Predictor model types
2026-05-19 03:39:53 +02:00
Robert Sachunsky
98e6fbbcbb mbreorder: make work again, re-use Eynollah base class 2026-05-19 03:39:52 +02:00
Robert Sachunsky
7e8b9311d3 Revert "test_model_zoo: fix calls"
This reverts commit 5a98f55be3.
2026-05-19 03:32:37 +02:00