Do not advertise the image input shape in `.input_shape`, because
the transformers v4 and v5 APIs for the image preprocessor differ,
the model-internal image input sizes are irrelevant anyway (the
preprocessor resizes all images itself), and there is no batch
dimension (the input images have different shapes).
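Because `ocr_tr` models take a plain list of differently shaped images instead of one stacked array, a function object can stand in for the Keras-style interface. The sketch below is a minimal, hypothetical adapter (`PredictOnBatchAdapter` is not the actual class). The two callables are injected placeholders: with transformers they would presumably wrap `processor(images=..., return_tensors="pt").pixel_values`, `model.generate(...)` and `processor.batch_decode(..., skip_special_tokens=True)`, but that wiring is an assumption.

```python
from typing import Any, Callable, List, Sequence


class PredictOnBatchAdapter:
    """Give a processor/model pair a Keras-like `.predict_on_batch()`.

    `preprocess` and `generate_and_decode` are injected placeholders for
    the transformers processor and model, so the sketch runs standalone.
    """

    def __init__(self, preprocess: Callable[[List[Any]], Any],
                 generate_and_decode: Callable[[Any], List[str]]) -> None:
        self.preprocess = preprocess
        self.generate_and_decode = generate_and_decode
        # deliberately no `.input_shape`: the processor resizes each image itself

    def predict_on_batch(self, images: Sequence[Any]) -> List[str]:
        # a plain list of differently shaped images, never stacked into one array
        inputs = self.preprocess(list(images))
        return self.generate_and_decode(inputs)
```

Callers keep using `.predict_on_batch()` as with Keras models, while the missing batch dimension and the internal resizing stay hidden behind the adapter.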
- ModelZoo: drop `trocr_processor` model type
- `ModelZoo.load_models()`: use Predictor for `ocr_tr` models, too
- `ModelZoo.load_model()`: for `ocr_tr`, load processor and model,
then define a function object as stand-in for the common model
interface based on Keras (w/ `.predict_on_batch()`)
- Predictor: allow multi-input without actual batch dimension
for `ocr_tr` models (because the model takes a list of original
image arrays and resizes them to model shape internally)
- Eynollah_ocr: adapt (replacing the preprocessing, prediction and
decoding steps with a single `.predict()` call)
- `min_conf_value_of_textline_text`: apply by skipping
lines below threshold (instead of writing empty text),
and delete their TextEquiv (if existing)
- `write_ocr()`: simplify, and correctly ensure consistency between
line-level and region-level text
- drop `end_character` mechanics and `characters` model type
for decoding output probability (not needed)
- drop `decode_batch_predictions()` and `num_to_char` model type
(part of inference model)
- drop rough confidence estimation calculation
(now returned precisely by the inference model)
- adapt model prediction to inference model: just omit zeros,
map to bytes, filter OOV tokens and decode UTF-8 to str
- if no binarization input was provided, then compute it on the fly
using `binarization` model
- also apply `min_conf_value_of_textline_text` (as for TrOCR)
- batching over entire page instead of region-wise
(which underfilled batches)
- simplify and remove duplicated (copy-pasted) code
- rename `extracted_conf_value_merged` → `extracted_confs_merged`
- move `batched()` from `utils.utils_ocr` to `utils`
- drop `utils_ocr.distortion_free_resize()` (not needed)
- simplify `utils_ocr.break_curved_line_into_small_pieces_and_then_merge()`
- drop `utils_ocr.return_textline_contour_with_added_box_coordinate()`
and `utils_ocr.return_rnn_cnn_ocr_of_given_textlines()` (not needed)
- ModelZoo: drop `num_to_char` and `characters` model types,
also drop `_load_characters()` and `_load_num_to_char()` loaders
- `ModelZoo.load_models()`: use Predictor for `ocr` models, too
- `ModelZoo.load_model()`: delegate runtime/inference conversion of
OCR models to `eynollah.training.models.cnn_rnn_ocr_model4inference`
- `training.models`: add (purely functional) Keras layer `CTCDecoder`
for inference on top of the softmax output, using a TF backend
function instead of the (broken) `keras.backend.ctc_decode()`,
switching to beam search (instead of greedy decoding) and also
returning the decoded path probability
- `training.models.cnn_rnn_ocr_model()` w/ `inference=True`:
* add kwarg `characters_txt_file` for file path of character set
* configure secondary tensor path on OCR graph for binarized input
(additional input `image_bin`, averaging softmax outputs)
* use new `CTCDecoder` layer and inverse `StringLookup` layer to
decode from softmax output to tf.string; so inference models
now have 2 inputs (RGB, binarized) and 2 outputs (text, prob)
* since arrays with `dtype=object` cannot be handled by SharedMemory
(as needed by Predictor queues), also replace tf.string with tf.uint8
arrays
* use this for `training convert` for OCR models w/ `--rebuild`
- `training.models.cnn_rnn_ocr_model4inference`:
* new function which does the same but loads an existing OCR model
in training configuration (i.e. without prior `inference=True`)
* use this for `training convert` for OCR models w/o `--rebuild`
- comment out ad-hoc conversion/loading of autosized models
- refactor predictor backends for model types into separate functions
- only attempt inference conversion of cnn-rnn-ocr model
if applicable (`ctc_loss` layer still present)
- apply VRAM limits across model types
(Keras, TF-Serving, ONNX)
- apply TF device selection across model types
(Keras, TF-Serving)
- implement predictor backend for ONNX models:
- using onnxruntime
- covering CUDA and TensorRT providers
- trying to support manual device selection
- hiding session management details
- converting float32 to float16
- move `train_cli` from cli.py to train.py,
add docstring
- add `convert_cli`:
- load any (supported) model format
(i.e. not exported TF-Serving or ONNX)
- if SavedModel format with `config.json` present,
and `--rebuild` is requested, create new model
from `models.get_model()` for this configuration,
and load weights
- if model type is `cnn-rnn-ocr` and configuration
is still for training (`ctc_loss`), then extract
inference model
- apply requested `--format` conversion:
HDF5, Keras native, Keras SavedModel, TF-Serving SavedModel
or ONNX
- if output format is directory (i.e. SavedModel),
then copy over `config.json`, too
- reload-models-v0.8.mk:
- adapt recipe for converter CLI (i.e. `--format tf-serving`
w/ `--rebuild` if possible)
- add targets for other useful data formats
- extend list of model names to all current models
(as all benefit from TF-Serving export)
- skip ONNX conversion for vision transformer models
(as these do not work yet)
- drop ad-hoc configuration parameter `reload_weights`
(used for conversion/export of models for inference,
to be replaced by extra CLI)
- re-interpret `dir_of_start_model` to also load weights
if not `continue_training`
- models: add new `get_model()`, passing in Sacred config
to capture builder function arguments
- train: fewer imports
- train: no need to pass `custom_objects` if loading with
`compile=False` (and we custom-compile later, anyway)
- growth strategy is more flexible, but uses much more VRAM
- limit strategy needs to be calibrated to models (currently fixed),
and batch size, but needs much less VRAM and is faster
- re-use Eynollah base class
- use `ModelZoo.load_models()` instead of `load_model()`
- pass in `device` init kwarg, delegate to `ModelZoo.load_models()`
- `device`: return the Torch device of the loaded model tensors
instead of ad-hoc selection
- make numeric init kwargs non-optional (only numeric)
- `load_models()`: uniformly handle arg types
- `load_model()`: move handling of non-model categories
to `load_models()`
- `load_model()`: move SavedModel preference over HDF5 to `model_path()`
- `_load_ocr_model()`: add user-selected device handling and reporting
for Torch (as for TF)
- `_load_ocr_model()`: move (TF-based) CNN-RNN case to `load_model()`
(including Keras layer mapping)
- `shutdown()`: only apply `shutdown()` to Predictor model types
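The `min_conf_value_of_textline_text` and `write_ocr()` items describe skipping low-confidence lines (deleting any existing TextEquiv) and keeping line and region text consistent. A minimal sketch, assuming a simplified in-memory model: `TextLine`, `TextRegion` and `apply_line_results` are hypothetical stand-ins for the PAGE-XML objects, and joining line texts with `"\n"` for the region text is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TextLine:
    text: Optional[str] = None  # None means: no TextEquiv


@dataclass
class TextRegion:
    lines: List[TextLine] = field(default_factory=list)
    text: Optional[str] = None


def apply_line_results(region: TextRegion,
                       results: List[Tuple[str, float]],
                       min_conf: float) -> None:
    """Write one (text, confidence) result per line, skipping low-confidence lines."""
    for line, (text, conf) in zip(region.lines, results):
        if conf < min_conf:
            line.text = None  # skip the line and delete any existing TextEquiv
        else:
            line.text = text
    # derive the region text from the kept lines, so both levels agree
    kept = [line.text for line in region.lines if line.text is not None]
    region.text = "\n".join(kept) if kept else None
```

Deriving the region text from the lines after filtering means a skipped line can never leak into the region-level text.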
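Several items combine here: the inference model emits tf.uint8 arrays instead of tf.string (because object arrays cannot go through SharedMemory), and the caller then omits zeros, maps to bytes, filters OOV tokens and decodes UTF-8. A stdlib-only sketch of that round trip, assuming fixed-width zero-padded rows and `[UNK]` as the OOV token (the default of Keras' `StringLookup`, but an assumption for this model); all function names are hypothetical.

```python
from multiprocessing import shared_memory
from typing import List

OOV_TOKEN = b"[UNK]"  # assumption: default OOV token of an inverse StringLookup


def encode_rows(texts: List[str], width: int) -> bytes:
    """Pack strings into zero-padded, fixed-width UTF-8 rows (like a uint8 array)."""
    buf = bytearray(len(texts) * width)
    for i, text in enumerate(texts):
        data = text.encode("utf-8")[:width]
        buf[i * width:i * width + len(data)] = data
    return bytes(buf)


def decode_row(row: bytes) -> str:
    """Omit zeros, filter OOV tokens, then decode UTF-8 to str."""
    data = bytes(b for b in row if b != 0)
    data = data.replace(OOV_TOKEN, b"")
    return data.decode("utf-8", errors="replace")


def roundtrip_via_shared_memory(texts: List[str], width: int = 64) -> List[str]:
    """Send fixed-width byte rows through shared memory and decode them again."""
    if not texts:
        return []
    payload = encode_rows(texts, width)
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        shm.buf[:len(payload)] = payload
        raw = bytes(shm.buf[:len(payload)])  # copy out before closing
    finally:
        shm.close()
        shm.unlink()
    return [decode_row(raw[i * width:(i + 1) * width]) for i in range(len(texts))]
```

Plain bytes have a fixed size per row, so they fit into a shared buffer; Python `str` objects inside an object array are only pointers and cannot be shared that way.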
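The moved `batched()` helper and the switch to page-level batching can be illustrated together. This is a sketch of a generic `batched()` (behaving like `itertools.batched()` in Python 3.12+), not necessarily the code in `utils`; the region/line data is made up.

```python
from itertools import chain, islice
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def batched(iterable: Iterable[T], n: int) -> Iterator[Tuple[T, ...]]:
    """Yield tuples of n items; the last tuple may be shorter."""
    if n < 1:
        raise ValueError("n must be at least one")
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch


# made-up example: text lines per region on one page
regions = [["l1", "l2"], ["l3", "l4"], ["l5", "l6"]]
# region-wise: every region starts a new, half-full batch
region_wise = [b for lines in regions for b in batched(lines, 4)]
# page-wise: all lines of the page share batches
page_wise = list(batched(chain.from_iterable(regions), 4))
```

With a batch size of 4, region-wise batching needs three half-full batches here, while page-wise batching needs only two (4 + 2 lines).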
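The `CTCDecoder` item switches from greedy decoding to beam search and returns the decoded path probability. The layer itself uses a TF backend function (presumably `tf.nn.ctc_beam_search_decoder`, which returns decoded sequences plus log probabilities). To show why beam search can differ from greedy decoding, here is a pure-Python sketch of CTC prefix beam search; it is an illustration of the technique, not the layer's code, and the blank index is a parameter because conventions differ.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def ctc_prefix_beam_search(
    probs: Sequence[Sequence[float]], beam_width: int = 4, blank: int = 0
) -> Tuple[Tuple[int, ...], float]:
    """Return the most probable label sequence and its total probability.

    probs[t][c] is the softmax probability of class c at time step t.
    """
    # each prefix keeps two scores: paths ending in blank / in a label
    beams: Dict[Tuple[int, ...], List[float]] = {(): [1.0, 0.0]}
    for step in probs:
        nxt: Dict[Tuple[int, ...], List[float]] = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_label) in beams.items():
            for c, p in enumerate(step):
                if c == blank:
                    # a blank never extends the prefix
                    nxt[prefix][0] += (p_blank + p_label) * p
                elif prefix and prefix[-1] == c:
                    # repeated label: only a blank in between starts a new symbol
                    nxt[prefix + (c,)][1] += p_blank * p
                    nxt[prefix][1] += p_label * p
                else:
                    nxt[prefix + (c,)][1] += (p_blank + p_label) * p
        ranked = sorted(nxt.items(), key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beams = dict(ranked[:beam_width])
    best, (p_blank, p_label) = max(beams.items(), key=lambda kv: kv[1][0] + kv[1][1])
    return best, p_blank + p_label
```

For `probs = [[0.6, 0.4], [0.6, 0.4]]` (blank = 0), the single best path is blank-blank (empty text, 0.36), but the three paths collapsing to label `1` together sum to 0.64, so beam search returns `(1,)` with that probability while greedy decoding would return the empty string.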
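For the ONNX predictor backend (CUDA and TensorRT providers, manual device selection), onnxruntime accepts a `providers` list whose entries are either names or `(name, options)` tuples, and both the CUDA and TensorRT providers understand a `device_id` option. A minimal sketch of building such a list; `onnx_providers` is a hypothetical helper, and passing the result as `onnxruntime.InferenceSession(path, providers=...)` is only indicated in prose, so the sketch runs without onnxruntime.

```python
from typing import Any, Dict, List, Optional, Tuple, Union

Provider = Union[str, Tuple[str, Dict[str, Any]]]


def onnx_providers(device_id: Optional[int] = None,
                   use_tensorrt: bool = True) -> List[Provider]:
    """Build an onnxruntime provider list in priority order, CPU last as fallback."""
    providers: List[Provider] = []
    if device_id is not None:
        options = {"device_id": device_id}
        if use_tensorrt:
            providers.append(("TensorrtExecutionProvider", dict(options)))
        providers.append(("CUDAExecutionProvider", dict(options)))
    providers.append("CPUExecutionProvider")
    return providers
```

onnxruntime tries the providers in list order, so keeping the CPU provider last gives a working fallback when the GPU providers are unavailable.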
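The new `models.get_model()` receives the Sacred config to capture builder function arguments. One way to do that (an assumption, similar in spirit to Sacred's own `@ex.capture` injection) is to pass only those config entries that appear in the builder's signature. `call_with_config` and `toy_ocr_builder` are hypothetical names.

```python
import inspect
from typing import Any, Callable, Dict


def call_with_config(builder: Callable[..., Any], config: Dict[str, Any]) -> Any:
    """Call `builder` with exactly those config entries it accepts by name."""
    params = inspect.signature(builder).parameters.values()
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return builder(**config)  # builder accepts arbitrary keywords
    named = {p.name for p in params
             if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                           inspect.Parameter.KEYWORD_ONLY)}
    return builder(**{k: v for k, v in config.items() if k in named})


# hypothetical builder, standing in for one of the model builder functions
def toy_ocr_builder(input_height, input_width, n_classes=2):
    return {"shape": (input_height, input_width), "classes": n_classes}
```

Training-only keys such as an epoch count are simply ignored, and omitted keys fall back to the builder's defaults.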
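The `load_model()` item moves the SavedModel-over-HDF5 preference into `model_path()`. A minimal sketch of such a preference, assuming a layout where a model `name` is stored either as a SavedModel directory `name/` (recognized by its `saved_model.pb`) or as `name.h5`; `prefer_saved_model` and this layout are hypothetical.

```python
from pathlib import Path
from typing import Optional


def prefer_saved_model(base: Path, name: str) -> Optional[Path]:
    """Prefer a SavedModel directory over an HDF5 file of the same name."""
    saved_model = base / name
    if (saved_model / "saved_model.pb").is_file():
        return saved_model
    hdf5 = base / f"{name}.h5"
    if hdf5.is_file():
        return hdf5
    return None  # neither format found
```

Resolving the path in one place keeps the format preference out of the individual loaders.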