Commit graph

1422 commits

Author SHA1 Message Date
Robert Sachunsky
0d21b62aee disable autosized prediction entirely (also for _patched)…
When 338c4a0e wrapped all prediction models for automatic
image size adaptation in CUDA,
- tiling (`_patched`) was indeed faster
- whole  (`_resized`) was actually slower

But CUDA-based tiling also increases GPU memory requirements
a lot. And with the new parallel subprocess predictors,
Numpy-based tiling is not necessarily slower anymore.
2026-04-10 18:23:10 +02:00
Robert Sachunsky
ccef63f08b get_regions: always use resized/enhanced image…
(avoid a strange image-handling shortcut which, in 1/2-column cases,
 used the early cropped image intended for column classification
 instead of the normal image;
 fixes accuracy issues of the region_1_2 model on these images)
2026-04-10 18:17:51 +02:00
Robert Sachunsky
04da66ed73 training: plot only ~ 1000 training and ~ 100 validation images 2026-03-30 13:34:05 +02:00
Robert Sachunsky
a8556f5210 run: sort parallel log messages by file name instead of prefixing…
(as a follow-up to ec08004f:)

- create log queues and QueueListener separately for each job
- receive job logs sequentially
- drop log filter mechanism (prefixing log messages by file name)
- also count ratio of successful jobs
2026-03-30 13:18:40 +02:00
Robert Sachunsky
1756443605 fixup device sel 2026-03-16 15:35:07 +01:00
Robert Sachunsky
6bbdcc39ef CLI/Eynollah.setup_models/ModelZoo.load_models: add device option/kwarg
allow setting a device specifier to load models onto,

either
- CPU, or a single GPU (GPU0, GPU1, etc.)
- per-model patterns, e.g. col*:CPU,page:GPU0,*:GPU1

pass through as kwargs until `ModelZoo.load_models()` sets up TF
2026-03-15 04:54:04 +01:00
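The per-model pattern syntax above could be resolved with shell-style globbing; a minimal sketch (the helper name `resolve_device` and the first-match semantics are assumptions for illustration, not the actual implementation):

```python
from fnmatch import fnmatch

def resolve_device(model_name: str, spec: str) -> str:
    """Resolve a device for a model from a spec string.

    The spec is either a single device ("CPU", "GPU0", ...) applied to
    all models, or comma-separated glob patterns such as
    "col*:CPU,page:GPU0,*:GPU1", matched first to last.
    (Hypothetical helper, not the actual eynollah code.)
    """
    if ":" not in spec:
        return spec  # single device for every model
    for entry in spec.split(","):
        pattern, _, device = entry.partition(":")
        if fnmatch(model_name, pattern):
            return device
    raise ValueError(f"no pattern in {spec!r} matches {model_name!r}")
```

A trailing `*:...` entry then acts as the fallback device for all models not matched earlier.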
Robert Sachunsky
67e9f84b54 do_prediction* for "col_classifier": pass array as float16 instead of float64 2026-03-15 03:20:39 +01:00
Robert Sachunsky
f54deff452 model_zoo/predictor: use one subprocess per model…
- Eynollah: instead of one `Predictor` instance as stand-in for
  entire `ModelZoo`, keep the latter but have each model in `_loaded`
  dict become an independent predictor instance
- `ModelZoo.load_models()`: instantiate `Predictor`s for each
  `model_category` and then call `Predictor.load_model()` on them
- `Predictor.load_model()`: set args/kwargs for `ModelZoo.load_model()`,
  then spawn subprocess via `.start()`, which first enters `setup()`...
- `Predictor.setup()`: call `ModelZoo.load_model()` instead of (plural)
  `.load_models()`; save to `self.model` instead of `self.model_zoo`
- `ModelZoo.load_model()`: move _all_ CUDA configuration and
  TF/Keras-specific module initialization here (to be used only by
  predictor subprocess)
- `Predictor`: drop stand-in `SingleModelPredictor` retrieved by `get()`;
  directly provide `predict()` and `output_shape` via `self.call()`
- `Predictor`: drop `model` arg from all queues - now implicit; use
  `self.name` for model name in messages
- `Predictor`: no need for requeuing other tasks (only same model now)
- `Predictor`: reduce rebatching batch sizes due to increased VRAM footprint

- `Eynollah.setup_models()`: set up loading `_patched` / `_resized`
  here instead of during `ModelZoo.load_model()`
- `ModelZoo.load_models()`: for resized/patched models, call
  `Predictor.load_model()` with kwarg instead of resp. model name suffix
- `ModelZoo.load_model()`: expect boolean kwargs `patched/resized`
  for `wrap_layout_model_patched/resized` model wrappers, respectively
2026-03-15 02:53:37 +01:00
Robert Sachunsky
c514bbc661 make switching between autosized and looped tiling easier 2026-03-14 02:16:26 +01:00
Robert Sachunsky
2f3b622cf5 predictor: rebatch tasks to increase CUDA throughput…
- depending on model type (i.e. size), configure target
  batch sizes
- after receiving a prediction task for some model,
  look up target batch size, then try to retrieve arrays
  from follow-up tasks for the same model on the task queue;
  stop when either no tasks are immediately available or
  when the combined batch size (input batch size * number of tasks)
  reaches the target
- push back tasks for other models to the queue
- rebatch: read all shared arrays, concatenate them along axis 0,
  map respective job ids they came from
- predict on new (possibly larger) batch
- split result along axis 0 into number of jobs
- send each result along with its jobid to task queue
2026-03-14 00:52:34 +01:00
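The rebatching steps above can be sketched in NumPy; `rebatch` / `split_results` and their signatures are hypothetical, and the queue handling (including pushing back tasks for other models) is elided:

```python
import numpy as np

def rebatch(first_task, more_tasks, target_batch):
    """Greedily combine follow-up tasks for the same model into one batch.

    first_task: (job_id, array); more_tasks: iterator over further
    (job_id, array) tasks that are immediately available on the queue.
    Stops when the iterator is exhausted or the combined batch size
    reaches the target. (Illustrative sketch only.)
    """
    jobs = [first_task]
    total = first_task[1].shape[0]
    while total < target_batch:
        task = next(more_tasks, None)
        if task is None:  # no further task immediately available
            break
        jobs.append(task)
        total += task[1].shape[0]
    batch = np.concatenate([arr for _, arr in jobs], axis=0)
    return batch, [jid for jid, _ in jobs], [arr.shape[0] for _, arr in jobs]

def split_results(result, job_ids, sizes):
    """Split the combined prediction back into per-job results."""
    parts = np.split(result, np.cumsum(sizes)[:-1], axis=0)
    return list(zip(job_ids, parts))
```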
Robert Sachunsky
b550725cc5 wrap_layout_model_patched: simplify shape calculation 2026-03-14 00:51:22 +01:00
Robert Sachunsky
d6404dbbc2 do_prediction*: pass arrays as float16 instead of float64 to TF 2026-03-14 00:49:26 +01:00
Robert Sachunsky
135064a48e model_zoo: region model not used at runtime anymore - don't load 2026-03-14 00:48:52 +01:00
Robert Sachunsky
ec08004fb0 run: add QueueListener to pool / QueueHandler to workers…
- set up a Queue and QueueListener along with ProcessPoolExecutor,
  delegating messages from the queue to all handlers
- in forked subprocesses, instead of just inheriting handlers,
  replace them with a single QueueHandler, and make sure
  log messages get prefixed by the respective job id (img_filename)
  so concurrent messages will still be readable
- in the predictor, make sure to pass on the log level to the
  spawned subprocess, too
2026-03-14 00:43:58 +01:00
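The queue-based logging described here uses the stdlib `QueueHandler` / `QueueListener` pair. A single-process sketch of the wiring (in the real setup the queue would be a `multiprocessing.Queue` shared with the pool workers, and the job-id prefix would come from a filter or the message itself):

```python
import logging
import logging.handlers
import queue

# Pool side: one queue plus a QueueListener that delegates records
# to the real handlers (here, a list-collecting handler for demo).
log_queue = queue.Queue()
records = []

class ListHandler(logging.Handler):
    def emit(self, record):
        records.append(self.format(record))

listener = logging.handlers.QueueListener(log_queue, ListHandler())
listener.start()

# Worker side: instead of inheriting handlers, install a single
# QueueHandler; prefix messages with the job id so interleaved
# logs from concurrent workers stay readable.
logger = logging.getLogger("eynollah.demo")
logger.handlers = [logging.handlers.QueueHandler(log_queue)]
logger.setLevel(logging.INFO)  # set the level explicitly, as the commit
                               # also does for spawned subprocesses
logger.info("%s: processing started", "page_0001.tif")

listener.stop()  # drains pending records before returning
```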
Robert Sachunsky
b7aa1d24cc CLI: drop redundant negative option forms, add --num-jobs 2026-03-13 18:22:25 +01:00
Robert Sachunsky
576e120ba6 autosized prediction is only faster for _patched, not for _resized…
When 338c4a0e wrapped all prediction models for automatic
image size adaptation in CUDA,
- tiling (`_patched`) was indeed faster
- whole  (`_resized`) was actually slower

So this reverts the latter part.
2026-03-13 18:15:30 +01:00
Robert Sachunsky
6d55f297a5 run: use ProcessPoolExecutor for parallel run_single across pages…
- reintroduce ProcessPoolExecutor
  (previously for parallel deskewing within pages)
- wrap the Eynollah instance in a global, so (with forking)
  serialization can be avoided – same pattern as in core ocrd.Processor
- move timing/logging into `run_single()`, respectively
2026-03-13 10:15:51 +01:00
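The global-instance pattern with forking can be sketched as follows (names and the dict stand-in are illustrative; with the `fork` start method the workers inherit the already-built global, so no per-task serialization of the instance occurs):

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import get_context

# Module-level global standing in for the (expensive) Eynollah instance.
_instance = None

def _run_single(page):
    # the forked worker inherits the parent's already-initialized global,
    # so the instance itself never needs to be pickled
    return (_instance["config"], page)

def run_pages(pages, config):
    global _instance
    _instance = {"config": config}  # stand-in for Eynollah(...)
    with ProcessPoolExecutor(max_workers=2,
                             mp_context=get_context("fork")) as pool:
        return list(pool.map(_run_single, pages))
```

Only small task arguments and results cross the process boundary; the heavy state stays in the inherited address space.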
Robert Sachunsky
96cfddf92d split_textregion_main_vs_header: avoid zero division 2026-03-13 02:44:08 +01:00
Robert Sachunsky
4e9b062b84 separate_marginals_to_left_and_right...: simplify 2026-03-13 02:44:08 +01:00
Robert Sachunsky
ae0f194241 drop ProcessPoolExecutor for intra-page parallel subprocessing…
(interferes with inter-page parallelism, not as useful)
2026-03-13 02:44:08 +01:00
Robert Sachunsky
becf031c65 refactor to remove data-dependency from all Eynollah methods…
- `cache_images()`: only return an image dict (plus extra keys
  for file name stem and dpi) - don't set any attributes
- `imread()`: just take from passed image dict, also add `binary` key
- `resize_and_enhance_image_with_column_classifier()`:
  * `imread()` from image dict
  * set `img_bin` key for binarization result if `input_binary`
  * instead of `image_page_org_size` / `page_coord` attributes,
    set `img_page` / `coord_page` in image dict
  * instead of retval, set `img_res` in image dict
  * also set `scale_x` and `scale_y` in image dict, resp.
  * simplify
- `resize_image_with_column_classifier()`:
  * `imread()` from image dict
  * (as in `resize_and_enhance_image_with_column_classifier`:)
    call `calculate_width_height_by_columns_1_2` if `num_col` is
    1 or 2 here
  * instead of retval, set `img_res` in image dict
  * also set `scale_x` and `scale_y` in image dict, resp.
  * simplify
- `calculate_width_height_by_columns*()`: simplify, get confidence of
  num_col instead of entire array
- `extract_page()`: read `img_res` from image dict; simplify
- `early_page_for_num_of_column_classification()`:
  `imread()` from image dict; simplify
- `textline_contours()`: no need for `num_col_classifier` here
- `run_textline()`: no need for `num_col_classifier` here
- `get_regions_light_v()` → `get_regions()`:
  * read `img_res` from image dict
  * get shapes via `img` from image dict instead of `image_org` attr
  * use `img_page` / `coord_page` from image dict instead of attrs
  * avoid unnecessary 3-channel arrays
  * simplify
- `get_tables_from_model()`: no need for `num_col_classifier` here
- `run_graphics_and_columns_light()` → `run_graphics_and_columns()`:
  * pass through image dict instead of `img_bin` (which really was `img_res`)
  * simplify
- `run_graphics_and_columns_without_layout()`:
  * pass through image dict instead of `img_bin` (which really was `img_res`)
  * simplify
- `run_enhancement()`: pass through image dict
- `get_image_and_scales*()`: drop
- `run_boxes_full_layout()`:
  * pass `image_page` instead of `img_bin` (which really was `image_page`)
  * simplify
- `run()`:
  * instantiate plotter outside of loop, and independent of img files
  * move writer instantiation and overwrite checks into `run_single()`
  * add try/catch for `run_single()` w/ logging
- `reset_file_name_dir`: drop
- `run_single()`:
  * add some args/kwargs from `run()`
  * call `cache_images()` (reading image dict) here
  * instantiate writer here instead of (reused) attr in `run()`
  * set `scale_x` / `scale_y` in writer from image dict once known
    (i.e. after `run_enhancement()`)
  * don't return anything, but write PAGE result here
- `check_any_text_region_in_model_one_is_main_or_header_light()` →
  `split_textregion_main_vs_header()`
- plotter:
  * pass `name` (file stem) from image dict to all methods
  * for `write_images_into_directory()`: also `scale_x` and `scale_y`
    from image dict
- writer:
  * init with width/height
- ocrd processor:
  * adapt (just `run_single()` call)
  * drop `max_workers=1` restriction (can now run fully parallel)
- `get_textregion_contours_in_org_image_light()` →
  `get_textregion_confidences()`:
  * take shape from confmat directly instead of extra array
  * simplify
2026-03-13 02:44:08 +01:00
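The central idea of the refactor, returning an image dict instead of mutating instance attributes, might look roughly like this (keys, shapes, and helper names are guesses for illustration, not the actual eynollah API):

```python
from pathlib import Path
import numpy as np

def cache_images(image_filename):
    """Return an image dict instead of setting attributes (sketch)."""
    return {
        "name": Path(image_filename).stem,  # file name stem, per the commit
        "dpi": 300,                         # hypothetical value
        "img": np.zeros((100, 80, 3), dtype=np.uint8),
    }

def resize_step(images, scale_x=2.0, scale_y=2.0):
    # downstream steps read from and extend the dict, no hidden state
    images["scale_x"], images["scale_y"] = scale_x, scale_y
    images["img_res"] = (images["img"]
                         .repeat(int(scale_y), axis=0)
                         .repeat(int(scale_x), axis=1))
    return images
```

Because every step takes the dict explicitly, pages become independent units of work, which is what allows dropping the `max_workers=1` restriction.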
Robert Sachunsky
800c55b826 predictor: fix spawn vs fork / parent vs child contexts 2026-03-13 02:44:07 +01:00
Robert Sachunsky
64281768a9 run_graphics_and_columns_light: fix double 1-off error…
When the `num_col_classifier` prediction gets bypassed by the
heuristic result from `find_num_col()` (because the prediction
had too little confidence, or the result of
`calculate_width_height_by_columns()` would have become too large),
do not increment `num_col` further (it is already 1 more than
the number of column separators).
2026-03-12 10:18:14 +01:00
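In other words, since `find_num_col()` counts column separators, the column count is already `colseps + 1`; a hypothetical sketch of the fixed merge logic (`merge_num_col` and its confidence threshold are invented for illustration):

```python
def merge_num_col(predicted_num_col, confidence, heuristic_colseps,
                  min_confidence=0.5):
    """Pick the classifier result unless confidence is too low; the
    heuristic counts separators, so the column count is seps + 1,
    with no further increment (the double 1-off error was adding 1
    again on top). Names/threshold are illustrative."""
    if confidence >= min_confidence:
        return predicted_num_col
    return heuristic_colseps + 1
```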
Robert Sachunsky
46c5f52491 CLI: don't append /models_eynollah here (already in default_specs) 2026-03-11 02:40:53 +01:00
Robert Sachunsky
10214dfdda predictor: make sure all shared arrays get freed eventually 2026-03-11 02:40:53 +01:00
Robert Sachunsky
cf5caa1eca predictor: fix termination for pytests…
- rename `terminate` → `stopped`
- call `terminate()` from superclass during shutdown
- del `self.model_zoo` in the parent process after spawn,
  and in the child during shutdown
2026-03-11 02:40:53 +01:00
Robert Sachunsky
bb468bf68f predictor: mp.Value must come from spawn context, too 2026-03-11 02:27:47 +01:00
Robert Sachunsky
9f127a0783 introduce predictor subprocess for exclusive GPU processing…
- new class `Predictor(multiprocessing.Process)` as stand-in
  for EynollahModelZoo:
  * calling `load_models()` starts the subprocess (and has
    `.model_zoo.load_models()` run internally)
  * calling `get()` yields a stand-in that supports `.predict()`,
    which actually communicates with the singleton subprocess
    via task and result queues, sharing Numpy arrays via SHM
  * calling `predict()` with an empty dict (instead of an image)
    merely retrieves the respective model's output shapes (cached)
  * shared memory objects for arrays are cleared as soon as possible
  * log messages are piped through QueueHandler / QueueListener
  * exceptions are passed through the queues, and raised afterwards
- move all TF initialization to the predictor
2026-03-07 03:54:16 +01:00
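Sharing Numpy arrays between the caller and the predictor subprocess as described can be done with the stdlib `multiprocessing.shared_memory` module (Python >= 3.8); a minimal sketch with hypothetical helper names:

```python
import numpy as np
from multiprocessing import shared_memory

def share_array(arr):
    """Copy an array into shared memory; return the SHM handle plus the
    (name, shape, dtype) metadata to send through a task queue."""
    shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
    view = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
    view[:] = arr
    return shm, (shm.name, arr.shape, str(arr.dtype))

def attach_array(name, shape, dtype):
    """Attach to a shared array (e.g. in the predictor subprocess)."""
    shm = shared_memory.SharedMemory(name=name)
    return shm, np.ndarray(shape, dtype=dtype, buffer=shm.buf)
```

As the commit stresses, each side must `close()` its handle and the creator must `unlink()` as soon as the array is no longer needed, so the shared segments get freed eventually.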
Robert Sachunsky
6f4ec53f7e wrap_layout_model_resized/patched: compile call instead of predict
(so `predict()` can directly convert back to Numpy)
2026-03-07 03:52:14 +01:00
Robert Sachunsky
338c4a0edf wrap layout models for prediction (image resize or tiling) all in TF
(to avoid back and forth between CPU and GPU memory when looping
 over image patches)

- `patch_encoder`: define `Model` subclasses which take an existing
  (layout segmentation) model in the constructor, and define a new
  `call()` using the existing model in a GPU-only `tf.function`:
  * `wrap_layout_model_resized`: just `tf.image.resize()` from
    input image to model size, then predict, then resize back
  * `wrap_layout_model_patched`: ditto if smaller than model size;
    otherwise use `tf.image.extract_patches` for patching in a
    sliding-window approach, then predict patches one by one, then
    `tf.scatter_nd` to reconstruct to image size
- when compiling `tf.function` graph, make sure to use input signature
  with variable image size, but avoid retracing each new size sample
- in `EynollahModelZoo.load_model` for relevant model types,
  also wrap the loaded model
  * by `wrap_layout_model_resized` under model name + `_resized`
  * by `wrap_layout_model_patched` under model name + `_patched`
- introduce `do_prediction_new_concept_autosize`,
  replacing `do_prediction/_new_concept`,
  but using passed model's `predict` directly without
  resizing or tiling to model size
- instead of `do_prediction/_new_concept(True, ...)`,
  now call `do_prediction_new_concept_autosize`,
  but with `_patched` appended to model name
- instead of `do_prediction/_new_concept(False, ...)`,
  now call `do_prediction_new_concept_autosize`,
  but with `_resized` appended to model name
2026-03-07 03:33:44 +01:00
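The tiling scheme can be illustrated TF-free in NumPy (the commit implements it on the GPU via `tf.image.extract_patches` and `tf.scatter_nd`; this sketch only mirrors the sliding-window geometry and averages overlapping tiles):

```python
import numpy as np

def predict_tiled(img, model_size, predict):
    """Sliding-window tiling and reconstruction (NumPy illustration).

    img: (H, W, C) with H, W >= model_size; predict: callable on a
    (model_size, model_size, C) tile. Overlaps are averaged.
    """
    h, w, c = img.shape
    out = np.zeros((h, w, c), dtype=np.float32)
    counts = np.zeros((h, w, 1), dtype=np.float32)
    stride = model_size // 2  # half-overlapping windows
    # regular grid of tile origins, plus a final tile clamped to the border
    ys = sorted({*range(0, h - model_size + 1, stride), h - model_size})
    xs = sorted({*range(0, w - model_size + 1, stride), w - model_size})
    for y in ys:
        for x in xs:
            out[y:y + model_size, x:x + model_size] += predict(
                img[y:y + model_size, x:x + model_size])
            counts[y:y + model_size, x:x + model_size] += 1
    return out / counts
```

Doing the same loop in a compiled `tf.function` keeps the tiles on the GPU, which is why `_patched` got faster while the memory footprint grew.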
Robert Sachunsky
f33fd57da8 model_zoo: resolve path names coming in from caller (CLI)
(to make relative paths work)
2026-03-05 00:50:32 +01:00
Robert Sachunsky
41dccb216c use (generalized) do_prediction() instead of predict_enhancement() 2026-03-05 00:50:32 +01:00
Robert Sachunsky
341480e9a0 do_prediction: if img was too small for model, also upscale results
(i.e. resize back to match original size after prediction)
2026-03-05 00:50:32 +01:00
Robert Sachunsky
8ebbe65c17 textline_contours: remove unnecessary resize_image, simplify 2026-03-05 00:50:32 +01:00
Robert Sachunsky
3370a3aa85 do_prediction*: avoid 3-channel results, simplify further…
- `do_prediction/_new_concept`: avoid unnecessary `np.repeat`
  on results, aggregate intermediate artificial class mask and
  confidence data in extra arrays
- callers: avoid unnecessary thresholding the result arrays
- callers: adapt (no need to slice into channels)
- simplify by refactoring thresholding and skeletonization into
  function `seg_mask_label`
- `extract_text_regions*`: drop unused second result array
- `textline_contours`: avoid calculating unused unpatched prediction
2026-03-05 00:50:32 +01:00
Robert Sachunsky
ff7dc31a68 do_prediction*: rename identifiers for artificial class thresholding
- `do_prediction_new_concept` w/ patches: remove branches for
  `thresholding_for_artificial_class` (never used, wrong name)
- `do_prediction_new_concept` w/ patches: rename kwarg
  `thresholding_for_some_classes` →
  `thresholding_for_artificial_class`
- `do_prediction_new_concept`: introduce kwarg `artificial_class`
  (for baked constant 4)
- `do_prediction`: introduce kwarg `artificial_class`
  (for baked constant 2)
- `do_prediction/_new_concept`: rename kwargs
  `thresholding_for..._in_light_version` →
  `thresholding_for...`
- `do_prediction`: rename kwarg
  `threshold_art_class_textline` →
  `threshold_art_class`
- `do_prediction_new_concept`: rename kwarg
  `threshold_art_class_layout` →
  `threshold_art_class`
2026-03-02 13:08:11 +01:00
Robert Sachunsky
b9cf68b51a training: fix b6d2440c 2026-03-01 20:00:05 +01:00
Robert Sachunsky
686f1d34aa do_prediction*: simplify (esp. indexing/slicing) 2026-03-01 04:37:20 +01:00
Robert Sachunsky
3b56fa2a5b training: plot GT/prediction and metrics before training (commented) 2026-02-28 20:11:12 +01:00
Robert Sachunsky
e47653f684 training: move nCC metric/loss to .metrics and rename…
- `num_connected_components_regression` → `connected_components_loss`
- move from training.train to training.metrics
2026-02-28 20:11:12 +01:00
Robert Sachunsky
361d40c064 training: improve nCC metric/loss - measure localized congruence…
- instead of just comparing the number of connected components,
  calculate the GT/pred label incidence matrix and retrieve the
  share of singular values (i.e. nearly diagonal under reordering)
  over total counts as similarity score
- also, suppress the artificial class in that calculation
2026-02-28 20:11:12 +01:00
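One loose NumPy reading of the incidence-matrix score (the actual metric lives in `training.metrics`; `congruence_score` is an illustrative reconstruction that counts the pixel mass of GT/pred component pairs matching one-to-one, i.e. entries that would lie on the diagonal after reordering):

```python
import numpy as np

def congruence_score(gt_labels, pred_labels, ignore=0):
    """Share of pixels in one-to-one matched GT/pred components.

    Builds the label incidence matrix (co-occurrence counts of GT and
    predicted component labels, with the ignored/artificial class
    suppressed) and sums entries that are the single nonzero entry in
    both their row and column. (Sketch, not the actual metric code.)
    """
    mask = (gt_labels != ignore) & (pred_labels != ignore)
    gt = gt_labels[mask].ravel()
    pr = pred_labels[mask].ravel()
    incidence = np.zeros((gt.max() + 1, pr.max() + 1), dtype=np.int64)
    np.add.at(incidence, (gt, pr), 1)
    row_nnz = (incidence > 0).sum(axis=1)
    col_nnz = (incidence > 0).sum(axis=0)
    singular = ((incidence > 0)
                & (row_nnz[:, None] == 1)
                & (col_nnz[None, :] == 1))
    return incidence[singular].sum() / incidence.sum()
```

A prediction that merges two GT components yields a row/column with two nonzero entries, so that mass is excluded from the score.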
Robert Sachunsky
7e06ab2c8c training: add config param add_ncc_loss for layout/binarization…
- add `metrics.metrics_superposition` and `metrics.Superposition`
- if non-zero, mix configured loss with weighted nCC metric
2026-02-28 20:11:12 +01:00
Robert Sachunsky
c6d9dd7945 training: use mixed precision and XLA (commented; does not work, yet) 2026-02-28 20:10:53 +01:00
Robert Sachunsky
c1d8a72edc training: shuffle tf.data pipelines 2026-02-28 20:10:53 +01:00
Robert Sachunsky
1cff937e72 training: make data pipeline in 7888fa5 more efficient 2026-02-28 20:10:53 +01:00
Robert Sachunsky
f8dd5a328c training: make plotting 18607e0f more efficient…
- avoid control dependencies in model path
- store only every 3rd sample
2026-02-28 20:10:53 +01:00
Robert Sachunsky
2d5de8e595 training.models: use bilinear instead of nearest upsampling…
(to benefit from CUDA optimization)
2026-02-27 12:48:28 +01:00
Robert Sachunsky
ba954d6314 training.models: fix daa084c3 2026-02-27 12:47:59 +01:00
Robert Sachunsky
7c3aeda65e training.models: fix 9b66867c 2026-02-27 12:40:56 +01:00
Robert Sachunsky
439ca350dd training: add metric ConfusionMatrix and plot it to TensorBoard 2026-02-26 13:55:37 +01:00