Commit graph

1482 commits

Author SHA1 Message Date
Robert Sachunsky
f5f2435a38 run_marginals: drop unnecessarily passing textline_mask, mask_seps, mask_images 2026-04-16 05:13:06 +02:00
Robert Sachunsky
9309586712 split_textregion_main_vs_header → split_textregion_main_vs_head…
(and simplify)
2026-04-16 05:07:22 +02:00
Robert Sachunsky
0f82b568ba do_prediction_new_concept: aggregate confidence for all classes…
(not just text; will still have to pass that on to the writer...)
2026-04-16 05:02:20 +02:00
Robert Sachunsky
5a27e46b22 keep seps over artificial boundaries to improve col separation…
(thresholding and decoding with artificial boundary class can
 overwrite existing column separators, which in turn can contribute
 to missing column boundaries; this prioritises seps over boundaries,
 which does not impair separation of instances, as seps will separate
 text/image/etc instances just as well as artificial boundaries)
2026-04-16 04:56:38 +02:00
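The priority rule this commit describes can be sketched in NumPy (class indices here are illustrative, not the model's actual label mapping): decode with argmax, apply the artificial-boundary threshold, then restore any pixel that was originally a separator.

```python
import numpy as np

# Illustrative class indices (the model's real label mapping may differ):
ART_CLASS = 4   # artificial boundary
SEP_CLASS = 6   # separator

def decode_keeping_seps(probs, art_threshold=0.5):
    """Argmax-decode a (H, W, C) probability map, thresholding the
    artificial-boundary class as usual, but restoring separator pixels
    afterwards so seps are never overwritten by boundaries."""
    labels = probs.argmax(axis=-1)
    sep_mask = labels == SEP_CLASS
    # thresholded decoding of the artificial boundary class
    labels[probs[..., ART_CLASS] > art_threshold] = ART_CLASS
    # prioritise seps over boundaries (seps separate instances just as well)
    labels[sep_mask] = SEP_CLASS
    return labels
```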
Robert Sachunsky
9d6ff65e1d get_tables_from_model: utilise artificial bound thresholding…
(to improve separation of neighbouring tables, esp. across
 columns; since model's threshold class is particularly weak,
 also use lower threshold here)
2026-04-16 04:49:07 +02:00
Robert Sachunsky
12b1271487 layout cli: add option --halt-fail 2026-04-13 01:19:47 +02:00
Robert Sachunsky
56e6deb02c predictor: jit-compile and precompile (non-autosized) models 2026-04-13 01:17:04 +02:00
Robert Sachunsky
01c54eb2ef reduce inference batch sizes to accommodate 8 GB VRAM
(still pending a solution for flexible batch sizes)
2026-04-13 01:15:25 +02:00
Robert Sachunsky
f44c39667e predictor: disable rebatching (until we have flexible batch sizes) 2026-04-13 01:14:49 +02:00
Robert Sachunsky
219954d15b predictor: use predict_on_batch instead of predict 2026-04-13 01:14:18 +02:00
Robert Sachunsky
0d21b62aee disable autosized prediction entirely (also for _patched)…
When 338c4a0e wrapped all prediction models for automatic
image size adaptation in CUDA,
- tiling (`_patched`) was indeed faster
- whole  (`_resized`) was actually slower

But CUDA-based tiling also increases GPU memory requirements
a lot. And with the new parallel subprocess predictors, Numpy-
based tiling is not necessarily slower anymore.
2026-04-10 18:23:10 +02:00
Robert Sachunsky
ccef63f08b get_regions: always use resized/enhanced image…
(avoid a strange image-handling short-cut, which used the
 early cropped image from column classification
 instead of the normal image in 1/2-column cases;
 fixes accuracy issues of the region_1_2 model on these images)
2026-04-10 18:17:51 +02:00
Robert Sachunsky
04da66ed73 training: plot only ~ 1000 training and ~ 100 validation images 2026-03-30 13:34:05 +02:00
Robert Sachunsky
a8556f5210 run: sort parallel log messages by file name instead of prefixing…
(as follow-up to ec08004f:)

- create log queues and QueueListener separately for each job
- receive job logs sequentially
- drop log filter mechanism (prefixing log messages by file name)
- also count ratio of successful jobs
2026-03-30 13:18:40 +02:00
Robert Sachunsky
1756443605 fixup device sel 2026-03-16 15:35:07 +01:00
Robert Sachunsky
6bbdcc39ef CLI/Eynollah.setup_models/ModelZoo.load_models: add device option/kwarg
allow setting a device specifier to load models onto

either
- CPU or single GPU0, GPU1 etc
- per-model patterns, e.g. col*:CPU,page:GPU0,*:GPU1

pass through as kwargs until `ModelZoo.load_models()` sets up TF
2026-03-15 04:54:04 +01:00
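A per-model pattern spec like the example above could be resolved with glob matching; this is a hypothetical sketch (the function name and first-match-wins semantics are assumptions, only the spec syntax comes from the commit):

```python
from fnmatch import fnmatch

def resolve_device(spec, model_name, default="CPU"):
    """Resolve a device for model_name from a spec string like
    'col*:CPU,page:GPU0,*:GPU1' (comma-separated pattern:device pairs,
    first matching pattern wins); a spec without ':' applies to all."""
    if ":" not in spec:
        return spec
    for pair in spec.split(","):
        pattern, device = pair.split(":", 1)
        if fnmatch(model_name, pattern):
            return device
    return default
```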
Robert Sachunsky
67e9f84b54 do_prediction* for "col_classifier": pass array as float16 instead of float64 2026-03-15 03:20:39 +01:00
Robert Sachunsky
f54deff452 model_zoo/predictor: use one subprocess per model…
- Eynollah: instead of one `Predictor` instance as stand-in for
  entire `ModelZoo`, keep the latter but have each model in `_loaded`
  dict become an independent predictor instance
- `ModelZoo.load_models()`: instantiate `Predictor`s for each
  `model_category` and then call `Predictor.load_model()` on them
- `Predictor.load_model()`: set args/kwargs for `ModelZoo.load_model()`,
  then spawn subprocess via `.start()`, which first enters `setup()`...
- `Predictor.setup()`: call `ModelZoo.load_model()` instead of (plural)
 `.load_models()`; save to `self.model` instead of `self.model_zoo`
- `ModelZoo.load_model()`: move _all_ CUDA configuration and
  TF/Keras-specific module initialization here (to be used only by
  predictor subprocess)
- `Predictor`: drop stand-in `SingleModelPredictor` retrieved by `get()`;
  directly provide `predict()` and `output_shape` via `self.call()`
- `Predictor`: drop `model` arg from all queues - now implicit; use
  `self.name` for model name in messages
- `Predictor`: no need for requeuing other tasks (only same model now)
- `Predictor`: reduce rebatching batch sizes due to increased VRAM footprint

- `Eynollah.setup_models()`: set up loading `_patched` / `_resized`
  here instead of during `ModelZoo.load_model()`
- `ModelZoo.load_models()`: for resized/patched models, call
  `Predictor.load_model()` with kwarg instead of resp. model name suffix
- `ModelZoo.load_model()`: expect boolean kwargs `patched/resized`
  for `wrap_layout_model_patched/resized` model wrappers, respectively
2026-03-15 02:53:37 +01:00
Robert Sachunsky
c514bbc661 make switching between autosized and looped tiling easier 2026-03-14 02:16:26 +01:00
Robert Sachunsky
2f3b622cf5 predictor: rebatch tasks to increase CUDA throughput…
- depending on model type (i.e. size), configure target
  batch sizes
- after receiving a prediction task for some model,
  look up target batch size, then try to retrieve arrays
  from follow-up tasks for the same model on the task queue;
  stop when either no tasks are immediately available or
  when the combined batch size (input batch size * number of tasks)
  reaches the target
- push back tasks for other models to the queue
- rebatch: read all shared arrays, concatenate them along axis 0,
  map respective job ids they came from
- predict on new (possibly larger) batch
- split result along axis 0 into number of jobs
- send each result along with its jobid to task queue
2026-03-14 00:52:34 +01:00
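The rebatching loop described above can be sketched as follows (all names hypothetical; the pushing-back of other-model tasks is omitted for brevity): drain immediately-available tasks up to the target batch size, predict once on the concatenation, then split per job along axis 0.

```python
import numpy as np
from queue import Empty, Queue

def rebatch_and_predict(task_queue, first_task, predict, target_batch):
    """Starting from one (job_id, array) task, pull immediately-available
    tasks until the combined batch reaches target_batch, predict once,
    then split the result back into one slice per job."""
    tasks = [first_task]
    total = first_task[1].shape[0]
    while total < target_batch:
        try:
            task = task_queue.get_nowait()
        except Empty:
            break   # no more tasks immediately available
        tasks.append(task)
        total += task[1].shape[0]
    batch = np.concatenate([arr for _, arr in tasks], axis=0)
    result = predict(batch)
    # split points: cumulative batch sizes of all but the last job
    splits = np.cumsum([arr.shape[0] for _, arr in tasks])[:-1]
    return list(zip([job for job, _ in tasks],
                    np.split(result, splits, axis=0)))
```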
Robert Sachunsky
b550725cc5 wrap_layout_model_patched: simplify shape calculation 2026-03-14 00:51:22 +01:00
Robert Sachunsky
d6404dbbc2 do_prediction*: pass arrays as float16 instead of float64 to TF 2026-03-14 00:49:26 +01:00
Robert Sachunsky
135064a48e model_zoo: region model not used at runtime anymore - don't load 2026-03-14 00:48:52 +01:00
Robert Sachunsky
ec08004fb0 run: add QueueListener to pool / QueueHandler to workers…
- set up a Queue and QueueListener along with ProcessPoolExecutor,
  delegating messages from the queue to all handlers
- in forked subprocesses, instead of just inheriting handlers,
  replace them with a single QueueHandler, and make sure
  log messages get prefixed by the respective job id (img_filename)
  so concurrent messages will still be readable
- in the predictor, make sure to pass on the log level to the
  spawned subprocess, too
2026-03-14 00:43:58 +01:00
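The two sides of this setup could look roughly like the sketch below. The queue would be a `multiprocessing.Queue` created alongside the pool; the prefixing here uses a record factory for brevity, whereas the commit describes a log-filter mechanism. All names are illustrative.

```python
import logging
from logging.handlers import QueueHandler, QueueListener

def start_listener(queue, *handlers):
    """Parent side: delegate all records arriving on the queue
    to the given handlers (runs alongside the pool)."""
    listener = QueueListener(queue, *handlers)
    listener.start()
    return listener

def setup_worker_logging(queue, job_id):
    """Worker side: replace inherited handlers with a single
    QueueHandler, and prefix every message with the job id."""
    root = logging.getLogger()
    root.handlers = [QueueHandler(queue)]
    old_factory = logging.getLogRecordFactory()
    def factory(*args, **kwargs):
        record = old_factory(*args, **kwargs)
        record.msg = f"{job_id}: {record.msg}"
        return record
    logging.setLogRecordFactory(factory)
```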
Robert Sachunsky
b7aa1d24cc CLI: drop redundant negative option forms, add --num-jobs 2026-03-13 18:22:25 +01:00
Robert Sachunsky
576e120ba6 autosized prediction is only faster for _patched, not for _resized…
When 338c4a0e wrapped all prediction models for automatic
image size adaptation in CUDA,
- tiling (`_patched`) was indeed faster
- whole  (`_resized`) was actually slower

So this reverts the latter part.
2026-03-13 18:15:30 +01:00
Robert Sachunsky
6d55f297a5 run: use ProcessPoolExecutor for parallel run_single across pages…
- reintroduce ProcessPoolExecutor
  (previously for parallel deskewing within pages)
- wrap the Eynollah instance in a global, so (with forking)
  serialization can be avoided – same pattern as in core ocrd.Processor
- move timing/logging into `run_single()`, respectively
2026-03-13 10:15:51 +01:00
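The global-instance pattern mentioned above can be sketched like this (class and function names are stand-ins for the real Eynollah code): the heavy object is built once in the parent, and fork-started workers inherit it instead of unpickling it per task.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

class ExpensiveProcessor:
    """Hypothetical stand-in for the heavy Eynollah instance."""
    def __init__(self, config):
        self.config = config
    def process(self, filename):
        return f"{self.config}:{filename}"

# Module-level global: with the fork start method, workers inherit the
# already-constructed instance, so it is never serialized per task.
_processor = None

def _run_single(filename):
    return _processor.process(filename)

def run_all(filenames, config, num_jobs=2):
    global _processor
    _processor = ExpensiveProcessor(config)   # built once in the parent
    ctx = mp.get_context("fork")              # forked workers inherit it
    with ProcessPoolExecutor(max_workers=num_jobs, mp_context=ctx) as pool:
        return list(pool.map(_run_single, filenames))
```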
Robert Sachunsky
96cfddf92d split_textregion_main_vs_header: avoid zero division 2026-03-13 02:44:08 +01:00
Robert Sachunsky
4e9b062b84 separate_marginals_to_left_and_right...: simplify 2026-03-13 02:44:08 +01:00
Robert Sachunsky
ae0f194241 drop ProcessPoolExecutor for intra-page parallel subprocessing…
(interferes with inter-page parallelism, not as useful)
2026-03-13 02:44:08 +01:00
Robert Sachunsky
becf031c65 refactor to remove data-dependency from all Eynollah methods…
- `cache_images()`: only return an image dict (plus extra keys
  for file name stem and dpi) - don't set any attributes
- `imread()`: just take from passed image dict, also add `binary` key
- `resize_and_enhance_image_with_column_classifier()`:
  * `imread()` from image dict
  * set `img_bin` key for binarization result if `input_binary`
  * instead of `image_page_org_size` / `page_coord` attributes,
    set `img_page` / `coord_page` in image dict
  * instead of retval, set `img_res` in image dict
  * also set `scale_x` and `scale_y` in image dict, resp.
  * simplify
- `resize_image_with_column_classifier()`:
  * `imread()` from image dict
  * (as in `resize_and_enhance_with_column_classifier`:)
    call `calculate_width_height_by_columns_1_2` if `num_col` is
    1 or 2 here
  * instead of retval, set `img_res` in image dict
  * also set `scale_x` and `scale_y` in image dict, resp.
  * simplify
- `calculate_width_height_by_columns*()`: simplify, get confidence of
  num_col instead of entire array
- `extract_page()`: read `img_res` from image dict; simplify
- `early_page_for_num_of_column_classification()`:
  `imread()` from image dict; simplify
- `textline_contours()`: no need for `num_col_classifier` here
- `run_textline()`: no need for `num_col_classifier` here
- `get_regions_light_v()` → `get_regions()`:
  * read `img_res` from image dict
  * get shapes via `img` from image dict instead of `image_org` attr
  * use `img_page` / `coord_page` from image dict instead of attrs
  * avoid unnecessary 3-channel arrays
  * simplify
- `get_tables_from_model()`: no need for `num_col_classifier` here
- `run_graphics_and_columns_light()` → `run_graphics_and_columns()`:
  * pass through image dict instead of `img_bin` (which really was `img_res`)
  * simplify
- `run_graphics_and_columns_without_layout()`:
  * pass through image dict instead of `img_bin` (which really was `img_res`)
  * simplify
- `run_enhancement()`: pass through image dict
- `get_image_and_sclaes*()`: drop
- `run_boxes_full_layout()`:
  * pass `image_page` instead of `img_bin` (which really was `image_page`)
  * simplify
- `run()`:
  * instantiate plotter outside of loop, and independent of img files
  * move writer instantiation and overwrite checks into `run_single()`
  * add try/catch for `run_single()` w/ logging
- `reset_file_name_dir`: drop
- `run_single()`:
  * add some args/kwargs from `run()`
  * call `cache_images()` (reading image dict) here
  * instantiate writer here instead of (reused) attr in `run()`
  * set `scale_x` / `scale_y` in writer from image dict once known
    (i.e. after `run_enhancement()`)
  * don't return anything, but write PAGE result here
- `check_any_text_region_in_model_one_is_main_or_header_light()` →
  `split_textregion_main_vs_header()`
- plotter:
  * pass `name` (file stem) from image dict to all methods
  * for `write_images_into_directory()`: also `scale_x` and `scale_y`
    from image dict
- writer:
  * init with width/height
- ocrd processor:
  * adapt (just `run_single()` call)
  * drop `max_workers=1` restriction (can now run fully parallel)
- `get_textregion_contours_in_org_image_light()` →
  `get_textregion_confidences()`:
  * take shape from confmat directly instead of extra array
  * simplify
2026-03-13 02:44:08 +01:00
Robert Sachunsky
800c55b826 predictor: fix spawn vs fork / parent vs child contexts 2026-03-13 02:44:07 +01:00
Robert Sachunsky
64281768a9 run_graphics_and_columns_light: fix double 1-off error…
When the `num_col_classifier` prediction gets bypassed
by the heuristic result from `find_num_col()` (because the prediction
had too little confidence or `calculate_width_height_by_columns()`
would have become too large), do not increment `num_col` further
(it is already 1 more than the number of colseps).
2026-03-12 10:18:14 +01:00
Robert Sachunsky
46c5f52491 CLI: don't append /models_eynollah here (already in default_specs) 2026-03-11 02:40:53 +01:00
Robert Sachunsky
10214dfdda predictor: make sure all shared arrays get freed eventually 2026-03-11 02:40:53 +01:00
Robert Sachunsky
cf5caa1eca predictor: fix termination for pytests…
- rename `terminate` → `stopped`
- call `terminate()` from superclass during shutdown
- del `self.model_zoo` in the parent process after spawn,
  and in the child during shutdown
2026-03-11 02:40:53 +01:00
Robert Sachunsky
bb468bf68f predictor: mp.Value must come from spawn context, too 2026-03-11 02:27:47 +01:00
Robert Sachunsky
9f127a0783 introduce predictor subprocess for exclusive GPU processing…
- new class `Predictor(multiprocessing.Process)` as stand-in
  for EynollahModelZoo:
  * calling `load_models()` starts the subprocess (and has
    `.model_zoo.load_models()` run internally)
  * calling `get()` yields a stand-in that supports `.predict()`,
    which actually communicates with the singleton subprocess
    via task and result queues, sharing Numpy arrays via SHM
  * calling `predict()` with an empty dict (instead of an image)
    merely retrieves the respective model's output shapes (cached)
  * shared memory objects for arrays are cleared as soon as possible
  * log messages are piped through QueueHandler / QueueListener
  * exceptions are passed through the queues, and raised afterwards
- move all TF initialization to the predictor
2026-03-07 03:54:16 +01:00
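A minimal sketch of this pattern, with a doubling stand-in instead of a real model (the actual `Predictor` also pipes logs and exceptions through queues, which is omitted here):

```python
import numpy as np
import multiprocessing as mp
from multiprocessing import shared_memory

class Predictor(mp.Process):
    """Subprocess that owns the (hypothetical) model exclusively; callers
    pass arrays via shared memory and communicate over task/result queues."""
    def __init__(self):
        super().__init__(daemon=True)
        self.tasks = mp.Queue()
        self.results = mp.Queue()

    def run(self):
        # child process: this is where the real code would initialise TF
        # and load the model; inference here is just a doubling stand-in
        while True:
            msg = self.tasks.get()
            if msg is None:
                break
            name, shape, dtype = msg
            shm = shared_memory.SharedMemory(name=name)
            arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
            out = arr * 2.0           # stand-in for model.predict()
            shm.close()
            self.results.put(out)

    def predict(self, arr):
        shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
        np.ndarray(arr.shape, arr.dtype, buffer=shm.buf)[:] = arr
        self.tasks.put((shm.name, arr.shape, arr.dtype.str))
        out = self.results.get()
        shm.close()
        shm.unlink()                  # free shared memory as soon as possible
        return out

    def shutdown(self):
        self.tasks.put(None)
        self.join()
```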
Robert Sachunsky
6f4ec53f7e wrap_layout_model_resized/patched: compile call instead of predict
(so `predict()` can directly convert back to Numpy)
2026-03-07 03:52:14 +01:00
Robert Sachunsky
338c4a0edf wrap layout models for prediction (image resize or tiling) all in TF
(to avoid back and forth between CPU and GPU memory when looping
 over image patches)

- `patch_encoder`: define `Model` subclasses which take an existing
  (layout segmentation) model in the constructor, and define a new
  `call()` using the existing model in a GPU-only `tf.function`:
  * `wrap_layout_model_resized`: just `tf.image.resize()` from
    input image to model size, then predict, then resize back
  * `wrap_layout_model_patched`: ditto if smaller than model size;
    otherwise use `tf.image.extract_patches` for patching in a
    sliding-window approach, then predict patches one by one, then
    `tf.scatter_nd` to reconstruct to image size
- when compiling `tf.function` graph, make sure to use input signature
  with variable image size, but avoid retracing each new size sample
- in `EynollahModelZoo.load_model` for relevant model types,
  also wrap the loaded model
  * by `wrap_layout_model_resized` under model name + `_resized`
  * by `wrap_layout_model_patched` under model name + `_patched`
- introduce `do_prediction_new_concept_autosize`,
  replacing `do_prediction/_new_concept`,
  but using passed model's `predict` directly without
  resizing or tiling to model size
- instead of `do_prediction/_new_concept(True, ...)`,
  now call `do_prediction_new_concept_autosize`,
  but with `_patched` appended to model name
- instead of `do_prediction/_new_concept(False, ...)`,
  now call `do_prediction_new_concept_autosize`,
  but with `_resized` appended to model name
2026-03-07 03:33:44 +01:00
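For illustration, the tiling logic of the `_patched` wrapper looks roughly like this in NumPy; the commit implements it on-GPU via `tf.image.extract_patches` and `tf.scatter_nd`, and handles overlap margins, which are omitted in this sketch.

```python
import numpy as np

def predict_patched(img, model_size, predict):
    """Tile the image into model-sized windows in a sliding-window
    fashion, predict each tile, and scatter the results back to image
    size.  Assumes image dimensions divisible by model_size."""
    h, w = img.shape[:2]
    out = np.zeros(img.shape, dtype=float)
    for y in range(0, h, model_size):
        for x in range(0, w, model_size):
            out[y:y + model_size, x:x + model_size] = \
                predict(img[y:y + model_size, x:x + model_size])
    return out
```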
Robert Sachunsky
f33fd57da8 model_zoo: resolve path names coming in from caller (CLI)
(to make relative paths work)
2026-03-05 00:50:32 +01:00
Robert Sachunsky
41dccb216c use (generalized) do_prediction() instead of predict_enhancement() 2026-03-05 00:50:32 +01:00
Robert Sachunsky
341480e9a0 do_prediction: if img was too small for model, also upscale results
(i.e. resize back to match original size after prediction)
2026-03-05 00:50:32 +01:00
Robert Sachunsky
8ebbe65c17 textline_contours: remove unnecessary resize_image, simplify 2026-03-05 00:50:32 +01:00
Robert Sachunsky
3370a3aa85 do_prediction*: avoid 3-channel results, simplify further…
- `do_prediction/_new_concept`: avoid unnecessary `np.repeat`
  on results, aggregate intermediate artificial class mask and
  confidence data in extra arrays
- callers: avoid unnecessary thresholding the result arrays
- callers: adapt (no need to slice into channels)
- simplify by refactoring thresholding and skeletonization into
  function `seg_mask_label`
- `extract_text_regions*`: drop unused second result array
- `textline_contours`: avoid calculating unused unpatched prediction
2026-03-05 00:50:32 +01:00
Robert Sachunsky
ff7dc31a68 do_prediction*: rename identifiers for artificial class thresholding
- `do_prediction_new_concept` w/ patches: remove branches for
  `thresholding_for_artificial_class` (never used, wrong name)
- `do_prediction_new_concept` w/ patches: rename kwarg
  `thresholding_for_some_classes` →
  `thresholding_for_artificial_class`
- `do_prediction_new_concept`: introduce kwarg `artificial_class`
  (for baked constant 4)
- `do_prediction`: introduce kwarg `artificial_class`
  (for baked constant 2)
- `do_prediction/_new_concept`: rename kwargs
  `thresholding_for..._in_light_version` →
  `thresholding_for...`
- `do_prediction`: rename kwarg
  `threshold_art_class_textline` →
  `threshold_art_class`
- `do_prediction_new_concept`: rename kwarg
  `threshold_art_class_layout` →
  `threshold_art_class`
2026-03-02 13:08:11 +01:00
Robert Sachunsky
b9cf68b51a training: fix b6d2440c 2026-03-01 20:00:05 +01:00
Robert Sachunsky
686f1d34aa do_prediction*: simplify (esp. indexing/slicing) 2026-03-01 04:37:20 +01:00
Robert Sachunsky
3b56fa2a5b training: plot GT/prediction and metrics before training (commented) 2026-02-28 20:11:12 +01:00
Robert Sachunsky
e47653f684 training: move nCC metric/loss to .metrics and rename…
- `num_connected_components_regression` → `connected_components_loss`
- move from training.train to training.metrics
2026-02-28 20:11:12 +01:00