Commit graph

1442 commits

Author SHA1 Message Date
Robert Sachunsky
0dfc9d911f run_boxes_no_full_layout: also map to fl labels here…
(because -mbro assumes the label set from -fl)
2026-04-20 18:20:58 +02:00
Robert Sachunsky
0015f2675b with -slro, also extract and apply page (Border) mask 2026-04-20 18:20:58 +02:00
Robert Sachunsky
569b96d1a9 find_number_of_columns_in_document: pass correct label_seps…
- in fl: 6
- non-fl: 3 (now fixed)
2026-04-20 18:20:58 +02:00
Robert Sachunsky
f28a9c9e0b add confidence for all region types, prepare for textlines…
- pass on probabilities from predicted class everywhere
- rename `confidence_matrix` → `confidence_regions` / `regions_confidence`
- rename `get_textregion_confidences()` → `get_region_confidences()`
- add same for tables, textlines and regionsfl (full layout model)
- aggregate per-region confidence lists for image, table, drop-capital,
  left marginal and right marginal regions
- add in writer
- simplify/re-indent some
- try to replace more number literals with class label identifiers
2026-04-20 18:20:58 +02:00
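The per-region confidence aggregation described above can be sketched in a few lines; this is a minimal illustration, assuming mean pooling over a per-pixel confidence map (the function name mirrors the commit's rename, but the signature is hypothetical):

```python
import numpy as np

def get_region_confidences(conf_map, region_masks):
    """Collapse a per-pixel confidence map into one mean confidence per
    region (image, table, drop-capital, left/right marginal, ...)."""
    return [float(conf_map[mask].mean()) if mask.any() else 0.0
            for mask in region_masks]
```

The resulting per-region list is what a writer can then attach to each region element.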
Robert Sachunsky
1164b97917 extract_text_regions_new: fix heading thresholding…
- re-introduce the `heading` threshold boosting that was broken
  during refactoring (light version and do_prediction)
- also return confidence for full layout prediction
2026-04-20 18:20:58 +02:00
Robert Sachunsky
20dc5c3188 also cover drop-capital in (heuristic) reading order 2026-04-20 18:20:58 +02:00
Robert Sachunsky
92e94753c7 decoding of dropcaps in -fl: ensure consistency w/ early layout…
1. use connected component analysis to get unique segments
   in early prediction result
2. for each drop-capital segment in full prediction result,
   find matching early segment
3. when they have high overlap, assign drop-capital label
   to the entire early segment
2026-04-20 18:20:58 +02:00
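The three steps above can be sketched in plain NumPy, with a naive BFS standing in for the connected-component analysis; the label value, names, and overlap threshold are illustrative, not the project's actual constants:

```python
import numpy as np
from collections import deque

def label_components(mask):
    """Naive 4-connected component labelling (stand-in for a CC analysis)."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for sy, sx in zip(*np.nonzero(mask)):
        if labels[sy, sx]:
            continue
        current += 1
        queue = deque([(sy, sx)])
        labels[sy, sx] = current
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = current
                    queue.append((ny, nx))
    return labels, current

def assign_drop_capitals(early_seg, drop_mask, drop_label=4, min_overlap=0.5):
    """For each drop-capital segment in the full prediction, find the
    matching early segment; on high overlap, relabel the whole segment."""
    out = early_seg.copy()
    early_cc, _ = label_components(early_seg > 0)   # step 1: unique segments
    drop_cc, n = label_components(drop_mask)
    for i in range(1, n + 1):
        component = drop_cc == i
        hits = early_cc[component]
        hits = hits[hits > 0]
        if hits.size == 0:
            continue
        best = np.bincount(hits).argmax()           # step 2: matching segment
        if (hits == best).sum() / component.sum() >= min_overlap:
            out[early_cc == best] = drop_label      # step 3: relabel entirely
    return out
```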
Robert Sachunsky
29b42fdfaa decoding of drop-capitals in full layout: also allow replacing img…
- rename `putt_bb_of_drop_capitals_of_model_in_patches_in_layout`
  → `fill_bb_of_drop_capitals`
- also allow image (besides text) label in early layout prediction
  result when checking if entire bbox can be filled (as opposed to
  just drop-capital | image | background mask)
- simplify
2026-04-16 18:37:27 +02:00
Robert Sachunsky
6e0aed35f4 run_boxes_*: simplify, document class label mappings, start using
identifier constants instead of literals for labels
2026-04-16 18:37:27 +02:00
Robert Sachunsky
f29e876a7c return_boxes_of_images_by_order_of_reading_new: sep label differs w/o -fl…
fix bug where in non-full mode, the wrong class label was assumed
for separator regions (3 in non- vs 6 in full layout mode):

- pass in separator mask instead of full segmentation map
- rename for clarity:
  - `regions_without_separators` → `text_mask` (already binary)
  - `regions_with_separators` → `sep_mask` (now just binary)
2026-04-16 05:16:23 +02:00
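The fix above boils down to binarizing the separator mask once, where the active label set is known, so downstream code stays label-agnostic. A sketch (the label values 6 and 3 are from the commit, the names are illustrative):

```python
import numpy as np

SEP_LABEL_FULL = 6    # separator label with -fl (full layout mode)
SEP_LABEL_LIGHT = 3   # separator label without -fl

def make_sep_mask(regions, full_layout):
    """Derive a binary separator mask up front, so callers like
    return_boxes_of_images_by_order_of_reading_new need not know
    which label set is in use."""
    label = SEP_LABEL_FULL if full_layout else SEP_LABEL_LIGHT
    return regions == label
```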
Robert Sachunsky
f5f2435a38 run_marginals: drop unnecessarily passing textline_mask, mask_seps, mask_images 2026-04-16 05:13:06 +02:00
Robert Sachunsky
9309586712 split_textregion_main_vs_header → split_textregion_main_vs_head…
(and simplify)
2026-04-16 05:07:22 +02:00
Robert Sachunsky
0f82b568ba do_prediction_new_concept: aggregate confidence for all classes…
(not just text; will still have to pass that on to the writer...)
2026-04-16 05:02:20 +02:00
Robert Sachunsky
5a27e46b22 keep seps over artificial boundaries to improve col separation…
(thresholding and decoding with artificial boundary class can
 overwrite existing column separators, which in turn can contribute
 to missing column boundaries; this prioritises seps over boundaries,
 which does not impair separation of instances, as seps will separate
 text/image/etc instances just as well as artificial boundaries)
2026-04-16 04:56:38 +02:00
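The prioritisation described above amounts to re-imposing separator pixels after boundary-aware decoding; a minimal sketch (label values illustrative):

```python
import numpy as np

def keep_seps_over_boundaries(decoded, sep_mask, sep_label=6):
    """Wherever a column separator was detected, restore it even if
    artificial-boundary thresholding/decoding overwrote that pixel;
    separators split instances just as well as boundaries do."""
    return np.where(sep_mask, sep_label, decoded)
```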
Robert Sachunsky
9d6ff65e1d get_tables_from_model: utilise artificial bound thresholding…
(to improve separation of neighbouring tables, esp. across
 columns; since model's threshold class is particularly weak,
 also use lower threshold here)
2026-04-16 04:49:07 +02:00
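Using a lower threshold for a weak boundary class, as described above, can be sketched as letting the artificial-boundary class win over `table` already at low probability; class indices and the threshold here are illustrative:

```python
import numpy as np

def table_mask_with_boundaries(probs, table_cls, bound_cls, bound_thresh=0.2):
    """Argmax decoding, but relabel table pixels as boundary whenever the
    (particularly weak) boundary class clears a reduced threshold, to
    separate neighbouring tables, esp. across columns."""
    labels = probs.argmax(axis=-1)
    weak_bound = probs[..., bound_cls] >= bound_thresh
    labels[(labels == table_cls) & weak_bound] = bound_cls
    return labels == table_cls
```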
Robert Sachunsky
12b1271487 layout cli: add option --halt-fail 2026-04-13 01:19:47 +02:00
Robert Sachunsky
56e6deb02c predictor: jit-compile and precompile (non-autosized) models 2026-04-13 01:17:04 +02:00
Robert Sachunsky
01c54eb2ef reduce inference batch sizes to accommodate 8 GB VRAM
(still pending a solution for flexible batch sizes)
2026-04-13 01:15:25 +02:00
Robert Sachunsky
f44c39667e predictor: disable rebatching (until we have flexible batch sizes) 2026-04-13 01:14:49 +02:00
Robert Sachunsky
219954d15b predictor: use predict_on_batch instead of predict 2026-04-13 01:14:18 +02:00
Robert Sachunsky
0d21b62aee disable autosized prediction entirely (also for _patched)…
When 338c4a0e wrapped all prediction models for automatic
image size adaptation in CUDA,
- tiling (`_patched`) was indeed faster
- whole  (`_resized`) was actually slower

But CUDA-based tiling also increases GPU memory requirements
a lot. And with the new parallel subprocess predictors, Numpy-
based tiling is not necessarily slower anymore.
2026-04-10 18:23:10 +02:00
Robert Sachunsky
ccef63f08b get_regions: always use resized/enhanced image…
(avoid a strange image-handling shortcut that used the early
 cropped image intended for column classification instead of
 the normal image in 1/2-column cases; fixes accuracy issues
 of the region_1_2 model on these images)
2026-04-10 18:17:51 +02:00
Robert Sachunsky
04da66ed73 training: plot only ~ 1000 training and ~ 100 validation images 2026-03-30 13:34:05 +02:00
Robert Sachunsky
a8556f5210 run: sort parallel log messages by file name instead of prefixing…
(as follow-up to ec08004f:)

- create log queues and QueueListener separately for each job
- receive job logs sequentially
- drop log filter mechanism (prefixing log messages by file name)
- also count ratio of successful jobs
2026-03-30 13:18:40 +02:00
Robert Sachunsky
1756443605 fixup device sel 2026-03-16 15:35:07 +01:00
Robert Sachunsky
6bbdcc39ef CLI/Eynollah.setup_models/ModelZoo.load_models: add device option/kwarg
allow setting device specifier to load models into

either
- CPU or single GPU0, GPU1 etc
- per-model patterns, e.g. col*:CPU,page:GPU0,*:GPU1

pass through as kwargs until `ModelZoo.load_models()` sets up TF
2026-03-15 04:54:04 +01:00
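The per-model pattern syntax above can be resolved with simple glob matching; a sketch assuming first-match-wins semantics (which the commit does not spell out) and an illustrative function name:

```python
import fnmatch

def resolve_device(model_name, spec, default="CPU"):
    """Map a model name to a device from a pattern spec such as
    "col*:CPU,page:GPU0,*:GPU1". The first matching pattern wins."""
    for entry in spec.split(","):
        pattern, _, device = entry.partition(":")
        if fnmatch.fnmatch(model_name, pattern):
            return device
    return default
```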
Robert Sachunsky
67e9f84b54 do_prediction* for "col_classifier": pass array as float16 instead of float64 2026-03-15 03:20:39 +01:00
Robert Sachunsky
f54deff452 model_zoo/predictor: use one subprocess per model…
- Eynollah: instead of one `Predictor` instance as stand-in for
  entire `ModelZoo`, keep the latter but have each model in `_loaded`
  dict become an independent predictor instance
- `ModelZoo.load_models()`: instantiate `Predictor`s for each
  `model_category` and then call `Predictor.load_model()` on them
- `Predictor.load_model()`: set args/kwargs for `ModelZoo.load_model()`,
  then spawn subprocess via `.start()`, which first enters `setup()`...
- `Predictor.setup()`: call `ModelZoo.load_model()` instead of (plural)
  `.load_models()`; save to `self.model` instead of `self.model_zoo`
- `ModelZoo.load_model()`: move _all_ CUDA configuration and
  TF/Keras-specific module initialization here (to be used only by
  predictor subprocess)
- `Predictor`: drop stand-in `SingleModelPredictor` retrieved by `get()`;
  directly provide `predict()` and `output_shape` via `self.call()`
- `Predictor`: drop `model` arg from all queues - now implicit; use
  `self.name` for model name in messages
- `Predictor`: no need for requeuing other tasks (only same model now)
- `Predictor`: reduce rebatching batch sizes due to increased VRAM footprint

- `Eynollah.setup_models()`: set up loading `_patched` / `_resized`
  here instead of during `ModelZoo.load_model()`
- `ModelZoo.load_models()`: for resized/patched models, call
  `Predictor.load_model()` with kwarg instead of resp. model name suffix
- `ModelZoo.load_model()`: expect boolean kwargs `patched/resized`
  for `wrap_layout_model_patched/resized` model wrappers, respectively
2026-03-15 02:53:37 +01:00
Robert Sachunsky
c514bbc661 make switching between autosized and looped tiling easier 2026-03-14 02:16:26 +01:00
Robert Sachunsky
2f3b622cf5 predictor: rebatch tasks to increase CUDA throughput…
- depending on model type (i.e. size), configure target
  batch sizes
- after receiving a prediction task for some model,
  look up target batch size, then try to retrieve arrays
  from follow-up tasks for the same model on the task queue;
  stop when either no tasks are immediately available or
  when the combined batch size (input batch size * number of tasks)
  reaches the target
- push back tasks for other models to the queue
- rebatch: read all shared arrays, concatenate them along axis 0,
  map respective job ids they came from
- predict on new (possibly larger) batch
- split result along axis 0 into number of jobs
- send each result along with its jobid to task queue
2026-03-14 00:52:34 +01:00
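The rebatching loop described above can be sketched as follows; queue handling, shared memory, and the target-size lookup are omitted, and all names are illustrative:

```python
import numpy as np

def rebatch_and_predict(predict_fn, tasks, target_batch):
    """Greedily merge queued (job_id, batch) tasks for the same model until
    the combined batch size reaches target_batch, run one prediction, then
    split the result back per job along axis 0."""
    taken, total = [], 0
    while tasks and total < target_batch:
        job_id, batch = tasks.pop(0)
        taken.append((job_id, batch))
        total += len(batch)
    merged = np.concatenate([b for _, b in taken], axis=0)  # one big batch
    result = predict_fn(merged)
    out, offset = {}, 0
    for job_id, batch in taken:
        out[job_id] = result[offset:offset + len(batch)]    # per-job slice
        offset += len(batch)
    return out
```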
Robert Sachunsky
b550725cc5 wrap_layout_model_patched: simplify shape calculation 2026-03-14 00:51:22 +01:00
Robert Sachunsky
d6404dbbc2 do_prediction*: pass arrays as float16 instead of float64 to TF 2026-03-14 00:49:26 +01:00
Robert Sachunsky
135064a48e model_zoo: region model not used at runtime anymore - don't load 2026-03-14 00:48:52 +01:00
Robert Sachunsky
ec08004fb0 run: add QueueListener to pool / QueueHandler to workers…
- set up a Queue and QueueListener along with ProcessPoolExecutor,
  delegating messages from the queue to all handlers
- in forked subprocesses, instead of just inheriting handlers,
  replace them with a single QueueHandler, and make sure
  log messages get prefixed by the respective job id (img_filename)
  so concurrent messages will still be readable
- in the predictor, make sure to pass on the log level to the
  spawned subprocess, too
2026-03-14 00:43:58 +01:00
Robert Sachunsky
b7aa1d24cc CLI: drop redundant negative option forms, add --num-jobs 2026-03-13 18:22:25 +01:00
Robert Sachunsky
576e120ba6 autosized prediction is only faster for _patched, not for _resized…
When 338c4a0e wrapped all prediction models for automatic
image size adaptation in CUDA,
- tiling (`_patched`) was indeed faster
- whole  (`_resized`) was actually slower

So this reverts the latter part.
2026-03-13 18:15:30 +01:00
Robert Sachunsky
6d55f297a5 run: use ProcessPoolExecutor for parallel run_single across pages…
- reintroduce ProcessPoolExecutor
  (previously for parallel deskewing within pages)
- wrap Eynollah instance into global, so (with forking)
  serialization can be avoided – same pattern as in core ocrd.Processor
- move timing/logging into `run_single()`, respectively
2026-03-13 10:15:51 +01:00
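The "wrap the instance into a global" pattern mentioned above avoids pickling a heavy object for every task: each worker builds its own instance once, via the pool initializer. A sketch of the idea (the `Engine` class and all names are illustrative, not the project's API):

```python
from concurrent.futures import ProcessPoolExecutor

class Engine:                      # stand-in for the heavy Eynollah instance
    def __init__(self, scale):
        self.scale = scale
    def process(self, x):
        return x * self.scale

_engine = None                     # per-process global, set once per worker

def init_worker(scale):
    global _engine
    _engine = Engine(scale)        # built inside the worker, never pickled

def run_single(x):
    return _engine.process(x)

def run_all(items, scale, jobs=2):
    with ProcessPoolExecutor(max_workers=jobs, initializer=init_worker,
                             initargs=(scale,)) as pool:
        return list(pool.map(run_single, items))
```

This is the same pattern the commit attributes to core ocrd.Processor.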
Robert Sachunsky
96cfddf92d split_textregion_main_vs_header: avoid zero division 2026-03-13 02:44:08 +01:00
Robert Sachunsky
4e9b062b84 separate_marginals_to_left_and_right...: simplify 2026-03-13 02:44:08 +01:00
Robert Sachunsky
ae0f194241 drop ProcessPoolExecutor for intra-page parallel subprocessing…
(interferes with inter-page parallelism, not as useful)
2026-03-13 02:44:08 +01:00
Robert Sachunsky
becf031c65 refactor to remove data-dependency from all Eynollah methods…
- `cache_images()`: only return an image dict (plus extra keys
  for file name stem and dpi) - don't set any attributes
- `imread()`: just take from passed image dict, also add `binary` key
- `resize_and_enhance_image_with_column_classifier()`:
  * `imread()` from image dict
  * set `img_bin` key for binarization result if `input_binary`
  * instead of `image_page_org_size` / `page_coord` attributes,
    set `img_page` / `coord_page` in image dict
  * instead of retval, set `img_res` in image dict
  * also set `scale_x` and `scale_y` in image dict, resp.
  * simplify
- `resize_image_with_column_classifier()`:
  * `imread()` from image dict
  * (as in `resize_and_enhance_image_with_column_classifier`:)
    call `calculate_width_height_by_columns_1_2` if `num_col` is
    1 or 2 here
  * instead of retval, set `img_res` in image dict
  * also set `scale_x` and `scale_y` in image dict, resp.
  * simplify
- `calculate_width_height_by_columns*()`: simplify, get confidence of
  num_col instead of entire array
- `extract_page()`: read `img_res` from image dict; simplify
- `early_page_for_num_of_column_classification()`:
  `imread()` from image dict; simplify
- `textline_contours()`: no need for `num_col_classifier` here
- `run_textline()`: no need for `num_col_classifier` here
- `get_regions_light_v()` → `get_regions()`:
  * read `img_res` from image dict
  * get shapes via `img` from image dict instead of `image_org` attr
  * use `img_page` / `coord_page` from image dict instead of attrs
  * avoid unnecessary 3-channel arrays
  * simplify
- `get_tables_from_model()`: no need for `num_col_classifier` here
- `run_graphics_and_columns_light()` → `run_graphics_and_columns()`:
  * pass through image dict instead of `img_bin` (which really was `img_res`)
  * simplify
- `run_graphics_and_columns_without_layout()`:
  * pass through image dict instead of `img_bin` (which really was `img_res`)
  * simplify
- `run_enhancement()`: pass through image dict
- `get_image_and_sclaes*()`: drop
- `run_boxes_full_layout()`:
  * pass `image_page` instead of `img_bin` (which really was `image_page`)
  * simplify
- `run()`:
  * instantiate plotter outside of loop, and independent of img files
  * move writer instantiation and overwrite checks into `run_single()`
  * add try/catch for `run_single()` w/ logging
- `reset_file_name_dir`: drop
- `run_single()`:
  * add some args/kwargs from `run()`
  * call `cache_images()` (reading image dict) here
  * instantiate writer here instead of (reused) attr in `run()`
  * set `scale_x` / `scale_y` in writer from image dict once known
    (i.e. after `run_enhancement()`)
  * don't return anything, but write PAGE result here
- `check_any_text_region_in_model_one_is_main_or_header_light()` →
  `split_textregion_main_vs_header()`
- plotter:
  * pass `name` (file stem) from image dict to all methods
  * for `write_images_into_directory()`: also `scale_x` and `scale_y`
    from image dict
- writer:
  * init with width/height
- ocrd processor:
  * adapt (just `run_single()` call)
  * drop `max_workers=1` restriction (can now run fully parallel)
- `get_textregion_contours_in_org_image_light()` →
  `get_textregion_confidences()`:
  * take shape from confmat directly instead of extra array
  * simplify
2026-03-13 02:44:08 +01:00
Robert Sachunsky
800c55b826 predictor: fix spawn vs fork / parent vs child contexts 2026-03-13 02:44:07 +01:00
Robert Sachunsky
64281768a9 run_graphics_and_columns_light: fix double 1-off error…
When the `num_col_classifier` prediction gets bypassed
by the heuristic result from `find_num_col()` (because the
prediction had too little confidence, or the result of
`calculate_width_height_by_columns()` would have become
too large), do not increment `num_col` further
(it is already 1 more than the number of colseps).
2026-03-12 10:18:14 +01:00
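In other words, both paths must convert from separator count to column count exactly once; a sketch of the corrected logic (names and the confidence gate are illustrative):

```python
def final_num_col(num_col_pred, confidence, colseps_heuristic, min_conf=0.5):
    """num_col is one more than the number of column separators; when
    falling back to the heuristic separator count, convert once and do
    not increment again (the double 1-off fixed above)."""
    if confidence >= min_conf:
        return num_col_pred          # classifier already yields columns
    return colseps_heuristic + 1     # heuristic yields separators
```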
Robert Sachunsky
46c5f52491 CLI: don't append /models_eynollah here (already in default_specs) 2026-03-11 02:40:53 +01:00
Robert Sachunsky
10214dfdda predictor: make sure all shared arrays get freed eventually 2026-03-11 02:40:53 +01:00
Robert Sachunsky
cf5caa1eca predictor: fix termination for pytests…
- rename `terminate` → `stopped`
- call `terminate()` from superclass during shutdown
- del `self.model_zoo` in the parent process after spawn,
  and in the child during shutdown
2026-03-11 02:40:53 +01:00
Robert Sachunsky
bb468bf68f predictor: mp.Value must come from spawn context, too 2026-03-11 02:27:47 +01:00
Robert Sachunsky
9f127a0783 introduce predictor subprocess for exclusive GPU processing…
- new class `Predictor(multiprocessing.Process)` as stand-in
  for EynollahModelZoo:
  * calling `load_models()` starts the subprocess (and has
    `.model_zoo.load_models()` run internally)
  * calling `get()` yields a stand-in that supports `.predict()`,
    which actually communicates with the singleton subprocess
    via task and result queues, sharing Numpy arrays via SHM
  * calling `predict()` with an empty dict (instead of an image)
    merely retrieves the respective model's output shapes (cached)
  * shared memory objects for arrays are cleared as soon as possible
  * log messages are piped through QueueHandler / QueueListener
  * exceptions are passed through the queues, and raised afterwards
- move all TF initialization to the predictor
2026-03-07 03:54:16 +01:00
Robert Sachunsky
6f4ec53f7e wrap_layout_model_resized/patched: compile call instead of predict
(so `predict()` can directly convert back to Numpy)
2026-03-07 03:52:14 +01:00
Robert Sachunsky
338c4a0edf wrap layout models for prediction (image resize or tiling) all in TF
(to avoid back and forth between CPU and GPU memory when looping
 over image patches)

- `patch_encoder`: define `Model` subclasses which take an existing
  (layout segmentation) model in the constructor, and define a new
  `call()` using the existing model in a GPU-only `tf.function`:
  * `wrap_layout_model_resized`: just `tf.image.resize()` from
    input image to model size, then predict, then resize back
  * `wrap_layout_model_patched`: ditto if smaller than model size;
    otherwise use `tf.image.extract_patches` for patching in a
    sliding-window approach, then predict patches one by one, then
    `tf.scatter_nd` to reconstruct to image size
- when compiling `tf.function` graph, make sure to use input signature
  with variable image size, but avoid retracing each new size sample
- in `EynollahModelZoo.load_model` for relevant model types,
  also wrap the loaded model
  * by `wrap_layout_model_resized` under model name + `_resized`
  * by `wrap_layout_model_patched` under model name + `_patched`
- introduce `do_prediction_new_concept_autosize`,
  replacing `do_prediction/_new_concept`,
  but using passed model's `predict` directly without
  resizing or tiling to model size
- instead of `do_prediction/_new_concept(True, ...)`,
  now call `do_prediction_new_concept_autosize`,
  but with `_patched` appended to model name
- instead of `do_prediction/_new_concept(False, ...)`,
  now call `do_prediction_new_concept_autosize`,
  but with `_resized` appended to model name
2026-03-07 03:33:44 +01:00
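The sliding-window tiling in `wrap_layout_model_patched` can be illustrated in NumPy; the commit keeps all of this on-GPU via `tf.image.extract_patches` and `tf.scatter_nd`, whereas this sketch is CPU-side and non-overlapping for brevity, with an illustrative function name:

```python
import numpy as np

def predict_patched(predict_fn, image, patch=224, stride=224):
    """Split the image into patch-size tiles, predict each, and scatter
    the per-tile results back into an image-size output."""
    h, w, _ = image.shape
    out = None
    for y in range(0, h, stride):
        for x in range(0, w, stride):
            tile = image[y:y+patch, x:x+patch]
            pred = predict_fn(tile)            # per-tile prediction
            if out is None:                    # allocate once shape is known
                out = np.zeros((h, w) + pred.shape[2:], dtype=pred.dtype)
            out[y:y+tile.shape[0], x:x+tile.shape[1]] = \
                pred[:tile.shape[0], :tile.shape[1]]
    return out
```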