Commit graph

1395 commits

Author SHA1 Message Date
Robert Sachunsky
9f127a0783 introduce predictor subprocess for exclusive GPU processing…
- new class `Predictor(multiprocessing.Process)` as stand-in
  for EynollahModelZoo:
  * calling `load_models()` starts the subprocess (and has
    `.model_zoo.load_models()` run internally)
  * calling `get()` yields a stand-in that supports `.predict()`,
    which actually communicates with the singleton subprocess
    via task and result queues, sharing Numpy arrays via SHM
  * calling `predict()` with an empty dict (instead of an image)
    merely retrieves the respective model's output shapes (cached)
  * shared memory objects for arrays are cleared as soon as possible
  * log messages are piped through QueueHandler / QueueListener
  * exceptions are passed through the queues, and raised afterwards
- move all TF initialization to the predictor
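The queue/SHM round trip described above can be sketched TF-free. All names here are illustrative, not the actual eynollah API; a doubling function stands in for the model, and the log piping, shape caching and exception forwarding are omitted:

```python
import multiprocessing as mp
import numpy as np
from multiprocessing import shared_memory

def _worker(tasks, results):
    # In the real setup, TF initialization and model loading happen here,
    # so only this process ever touches the GPU.
    while True:
        task = tasks.get()
        if task is None:                      # sentinel: shut down
            break
        name, shape, dtype = task
        shm = shared_memory.SharedMemory(name=name)
        arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        result = arr * 2.0                    # stand-in for model.predict()
        out = shared_memory.SharedMemory(create=True, size=result.nbytes)
        np.ndarray(result.shape, dtype=result.dtype, buffer=out.buf)[:] = result
        shm.close()
        results.put((out.name, result.shape, str(result.dtype)))
        out.close()

class Predictor:
    """Stand-in for the predictor subprocess: arrays travel via shared
    memory, metadata via task/result queues."""
    def __init__(self):
        self.tasks = mp.Queue()
        self.results = mp.Queue()
        self.proc = mp.Process(target=_worker, args=(self.tasks, self.results))
        self.proc.start()

    def predict(self, img):
        shm = shared_memory.SharedMemory(create=True, size=img.nbytes)
        np.ndarray(img.shape, dtype=img.dtype, buffer=shm.buf)[:] = img
        self.tasks.put((shm.name, img.shape, str(img.dtype)))
        name, shape, dtype = self.results.get()
        out = shared_memory.SharedMemory(name=name)
        result = np.ndarray(shape, dtype=dtype, buffer=out.buf).copy()
        # free shared memory objects as soon as possible, as stressed above
        out.close(); out.unlink()
        shm.close(); shm.unlink()
        return result

    def close(self):
        self.tasks.put(None)
        self.proc.join()
```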
2026-03-07 03:54:16 +01:00
Robert Sachunsky
6f4ec53f7e wrap_layout_model_resized/patched: compile call instead of predict
(so `predict()` can directly convert back to Numpy)
2026-03-07 03:52:14 +01:00
Robert Sachunsky
338c4a0edf wrap layout models for prediction (image resize or tiling) all in TF
(to avoid back and forth between CPU and GPU memory when looping
 over image patches)

- `patch_encoder`: define `Model` subclasses which take an existing
  (layout segmentation) model in the constructor, and define a new
  `call()` using the existing model in a GPU-only `tf.function`:
  * `wrap_layout_model_resized`: just `tf.image.resize()` from
    input image to model size, then predict, then resize back
  * `wrap_layout_model_patched`: ditto if smaller than model size;
    otherwise use `tf.image.extract_patches` for patching in a
    sliding-window approach, then predict patches one by one, then
    `tf.scatter_nd` to reconstruct to image size
- when compiling `tf.function` graph, make sure to use input signature
  with variable image size, but avoid retracing each new size sample
- in `EynollahModelZoo.load_model` for relevant model types,
  also wrap the loaded model
  * by `wrap_layout_model_resized` under model name + `_resized`
  * by `wrap_layout_model_patched` under model name + `_patched`
- introduce `do_prediction_new_concept_autosize`,
  replacing `do_prediction/_new_concept`,
  but using passed model's `predict` directly without
  resizing or tiling to model size
- instead of `do_prediction/_new_concept(True, ...)`,
  now call `do_prediction_new_concept_autosize`,
  but with `_patched` appended to model name
- instead of `do_prediction/_new_concept(False, ...)`,
  now call `do_prediction_new_concept_autosize`,
  but with `_resized` appended to model name
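The tile-and-scatter round trip of `wrap_layout_model_patched` can be sketched TF-free. The real wrapper stays on the GPU inside a `tf.function` (using `tf.image.extract_patches` and `tf.scatter_nd`) and handles overlapping windows; here: non-overlapping tiles, a plain Python loop, and an arbitrary per-patch callable as the model:

```python
import numpy as np

def predict_patched(img, model, patch_size):
    # Pad up to a full tile grid, run the stand-in model per tile,
    # scatter results back, crop to the original size.
    h, w, c = img.shape
    ph = -(-h // patch_size) * patch_size    # round height up to tiles
    pw = -(-w // patch_size) * patch_size    # round width up to tiles
    padded = np.zeros((ph, pw, c), dtype=img.dtype)
    padded[:h, :w] = img
    out = np.zeros_like(padded)
    for y in range(0, ph, patch_size):
        for x in range(0, pw, patch_size):
            patch = padded[y:y+patch_size, x:x+patch_size]
            out[y:y+patch_size, x:x+patch_size] = model(patch)
    return out[:h, :w]
```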
2026-03-07 03:33:44 +01:00
Robert Sachunsky
f33fd57da8 model_zoo: resolve path names coming in from caller (CLI)
(to make relative paths work)
2026-03-05 00:50:32 +01:00
Robert Sachunsky
41dccb216c use (generalized) do_prediction() instead of predict_enhancement() 2026-03-05 00:50:32 +01:00
Robert Sachunsky
341480e9a0 do_prediction: if img was too small for model, also upscale results
(i.e. resize back to match original size after prediction)
2026-03-05 00:50:32 +01:00
Robert Sachunsky
8ebbe65c17 textline_contours: remove unnecessary resize_image, simplify 2026-03-05 00:50:32 +01:00
Robert Sachunsky
3370a3aa85 do_prediction*: avoid 3-channel results, simplify further…
- `do_prediction/_new_concept`: avoid unnecessary `np.repeat`
  on results, aggregate intermediate artificial class mask and
  confidence data in extra arrays
- callers: avoid unnecessary thresholding the result arrays
- callers: adapt (no need to slice into channels)
- simplify by refactoring thresholding and skeletonization into
  function `seg_mask_label`
- `extract_text_regions*`: drop unused second result array
- `textline_contours`: avoid calculating unused unpatched prediction
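A hypothetical reconstruction of the thresholding part of `seg_mask_label` (the skeletonization branch is omitted, and the signature is assumed, not taken from the source):

```python
import numpy as np

def seg_mask_label(pred, threshold=None):
    # Collapse an (H, W, C) class-probability map to an (H, W) label mask:
    # plain argmax, or, with a threshold, background (0) wherever no class
    # is confident enough.
    labels = np.argmax(pred, axis=-1)
    if threshold is not None:
        labels = np.where(np.max(pred, axis=-1) > threshold, labels, 0)
    return labels.astype(np.uint8)
```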
2026-03-05 00:50:32 +01:00
Robert Sachunsky
ff7dc31a68 do_prediction*: rename identifiers for artificial class thresholding
- `do_prediction_new_concept` w/ patches: remove branches for
  `thresholding_for_artificial_class` (never used, wrong name)
- `do_prediction_new_concept` w/ patches: rename kwarg
  `thresholding_for_some_classes` →
  `thresholding_for_artificial_class`
- `do_prediction_new_concept`: introduce kwarg `artificial_class`
  (for baked constant 4)
- `do_prediction`: introduce kwarg `artificial_class`
  (for baked constant 2)
- `do_prediction/_new_concept`: rename kwargs
  `thresholding_for..._in_light_version` →
  `thresholding_for...`
- `do_prediction`: rename kwarg
  `threshold_art_class_textline` →
  `threshold_art_class`
- `do_prediction_new_concept`: rename kwarg
  `threshold_art_class_layout` →
  `threshold_art_class`
2026-03-02 13:08:11 +01:00
Robert Sachunsky
b9cf68b51a training: fix b6d2440c 2026-03-01 20:00:05 +01:00
Robert Sachunsky
686f1d34aa do_prediction*: simplify (esp. indexing/slicing) 2026-03-01 04:37:20 +01:00
Robert Sachunsky
3b56fa2a5b training: plot GT/prediction and metrics before training (commented) 2026-02-28 20:11:12 +01:00
Robert Sachunsky
e47653f684 training: move nCC metric/loss to .metrics and rename…
- `num_connected_components_regression` → `connected_components_loss`
- move from training.train to training.metrics
2026-02-28 20:11:12 +01:00
Robert Sachunsky
361d40c064 training: improve nCC metric/loss - measure localized congruence…
- instead of just comparing the number of connected components,
  calculate the GT/pred label incidence matrix and take the share
  of its singular values over total counts as similarity score
  (maximal only when the matrix is nearly diagonal under reordering)
- also, suppress the artificial class in that calculation
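The incidence-matrix score can be sketched in Numpy (hypothetical helper name; both label maps are assumed already connected-component-labeled, e.g. via `scipy.ndimage.label`, and the background/artificial-class suppression is omitted):

```python
import numpy as np

def cc_congruence(gt_labels, pred_labels):
    # Incidence (contingency) matrix between GT and predicted component
    # labels: m[i, j] counts pixels with GT label i and predicted label j.
    gt = gt_labels.ravel()
    pr = pred_labels.ravel()
    m = np.zeros((gt.max() + 1, pr.max() + 1))
    np.add.at(m, (gt, pr), 1)
    # Share of singular values (nuclear norm) over total pixel count:
    # 1.0 iff every GT component matches exactly one predicted component,
    # i.e. the matrix is diagonal up to reordering; merges/splits lower it.
    return np.linalg.svd(m, compute_uv=False).sum() / m.sum()
```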
2026-02-28 20:11:12 +01:00
Robert Sachunsky
7e06ab2c8c training: add config param add_ncc_loss for layout/binarization…
- add `metrics.metrics_superposition` and `metrics.Superposition`
- if non-zero, mix configured loss with weighted nCC metric
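A minimal sketch of the mixing idea (illustrative names; the real `metrics.Superposition` is presumably a Keras loss/metric class rather than a plain closure):

```python
def superposition(loss_fn, metric_fn, weight):
    # Combined objective: the configured loss plus a weighted auxiliary
    # metric term (the nCC metric would be plugged in as metric_fn).
    def combined(y_true, y_pred):
        return loss_fn(y_true, y_pred) + weight * metric_fn(y_true, y_pred)
    return combined
```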
2026-02-28 20:11:12 +01:00
Robert Sachunsky
c6d9dd7945 training: use mixed precision and XLA (commented; does not work, yet) 2026-02-28 20:10:53 +01:00
Robert Sachunsky
c1d8a72edc training: shuffle tf.data pipelines 2026-02-28 20:10:53 +01:00
Robert Sachunsky
1cff937e72 training: make data pipeline in 7888fa5 more efficient 2026-02-28 20:10:53 +01:00
Robert Sachunsky
f8dd5a328c training: make plotting 18607e0f more efficient…
- avoid control dependencies in model path
- store only every 3rd sample
2026-02-28 20:10:53 +01:00
Robert Sachunsky
2d5de8e595 training.models: use bilinear instead of nearest upsampling…
(to benefit from CUDA optimization)
2026-02-27 12:48:28 +01:00
Robert Sachunsky
ba954d6314 training.models: fix daa084c3 2026-02-27 12:47:59 +01:00
Robert Sachunsky
7c3aeda65e training.models: fix 9b66867c 2026-02-27 12:40:56 +01:00
Robert Sachunsky
439ca350dd training: add metric ConfusionMatrix and plot it to TensorBoard 2026-02-26 13:55:37 +01:00
Robert Sachunsky
b6d2440ce1 training.utils.preprocess_imgs: fix polymorphy in 27f43c1
(Functions cannot be both generators and procedures,
 so make this a pure generator and save the image files
 on the caller's side; also avoids passing output
 directories)

Moreover, simplify by moving the `os.listdir` into the function
body (saving lots of extra variable bindings).
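The pure-generator pattern the commit describes, sketched with a placeholder `transform` in place of the real image preprocessing:

```python
from pathlib import Path

def preprocess_imgs(input_dir, transform):
    # Pure generator: the directory listing lives in the function body
    # and no files are written here; it only yields (stem, result) pairs.
    for path in sorted(Path(input_dir).iterdir()):
        yield path.stem, transform(path.read_text())
```

The caller then does the saving, keeping the generator free of side effects and output directories:

```python
# for stem, data in preprocess_imgs(src_dir, str.upper):
#     (out_dir / f"{stem}.txt").write_text(data)
```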
2026-02-25 20:39:15 +01:00
Robert Sachunsky
42bab0f935 docs/train: document --missing-printspace=project 2026-02-25 13:18:40 +01:00
Robert Sachunsky
4202a1b2db training.generate-gt.pagexml2label: add --missing-printspace
- keep default (fallback to full page), but warn
- new option `skip`
- new option `project`
2026-02-25 11:16:21 +01:00
Robert Sachunsky
7823ea2c95 training.train: add early stopping for OCR 2026-02-25 00:16:07 +01:00
Robert Sachunsky
36e370aa45 training.train: add validation data for OCR 2026-02-25 00:10:43 +01:00
Robert Sachunsky
b399db3c00 training.models: simplify CTC loss layer 2026-02-24 20:43:50 +01:00
Robert Sachunsky
92fc2bd815 training.train: fix data batching for OCR in 27f43c17 2026-02-24 20:42:08 +01:00
Robert Sachunsky
86b009bc31 training.utils.preprocess_imgs: fix file name stemming in 27f43c17 2026-02-24 20:41:08 +01:00
Robert Sachunsky
20a3672be3 training.utils.preprocess_imgs: fix file shuffling in 27f43c17 2026-02-24 20:37:44 +01:00
Robert Sachunsky
658dade0d4 training.config_params: flip_index needed for scaling_flip, too 2026-02-24 20:36:00 +01:00
Robert Sachunsky
abf111de76 training: add metric for (same) number of connected components
(an attempt to capture region instance separability)
2026-02-24 17:03:21 +01:00
Robert Sachunsky
18607e0f48 training: plot predictions to TB logs along with training/testing 2026-02-24 17:00:48 +01:00
Robert Sachunsky
56833b3f55 training: fix data representation in 7888fa5
(Eynollah models expect BGR/float instead of RGB/int)
2026-02-24 16:46:19 +01:00
Robert Sachunsky
a9496bbc70 enhancer/mbreorder: use std Keras data loader for classification 2026-02-17 18:39:30 +01:00
Robert Sachunsky
003c88f18a fix double import in 82266f82 2026-02-17 18:23:32 +01:00
Robert Sachunsky
f61effe8ce fix typo in c8240905 2026-02-17 18:20:58 +01:00
Robert Sachunsky
5f71333649 fix missing import in 49261fa9 2026-02-17 18:11:49 +01:00
Robert Sachunsky
67fca82f38 fix missing import in 27f43c17 2026-02-17 18:09:15 +01:00
Robert Sachunsky
6a4163ae56 fix typo in 27f43c17 2026-02-17 18:09:15 +01:00
Robert Sachunsky
c1b5cc92af fix typo in 7562317d 2026-02-17 18:09:15 +01:00
Robert Sachunsky
7bef8fa95a training.train: add verbose=1 consistently 2026-02-17 18:09:15 +01:00
Robert Sachunsky
9b66867c21 training.models: re-use transformer builder code 2026-02-17 18:09:15 +01:00
Robert Sachunsky
daa084c367 training.models: re-use UNet decoder builder code 2026-02-17 18:09:15 +01:00
Robert Sachunsky
fcd10c3956 training.models: re-use RESNET50 builder (+weight init) code 2026-02-17 18:09:15 +01:00
Robert Sachunsky
4414f7b89b training.models.vit_resnet50_unet: re-use IMAGE_ORDERING 2026-02-17 14:18:32 +01:00
Robert Sachunsky
7888fa5968 training: remove data_gen in favor of tf.data pipelines
instead of looping over file pairs indefinitely, yielding
Numpy arrays: re-use `keras.utils.image_dataset_from_directory`
here as well, but with img/label generators zipped together

(thus, everything will already be loaded/prefetched on the GPU)
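The real pipeline builds on `keras.utils.image_dataset_from_directory` and `tf.data.Dataset.zip`; the pairing-and-batching idea can be shown TF-free (illustrative helper, without the prefetching or shuffling a `tf.data` pipeline provides):

```python
from itertools import islice

def zipped_batches(images, labels, batch_size):
    # Pair each image with its label by position (both streams must be
    # sorted identically, like two parallel dataset listings), then group
    # the pairs into finite batches instead of looping over files forever.
    pairs = zip(images, labels)
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch
```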
2026-02-17 12:44:45 +01:00
Robert Sachunsky
83c2408192 training.utils.data_gen: avoid repeated array allocation 2026-02-17 12:44:45 +01:00