Commit graph

1393 commits

Author SHA1 Message Date
Robert Sachunsky
338c4a0edf wrap layout models for prediction (image resize or tiling) all in TF
(to avoid back and forth between CPU and GPU memory when looping
 over image patches)

- `patch_encoder`: define `Model` subclasses which take an existing
  (layout segmentation) model in the constructor, and define a new
  `call()` using the existing model in a GPU-only `tf.function`:
  * `wrap_layout_model_resized`: just `tf.image.resize()` from
    input image to model size, then predict, then resize back
  * `wrap_layout_model_patched`: ditto if smaller than model size;
    otherwise use `tf.image.extract_patches` for patching in a
    sliding-window approach, then predict patches one by one, then
    `tf.scatter_nd` to reconstruct to image size
- when compiling `tf.function` graph, make sure to use input signature
  with variable image size, but avoid retracing each new size sample
- in `EynollahModelZoo.load_model` for relevant model types,
  also wrap the loaded model
  * by `wrap_layout_model_resized` under model name + `_resized`
  * by `wrap_layout_model_patched` under model name + `_patched`
- introduce `do_prediction_new_concept_autosize`,
  replacing `do_prediction/_new_concept`,
  but using passed model's `predict` directly without
  resizing or tiling to model size
- instead of `do_prediction/_new_concept(True, ...)`,
  now call `do_prediction_new_concept_autosize`,
  but with `_patched` appended to model name
- instead of `do_prediction/_new_concept(False, ...)`,
  now call `do_prediction_new_concept_autosize`,
  but with `_resized` appended to model name
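The tiling half of this can be sketched in plain NumPy (the actual wrappers keep everything in a GPU-only `tf.function`, using `tf.image.extract_patches` and `tf.scatter_nd` in-graph; `predict_tiled` and the tile size here are illustrative stand-ins, and the resized wrapper is simply a resize to model size and back):

```python
import numpy as np

def predict_tiled(predict, img, tile=448):
    """Sliding-window sketch: pad, predict tile by tile, reassemble, crop.

    `predict` maps a (tile, tile, C) patch to a (tile, tile, K) result.
    """
    h, w, c = img.shape
    # pad to a multiple of the tile size (mirrors the in-graph padding)
    ph, pw = -h % tile, -w % tile
    padded = np.pad(img, ((0, ph), (0, pw), (0, 0)))
    out = None
    for y in range(0, h + ph, tile):
        for x in range(0, w + pw, tile):
            res = predict(padded[y:y + tile, x:x + tile])
            if out is None:
                # allocate the output once we know the class count K
                out = np.zeros((h + ph, w + pw, res.shape[-1]), res.dtype)
            out[y:y + tile, x:x + tile] = res
    return out[:h, :w]  # crop back to the original image size
```

With an identity `predict`, the reassembled output reproduces the input exactly, which is a handy sanity check for the pad/crop bookkeeping.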
2026-03-07 03:33:44 +01:00
Robert Sachunsky
f33fd57da8 model_zoo: resolve path names coming in from caller (CLI)
(to make relative paths work)
2026-03-05 00:50:32 +01:00
Robert Sachunsky
41dccb216c use (generalized) do_prediction() instead of predict_enhancement() 2026-03-05 00:50:32 +01:00
Robert Sachunsky
341480e9a0 do_prediction: if img was too small for model, also upscale results
(i.e. resize back to match original size after prediction)
2026-03-05 00:50:32 +01:00
Robert Sachunsky
8ebbe65c17 textline_contours: remove unnecessary resize_image, simplify 2026-03-05 00:50:32 +01:00
Robert Sachunsky
3370a3aa85 do_prediction*: avoid 3-channel results, simplify further…
- `do_prediction/_new_concept`: avoid unnecessary `np.repeat`
  on results, aggregate intermediate artificial class mask and
  confidence data in extra arrays
- callers: avoid unnecessarily thresholding the result arrays
- callers: adapt (no need to slice into channels)
- simplify by refactoring thresholding and skeletonization into
  function `seg_mask_label`
- `extract_text_regions*`: drop unused second result array
- `textline_contours`: avoid calculating unused unpatched prediction
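A helper like `seg_mask_label` presumably collapses the per-class result into a single-channel mask; a minimal sketch (signature and threshold semantics assumed, and the skeletonization step omitted):

```python
import numpy as np

def seg_mask_label(pred, threshold=None):
    """Collapse a per-class probability map (H, W, C) into a uint8 label mask.

    Without a threshold this is a plain argmax; with one, the foreground
    class wins only where its probability exceeds the threshold.
    """
    if threshold is None:
        return np.argmax(pred, axis=-1).astype(np.uint8)
    return (pred[..., 1] > threshold).astype(np.uint8)
```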
2026-03-05 00:50:32 +01:00
Robert Sachunsky
ff7dc31a68 do_prediction*: rename identifiers for artificial class thresholding
- `do_prediction_new_concept` w/ patches: remove branches for
  `thresholding_for_artificial_class` (never used, wrong name)
- `do_prediction_new_concept` w/ patches: rename kwarg
  `thresholding_for_some_classes` →
  `thresholding_for_artificial_class`
- `do_prediction_new_concept`: introduce kwarg `artificial_class`
  (for baked constant 4)
- `do_prediction`: introduce kwarg `artificial_class`
  (for baked constant 2)
- `do_prediction/_new_concept`: rename kwargs
  `thresholding_for..._in_light_version` →
  `thresholding_for...`
- `do_prediction`: rename kwarg
  `threshold_art_class_textline` →
  `threshold_art_class`
- `do_prediction_new_concept`: rename kwarg
  `threshold_art_class_layout` →
  `threshold_art_class`
2026-03-02 13:08:11 +01:00
Robert Sachunsky
b9cf68b51a training: fix b6d2440c 2026-03-01 20:00:05 +01:00
Robert Sachunsky
686f1d34aa do_prediction*: simplify (esp. indexing/slicing) 2026-03-01 04:37:20 +01:00
Robert Sachunsky
3b56fa2a5b training: plot GT/prediction and metrics before training (commented) 2026-02-28 20:11:12 +01:00
Robert Sachunsky
e47653f684 training: move nCC metric/loss to .metrics and rename…
- `num_connected_components_regression` → `connected_components_loss`
- move from training.train to training.metrics
2026-02-28 20:11:12 +01:00
Robert Sachunsky
361d40c064 training: improve nCC metric/loss - measure localized congruence…
- instead of just comparing the number of connected components,
  calculate the GT/pred label incidence matrix and take the
  share of singular entries (i.e. those that would be diagonal
  under reordering) over total counts as the similarity score
- also, suppress artificial class in that
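The incidence-matrix score can be sketched in NumPy (function name hypothetical; the real metric lives in `training.metrics`, runs on connected-component label maps, and additionally suppresses the artificial class):

```python
import numpy as np

def incidence_similarity(gt_labels, pred_labels):
    """Share of pixels in incidence cells that are the only nonzero entry
    in both their row and column, i.e. cells that would lie on the
    diagonal under some reordering of GT/pred labels."""
    inc = np.zeros((gt_labels.max() + 1, pred_labels.max() + 1), dtype=np.int64)
    # count how many pixels carry GT label i and pred label j
    np.add.at(inc, (gt_labels.ravel(), pred_labels.ravel()), 1)
    row_nnz = (inc > 0).sum(axis=1, keepdims=True)
    col_nnz = (inc > 0).sum(axis=0, keepdims=True)
    singular = (inc > 0) & (row_nnz == 1) & (col_nnz == 1)
    return inc[singular].sum() / inc.sum()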
2026-02-28 20:11:12 +01:00
Robert Sachunsky
7e06ab2c8c training: add config param add_ncc_loss for layout/binarization…
- add `metrics.metrics_superposition` and `metrics.Superposition`
- if non-zero, mix configured loss with weighted nCC metric
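The mixing presumably amounts to a weighted sum of the configured loss and the nCC metric; a minimal sketch of the shape (names and weighting scheme assumed, not the actual `metrics.Superposition` API):

```python
def superpose(base_loss, ncc_metric, weight):
    """Return a loss that mixes `base_loss` with the nCC metric at `weight`."""
    def loss(y_true, y_pred):
        return ((1.0 - weight) * base_loss(y_true, y_pred)
                + weight * ncc_metric(y_true, y_pred))
    return loss
```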
2026-02-28 20:11:12 +01:00
Robert Sachunsky
c6d9dd7945 training: use mixed precision and XLA (commented out; does not work yet) 2026-02-28 20:10:53 +01:00
Robert Sachunsky
c1d8a72edc training: shuffle tf.data pipelines 2026-02-28 20:10:53 +01:00
Robert Sachunsky
1cff937e72 training: make data pipeline in 7888fa5 more efficient 2026-02-28 20:10:53 +01:00
Robert Sachunsky
f8dd5a328c training: make plotting 18607e0f more efficient…
- avoid control dependencies in model path
- store only every 3rd sample
2026-02-28 20:10:53 +01:00
Robert Sachunsky
2d5de8e595 training.models: use bilinear instead of nearest upsampling…
(to benefit from CUDA optimization)
2026-02-27 12:48:28 +01:00
Robert Sachunsky
ba954d6314 training.models: fix daa084c3 2026-02-27 12:47:59 +01:00
Robert Sachunsky
7c3aeda65e training.models: fix 9b66867c 2026-02-27 12:40:56 +01:00
Robert Sachunsky
439ca350dd training: add metric ConfusionMatrix and plot it to TensorBoard 2026-02-26 13:55:37 +01:00
Robert Sachunsky
b6d2440ce1 training.utils.preprocess_imgs: fix polymorphy in 27f43c1
(Functions cannot be both generators and procedures,
 so make this a pure generator and save the image files
 on the caller's side; this also avoids passing output
 directories)

Moreover, simplify by moving the `os.listdir` into the function
body (saving lots of extra variable bindings).
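The generator-vs-procedure split can be sketched like this (the transform and file naming are illustrative, not the real `preprocess_imgs` behavior):

```python
import os

def preprocess_imgs(src_dir, transform):
    """Pure generator: yields (stem, result) pairs and has no write
    side effects, so callers decide where (and whether) to save."""
    for name in sorted(os.listdir(src_dir)):
        stem = os.path.splitext(name)[0]
        yield stem, transform(os.path.join(src_dir, name))

def save_all(src_dir, out_dir, transform):
    """Caller side: consume the generator and do the saving itself."""
    for stem, result in preprocess_imgs(src_dir, transform):
        with open(os.path.join(out_dir, stem + ".out"), "w") as f:
            f.write(result)
```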
2026-02-25 20:39:15 +01:00
Robert Sachunsky
42bab0f935 docs/train: document --missing-printspace=project 2026-02-25 13:18:40 +01:00
Robert Sachunsky
4202a1b2db training.generate-gt.pagexml2label: add --missing-printspace
- keep default (fallback to full page), but warn
- new option `skip`
- new option `project`
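The three behaviors can be sketched as a small dispatcher (function name, box representation, and signature are hypothetical; the real code operates on PAGE-XML elements):

```python
import warnings

def resolve_printspace(printspace, region_boxes, page_box, missing="fallback"):
    """Sketch of the --missing-printspace choices for a page without one."""
    if printspace is not None:
        return printspace
    if missing == "skip":
        return None  # drop the page entirely
    if missing == "project":
        # project a print space from the union of region bounding boxes
        xs0, ys0, xs1, ys1 = zip(*region_boxes)
        return (min(xs0), min(ys0), max(xs1), max(ys1))
    # default: keep falling back to the full page, but warn
    warnings.warn("no PrintSpace; falling back to the full page")
    return page_box
```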
2026-02-25 11:16:21 +01:00
Robert Sachunsky
7823ea2c95 training.train: add early stopping for OCR 2026-02-25 00:16:07 +01:00
Robert Sachunsky
36e370aa45 training.train: add validation data for OCR 2026-02-25 00:10:43 +01:00
Robert Sachunsky
b399db3c00 training.models: simplify CTC loss layer 2026-02-24 20:43:50 +01:00
Robert Sachunsky
92fc2bd815 training.train: fix data batching for OCR in 27f43c17 2026-02-24 20:42:08 +01:00
Robert Sachunsky
86b009bc31 training.utils.preprocess_imgs: fix file name stemming in 27f43c17 2026-02-24 20:41:08 +01:00
Robert Sachunsky
20a3672be3 training.utils.preprocess_imgs: fix file shuffling in 27f43c17 2026-02-24 20:37:44 +01:00
Robert Sachunsky
658dade0d4 training.config_params: flip_index needed for scaling_flip, too 2026-02-24 20:36:00 +01:00
Robert Sachunsky
abf111de76 training: add metric for (same) number of connected components
(in trying to capture region instance separability)
2026-02-24 17:03:21 +01:00
Robert Sachunsky
18607e0f48 training: plot predictions to TB logs along with training/testing 2026-02-24 17:00:48 +01:00
Robert Sachunsky
56833b3f55 training: fix data representation in 7888fa5
(Eynollah models expect BGR/float instead of RGB/int)
2026-02-24 16:46:19 +01:00
Robert Sachunsky
a9496bbc70 enhancer/mbreorder: use std Keras data loader for classification 2026-02-17 18:39:30 +01:00
Robert Sachunsky
003c88f18a fix double import in 82266f82 2026-02-17 18:23:32 +01:00
Robert Sachunsky
f61effe8ce fix typo in c8240905 2026-02-17 18:20:58 +01:00
Robert Sachunsky
5f71333649 fix missing import in 49261fa9 2026-02-17 18:11:49 +01:00
Robert Sachunsky
67fca82f38 fix missing import in 27f43c17 2026-02-17 18:09:15 +01:00
Robert Sachunsky
6a4163ae56 fix typo in 27f43c17 2026-02-17 18:09:15 +01:00
Robert Sachunsky
c1b5cc92af fix typo in 7562317d 2026-02-17 18:09:15 +01:00
Robert Sachunsky
7bef8fa95a training.train: add verbose=1 consistently 2026-02-17 18:09:15 +01:00
Robert Sachunsky
9b66867c21 training.models: re-use transformer builder code 2026-02-17 18:09:15 +01:00
Robert Sachunsky
daa084c367 training.models: re-use UNet decoder builder code 2026-02-17 18:09:15 +01:00
Robert Sachunsky
fcd10c3956 training.models: re-use RESNET50 builder (+weight init) code 2026-02-17 18:09:15 +01:00
Robert Sachunsky
4414f7b89b training.models.vit_resnet50_unet: re-use IMAGE_ORDERING 2026-02-17 14:18:32 +01:00
Robert Sachunsky
7888fa5968 training: remove data_gen in favor of tf.data pipelines
Instead of looping over file pairs indefinitely and yielding
NumPy arrays, re-use `keras.utils.image_dataset_from_directory`
here as well, but with img/label generators zipped together

(thus, everything will already be loaded/prefetched on the GPU)
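The zipped-pipeline idea (two element streams combined into (img, label) pairs, then batched and prefetched) can be sketched with `tf.data`; here `from_tensor_slices` stands in for the two directory readers:

```python
import tensorflow as tf

def make_pipeline(imgs, labels, batch_size=2):
    """Zip an image stream with a label stream into one tf.data pipeline."""
    img_ds = tf.data.Dataset.from_tensor_slices(imgs)
    lbl_ds = tf.data.Dataset.from_tensor_slices(labels)
    return (tf.data.Dataset.zip((img_ds, lbl_ds))
            .shuffle(buffer_size=len(imgs))
            .batch(batch_size)
            .prefetch(tf.data.AUTOTUNE))
```

`prefetch(tf.data.AUTOTUNE)` is what lets batches be staged ahead of the training step, so the GPU is not left waiting on host-side loading.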
2026-02-17 12:44:45 +01:00
Robert Sachunsky
83c2408192 training.utils.data_gen: avoid repeated array allocation 2026-02-17 12:44:45 +01:00
Robert Sachunsky
514a897dd5 training.train: assert n_epochs vs. index_start 2026-02-17 12:44:45 +01:00
Robert Sachunsky
37338049af training: use relative imports 2026-02-17 12:44:45 +01:00