Commit graph

1422 commits

Author SHA1 Message Date
Robert Sachunsky
0d21b62aee disable autosized prediction entirely (also for _patched)…
When 338c4a0e wrapped all prediction models for automatic
image size adaptation in CUDA,
- tiling (`_patched`) was indeed faster
- whole  (`_resized`) was actually slower

But CUDA-based tiling also increases GPU memory requirements
a lot. And with the new parallel subprocess predictors,
Numpy-based tiling is not necessarily slower anymore.
2026-04-10 18:23:10 +02:00
Robert Sachunsky
ccef63f08b get_regions: always use resized/enhanced image…
(avoid a strange image-handling shortcut which, in 1/2-column cases,
 used the early cropped image intended for column classification
 instead of the normal image;
 fixes accuracy issues of the region_1_2 model on these images)
2026-04-10 18:17:51 +02:00
Robert Sachunsky
04da66ed73 training: plot only ~ 1000 training and ~ 100 validation images 2026-03-30 13:34:05 +02:00
Robert Sachunsky
a8556f5210 run: sort parallel log messages by file name instead of prefixing…
(as a follow-up to ec08004f:)

- create log queues and QueueListener separately for each job
- receive job logs sequentially
- drop log filter mechanism (prefixing log messages by file name)
- also count ratio of successful jobs
2026-03-30 13:18:40 +02:00
Robert Sachunsky
1756443605 fixup device sel 2026-03-16 15:35:07 +01:00
Robert Sachunsky
6bbdcc39ef CLI/Eynollah.setup_models/ModelZoo.load_models: add device option/kwarg
allow setting a device specifier to load models onto,

either
- CPU, or a single GPU (GPU0, GPU1, etc.)
- per-model patterns, e.g. col*:CPU,page:GPU0,*:GPU1

pass through as kwargs until `ModelZoo.load_models()` sets up TF
2026-03-15 04:54:04 +01:00
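The per-model pattern syntax above could be resolved with shell-style globbing; a minimal sketch (the helper name `resolve_device` and the first-match semantics are assumptions for illustration, not the actual implementation):

```python
from fnmatch import fnmatch

def resolve_device(model_name: str, spec: str) -> str:
    """Resolve a device for a model from a spec string.

    The spec is either a single device ("CPU", "GPU0", ...) applied to
    all models, or comma-separated glob patterns such as
    "col*:CPU,page:GPU0,*:GPU1", matched first to last.
    (Hypothetical helper, not the actual eynollah code.)
    """
    if ":" not in spec:
        return spec  # single device for every model
    for entry in spec.split(","):
        pattern, _, device = entry.partition(":")
        if fnmatch(model_name, pattern):
            return device
    raise ValueError(f"no pattern in {spec!r} matches {model_name!r}")
```

A trailing `*:...` entry then acts as the fallback device for all models not matched earlier.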
Robert Sachunsky
67e9f84b54 do_prediction* for "col_classifier": pass array as float16 instead of float64 2026-03-15 03:20:39 +01:00
Robert Sachunsky
f54deff452 model_zoo/predictor: use one subprocess per model…
- Eynollah: instead of one `Predictor` instance as stand-in for
  entire `ModelZoo`, keep the latter but have each model in `_loaded`
  dict become an independent predictor instance
- `ModelZoo.load_models()`: instantiate `Predictor`s for each
  `model_category` and then call `Predictor.load_model()` on them
- `Predictor.load_model()`: set args/kwargs for `ModelZoo.load_model()`,
  then spawn subprocess via `.start()`, which first enters `setup()`...
- `Predictor.setup()`: call `ModelZoo.load_model()` instead of (plural)
  `.load_models()`; save to `self.model` instead of `self.model_zoo`
- `ModelZoo.load_model()`: move _all_ CUDA configuration and
  TF/Keras-specific module initialization here (to be used only by
  predictor subprocess)
- `Predictor`: drop stand-in `SingleModelPredictor` retrieved by `get()`;
  directly provide `predict()` and `output_shape` via `self.call()`
- `Predictor`: drop `model` arg from all queues - now implicit; use
  `self.name` for model name in messages
- `Predictor`: no need for requeuing other tasks (only same model now)
- `Predictor`: reduce rebatching batch sizes due to increased VRAM footprint

- `Eynollah.setup_models()`: set up loading `_patched` / `_resized`
  here instead of during `ModelZoo.load_model()`
- `ModelZoo.load_models()`: for resized/patched models, call
  `Predictor.load_model()` with kwarg instead of resp. model name suffix
- `ModelZoo.load_model()`: expect boolean kwargs `patched/resized`
  for `wrap_layout_model_patched/resized` model wrappers, respectively
2026-03-15 02:53:37 +01:00
Robert Sachunsky
c514bbc661 make switching between autosized and looped tiling easier 2026-03-14 02:16:26 +01:00
Robert Sachunsky
2f3b622cf5 predictor: rebatch tasks to increase CUDA throughput…
- depending on model type (i.e. size), configure target
  batch sizes
- after receiving a prediction task for some model,
  look up target batch size, then try to retrieve arrays
  from follow-up tasks for the same model on the task queue;
  stop when either no tasks are immediately available or
  when the combined batch size (input batch size * number of tasks)
  reaches the target
- push back tasks for other models to the queue
- rebatch: read all shared arrays, concatenate them along axis 0,
  map respective job ids they came from
- predict on new (possibly larger) batch
- split result along axis 0 into number of jobs
- send each result along with its jobid to task queue
2026-03-14 00:52:34 +01:00
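The rebatching steps above can be sketched in NumPy; `rebatch` / `split_results` and their signatures are hypothetical, and the queue handling (including pushing back tasks for other models) is elided:

```python
import numpy as np

def rebatch(first_task, more_tasks, target_batch):
    """Greedily combine follow-up tasks for the same model into one batch.

    first_task: (job_id, array); more_tasks: iterator over further
    (job_id, array) tasks that are immediately available on the queue.
    Stops when the iterator is exhausted or the combined batch size
    reaches the target. (Illustrative sketch only.)
    """
    jobs = [first_task]
    total = first_task[1].shape[0]
    while total < target_batch:
        task = next(more_tasks, None)
        if task is None:  # no further task immediately available
            break
        jobs.append(task)
        total += task[1].shape[0]
    batch = np.concatenate([arr for _, arr in jobs], axis=0)
    return batch, [jid for jid, _ in jobs], [arr.shape[0] for _, arr in jobs]

def split_results(result, job_ids, sizes):
    """Split the combined prediction back into per-job results."""
    parts = np.split(result, np.cumsum(sizes)[:-1], axis=0)
    return list(zip(job_ids, parts))
```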
Robert Sachunsky
b550725cc5 wrap_layout_model_patched: simplify shape calculation 2026-03-14 00:51:22 +01:00
Robert Sachunsky
d6404dbbc2 do_prediction*: pass arrays as float16 instead of float64 to TF 2026-03-14 00:49:26 +01:00
Robert Sachunsky
135064a48e model_zoo: region model not used at runtime anymore - don't load 2026-03-14 00:48:52 +01:00
Robert Sachunsky
ec08004fb0 run: add QueueListener to pool / QueueHandler to workers…
- set up a Queue and QueueListener along with ProcessPoolExecutor,
  delegating messages from the queue to all handlers
- in forked subprocesses, instead of just inheriting handlers,
  replace them with a single QueueHandler, and make sure
  log messages get prefixed by the respective job id (img_filename)
  so concurrent messages will still be readable
- in the predictor, make sure to pass on the log level to the
  spawned subprocess, too
2026-03-14 00:43:58 +01:00
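The queue-based logging described here uses the stdlib `QueueHandler` / `QueueListener` pair. A single-process sketch of the wiring (in the real setup the queue would be a `multiprocessing.Queue` shared with the pool workers, and the job-id prefix would come from a filter or the message itself):

```python
import logging
import logging.handlers
import queue

# Pool side: one queue plus a QueueListener that delegates records
# to the real handlers (here, a list-collecting handler for demo).
log_queue = queue.Queue()
records = []

class ListHandler(logging.Handler):
    def emit(self, record):
        records.append(self.format(record))

listener = logging.handlers.QueueListener(log_queue, ListHandler())
listener.start()

# Worker side: instead of inheriting handlers, install a single
# QueueHandler; prefix messages with the job id so interleaved
# logs from concurrent workers stay readable.
logger = logging.getLogger("eynollah.demo")
logger.handlers = [logging.handlers.QueueHandler(log_queue)]
logger.setLevel(logging.INFO)  # set the level explicitly, as the commit
                               # also does for spawned subprocesses
logger.info("%s: processing started", "page_0001.tif")

listener.stop()  # drains pending records before returning
```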
Robert Sachunsky
b7aa1d24cc CLI: drop redundant negative option forms, add --num-jobs 2026-03-13 18:22:25 +01:00
Robert Sachunsky
576e120ba6 autosized prediction is only faster for _patched, not for _resized…
When 338c4a0e wrapped all prediction models for automatic
image size adaptation in CUDA,
- tiling (`_patched`) was indeed faster
- whole  (`_resized`) was actually slower

So this reverts the latter part.
2026-03-13 18:15:30 +01:00
Robert Sachunsky
6d55f297a5 run: use ProcessPoolExecutor for parallel run_single across pages…
- reintroduce ProcessPoolExecutor
  (previously for parallel deskewing within pages)
- wrap the Eynollah instance in a global, so (with forking)
  serialization can be avoided – same pattern as in core ocrd.Processor
- move timing/logging into `run_single()`, respectively
2026-03-13 10:15:51 +01:00
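The global-instance pattern with forking can be sketched as follows (names and the dict stand-in are illustrative; with the `fork` start method the workers inherit the already-built global, so no per-task serialization of the instance occurs):

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import get_context

# Module-level global standing in for the (expensive) Eynollah instance.
_instance = None

def _run_single(page):
    # the forked worker inherits the parent's already-initialized global,
    # so the instance itself never needs to be pickled
    return (_instance["config"], page)

def run_pages(pages, config):
    global _instance
    _instance = {"config": config}  # stand-in for Eynollah(...)
    with ProcessPoolExecutor(max_workers=2,
                             mp_context=get_context("fork")) as pool:
        return list(pool.map(_run_single, pages))
```

Only small task arguments and results cross the process boundary; the heavy state stays in the inherited address space.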
Robert Sachunsky
96cfddf92d split_textregion_main_vs_header: avoid zero division 2026-03-13 02:44:08 +01:00
Robert Sachunsky
4e9b062b84 separate_marginals_to_left_and_right...: simplify 2026-03-13 02:44:08 +01:00
Robert Sachunsky
ae0f194241 drop ProcessPoolExecutor for intra-page parallel subprocessing…
(interferes with inter-page parallelism, not as useful)
2026-03-13 02:44:08 +01:00
Robert Sachunsky
becf031c65 refactor to remove data-dependency from all Eynollah methods…
- `cache_images()`: only return an image dict (plus extra keys
  for file name stem and dpi) - don't set any attributes
- `imread()`: just take from passed image dict, also add `binary` key
- `resize_and_enhance_image_with_column_classifier()`:
  * `imread()` from image dict
  * set `img_bin` key for binarization result if `input_binary`
  * instead of `image_page_org_size` / `page_coord` attributes,
    set `img_page` / `coord_page` in image dict
  * instead of retval, set `img_res` in image dict
  * also set `scale_x` and `scale_y` in image dict, resp.
  * simplify
- `resize_image_with_column_classifier()`:
  * `imread()` from image dict
  * (as in `resize_and_enhance_image_with_column_classifier`:)
    call `calculate_width_height_by_columns_1_2` if `num_col` is
    1 or 2 here
  * instead of retval, set `img_res` in image dict
  * also set `scale_x` and `scale_y` in image dict, resp.
  * simplify
- `calculate_width_height_by_columns*()`: simplify, get confidence of
  num_col instead of entire array
- `extract_page()`: read `img_res` from image dict; simplify
- `early_page_for_num_of_column_classification()`:
  `imread()` from image dict; simplify
- `textline_contours()`: no need for `num_col_classifier` here
- `run_textline()`: no need for `num_col_classifier` here
- `get_regions_light_v()` → `get_regions()`:
  * read `img_res` from image dict
  * get shapes via `img` from image dict instead of `image_org` attr
  * use `img_page` / `coord_page` from image dict instead of attrs
  * avoid unnecessary 3-channel arrays
  * simplify
- `get_tables_from_model()`: no need for `num_col_classifier` here
- `run_graphics_and_columns_light()` → `run_graphics_and_columns()`:
  * pass through image dict instead of `img_bin` (which really was `img_res`)
  * simplify
- `run_graphics_and_columns_without_layout()`:
  * pass through image dict instead of `img_bin` (which really was `img_res`)
  * simplify
- `run_enhancement()`: pass through image dict
- `get_image_and_scales*()`: drop
- `run_boxes_full_layout()`:
  * pass `image_page` instead of `img_bin` (which really was `image_page`)
  * simplify
- `run()`:
  * instantiate plotter outside of loop, and independent of img files
  * move writer instantiation and overwrite checks into `run_single()`
  * add try/catch for `run_single()` w/ logging
- `reset_file_name_dir`: drop
- `run_single()`:
  * add some args/kwargs from `run()`
  * call `cache_images()` (reading image dict) here
  * instantiate writer here instead of (reused) attr in `run()`
  * set `scale_x` / `scale_y` in writer from image dict once known
    (i.e. after `run_enhancement()`)
  * don't return anything, but write PAGE result here
- `check_any_text_region_in_model_one_is_main_or_header_light()` →
  `split_textregion_main_vs_header()`
- plotter:
  * pass `name` (file stem) from image dict to all methods
  * for `write_images_into_directory()`: also `scale_x` and `scale_y`
    from image dict
- writer:
  * init with width/height
- ocrd processor:
  * adapt (just `run_single()` call)
  * drop `max_workers=1` restriction (can now run fully parallel)
- `get_textregion_contours_in_org_image_light()` →
  `get_textregion_confidences()`:
  * take shape from confmat directly instead of extra array
  * simplify
2026-03-13 02:44:08 +01:00
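The central idea of the refactor, returning an image dict instead of mutating instance attributes, might look roughly like this (keys, shapes, and helper names are guesses for illustration, not the actual eynollah API):

```python
from pathlib import Path
import numpy as np

def cache_images(image_filename):
    """Return an image dict instead of setting attributes (sketch)."""
    return {
        "name": Path(image_filename).stem,  # file name stem, per the commit
        "dpi": 300,                         # hypothetical value
        "img": np.zeros((100, 80, 3), dtype=np.uint8),
    }

def resize_step(images, scale_x=2.0, scale_y=2.0):
    # downstream steps read from and extend the dict, no hidden state
    images["scale_x"], images["scale_y"] = scale_x, scale_y
    images["img_res"] = (images["img"]
                         .repeat(int(scale_y), axis=0)
                         .repeat(int(scale_x), axis=1))
    return images
```

Because every step takes the dict explicitly, pages become independent units of work, which is what allows dropping the `max_workers=1` restriction.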
Robert Sachunsky
800c55b826 predictor: fix spawn vs fork / parent vs child contexts 2026-03-13 02:44:07 +01:00
Robert Sachunsky
64281768a9 run_graphics_and_columns_light: fix double 1-off error…
When the `num_col_classifier` prediction gets bypassed by the
heuristic result from `find_num_col()` (because the prediction
had too little confidence, or the result of
`calculate_width_height_by_columns()` would have become too large),
do not increment `num_col` further (it is already 1 more than
the number of column separators).
2026-03-12 10:18:14 +01:00
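In other words, since `find_num_col()` counts column separators, the column count is already `colseps + 1`; a hypothetical sketch of the fixed merge logic (`merge_num_col` and its confidence threshold are invented for illustration):

```python
def merge_num_col(predicted_num_col, confidence, heuristic_colseps,
                  min_confidence=0.5):
    """Pick the classifier result unless confidence is too low; the
    heuristic counts separators, so the column count is seps + 1,
    with no further increment (the double 1-off error was adding 1
    again on top). Names/threshold are illustrative."""
    if confidence >= min_confidence:
        return predicted_num_col
    return heuristic_colseps + 1
```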
Robert Sachunsky
46c5f52491 CLI: don't append /models_eynollah here (already in default_specs) 2026-03-11 02:40:53 +01:00
Robert Sachunsky
10214dfdda predictor: make sure all shared arrays get freed eventually 2026-03-11 02:40:53 +01:00
Robert Sachunsky
cf5caa1eca predictor: fix termination for pytests…
- rename `terminate` → `stopped`
- call `terminate()` from superclass during shutdown
- del `self.model_zoo` in the parent process after spawn,
  and in the child during shutdown
2026-03-11 02:40:53 +01:00
Robert Sachunsky
bb468bf68f predictor: mp.Value must come from spawn context, too 2026-03-11 02:27:47 +01:00
Robert Sachunsky
9f127a0783 introduce predictor subprocess for exclusive GPU processing…
- new class `Predictor(multiprocessing.Process)` as stand-in
  for EynollahModelZoo:
  * calling `load_models()` starts the subprocess (and has
    `.model_zoo.load_models()` run internally)
  * calling `get()` yields a stand-in that supports `.predict()`,
    which actually communicates with the singleton subprocess
    via task and result queues, sharing Numpy arrays via SHM
  * calling `predict()` with an empty dict (instead of an image)
    merely retrieves the respective model's output shapes (cached)
  * shared memory objects for arrays are cleared as soon as possible
  * log messages are piped through QueueHandler / QueueListener
  * exceptions are passed through the queues, and raised afterwards
- move all TF initialization to the predictor
2026-03-07 03:54:16 +01:00
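Sharing Numpy arrays between the caller and the predictor subprocess as described can be done with the stdlib `multiprocessing.shared_memory` module (Python >= 3.8); a minimal sketch with hypothetical helper names:

```python
import numpy as np
from multiprocessing import shared_memory

def share_array(arr):
    """Copy an array into shared memory; return the SHM handle plus the
    (name, shape, dtype) metadata to send through a task queue."""
    shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
    view = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
    view[:] = arr
    return shm, (shm.name, arr.shape, str(arr.dtype))

def attach_array(name, shape, dtype):
    """Attach to a shared array (e.g. in the predictor subprocess)."""
    shm = shared_memory.SharedMemory(name=name)
    return shm, np.ndarray(shape, dtype=dtype, buffer=shm.buf)
```

As the commit stresses, each side must `close()` its handle and the creator must `unlink()` as soon as the array is no longer needed, so the shared segments get freed eventually.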
Robert Sachunsky
6f4ec53f7e wrap_layout_model_resized/patched: compile call instead of predict
(so `predict()` can directly convert back to Numpy)
2026-03-07 03:52:14 +01:00
Robert Sachunsky
338c4a0edf wrap layout models for prediction (image resize or tiling) all in TF
(to avoid back and forth between CPU and GPU memory when looping
 over image patches)

- `patch_encoder`: define `Model` subclasses which take an existing
  (layout segmentation) model in the constructor, and define a new
  `call()` using the existing model in a GPU-only `tf.function`:
  * `wrap_layout_model_resized`: just `tf.image.resize()` from
    input image to model size, then predict, then resize back
  * `wrap_layout_model_patched`: ditto if smaller than model size;
    otherwise use `tf.image.extract_patches` for patching in a
    sliding-window approach, then predict patches one by one, then
    `tf.scatter_nd` to reconstruct to image size
- when compiling `tf.function` graph, make sure to use input signature
  with variable image size, but avoid retracing each new size sample
- in `EynollahModelZoo.load_model` for relevant model types,
  also wrap the loaded model
  * by `wrap_layout_model_resized` under model name + `_resized`
  * by `wrap_layout_model_patched` under model name + `_patched`
- introduce `do_prediction_new_concept_autosize`,
  replacing `do_prediction/_new_concept`,
  but using passed model's `predict` directly without
  resizing or tiling to model size
- instead of `do_prediction/_new_concept(True, ...)`,
  now call `do_prediction_new_concept_autosize`,
  but with `_patched` appended to model name
- instead of `do_prediction/_new_concept(False, ...)`,
  now call `do_prediction_new_concept_autosize`,
  but with `_resized` appended to model name
2026-03-07 03:33:44 +01:00
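The tiling scheme can be illustrated TF-free in NumPy (the commit implements it on the GPU via `tf.image.extract_patches` and `tf.scatter_nd`; this sketch only mirrors the sliding-window geometry and averages overlapping tiles):

```python
import numpy as np

def predict_tiled(img, model_size, predict):
    """Sliding-window tiling and reconstruction (NumPy illustration).

    img: (H, W, C) with H, W >= model_size; predict: callable on a
    (model_size, model_size, C) tile. Overlaps are averaged.
    """
    h, w, c = img.shape
    out = np.zeros((h, w, c), dtype=np.float32)
    counts = np.zeros((h, w, 1), dtype=np.float32)
    stride = model_size // 2  # half-overlapping windows
    # regular grid of tile origins, plus a final tile clamped to the border
    ys = sorted({*range(0, h - model_size + 1, stride), h - model_size})
    xs = sorted({*range(0, w - model_size + 1, stride), w - model_size})
    for y in ys:
        for x in xs:
            out[y:y + model_size, x:x + model_size] += predict(
                img[y:y + model_size, x:x + model_size])
            counts[y:y + model_size, x:x + model_size] += 1
    return out / counts
```

Doing the same loop in a compiled `tf.function` keeps the tiles on the GPU, which is why `_patched` got faster while the memory footprint grew.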
Robert Sachunsky
f33fd57da8 model_zoo: resolve path names coming in from caller (CLI)
(to make relative paths work)
2026-03-05 00:50:32 +01:00
Robert Sachunsky
41dccb216c use (generalized) do_prediction() instead of predict_enhancement() 2026-03-05 00:50:32 +01:00
Robert Sachunsky
341480e9a0 do_prediction: if img was too small for model, also upscale results
(i.e. resize back to match original size after prediction)
2026-03-05 00:50:32 +01:00
Robert Sachunsky
8ebbe65c17 textline_contours: remove unnecessary resize_image, simplify 2026-03-05 00:50:32 +01:00
Robert Sachunsky
3370a3aa85 do_prediction*: avoid 3-channel results, simplify further…
- `do_prediction/_new_concept`: avoid unnecessary `np.repeat`
  on results, aggregate intermediate artificial class mask and
  confidence data in extra arrays
- callers: avoid unnecessary thresholding the result arrays
- callers: adapt (no need to slice into channels)
- simplify by refactoring thresholding and skeletonization into
  function `seg_mask_label`
- `extract_text_regions*`: drop unused second result array
- `textline_contours`: avoid calculating unused unpatched prediction
2026-03-05 00:50:32 +01:00
Robert Sachunsky
ff7dc31a68 do_prediction*: rename identifiers for artificial class thresholding
- `do_prediction_new_concept` w/ patches: remove branches for
  `thresholding_for_artificial_class` (never used, wrong name)
- `do_prediction_new_concept` w/ patches: rename kwarg
  `thresholding_for_some_classes` →
  `thresholding_for_artificial_class`
- `do_prediction_new_concept`: introduce kwarg `artificial_class`
  (for baked constant 4)
- `do_prediction`: introduce kwarg `artificial_class`
  (for baked constant 2)
- `do_prediction/_new_concept`: rename kwargs
  `thresholding_for..._in_light_version` →
  `thresholding_for...`
- `do_prediction`: rename kwarg
  `threshold_art_class_textline` →
  `threshold_art_class`
- `do_prediction_new_concept`: rename kwarg
  `threshold_art_class_layout` →
  `threshold_art_class`
2026-03-02 13:08:11 +01:00
Robert Sachunsky
b9cf68b51a training: fix b6d2440c 2026-03-01 20:00:05 +01:00
Robert Sachunsky
686f1d34aa do_prediction*: simplify (esp. indexing/slicing) 2026-03-01 04:37:20 +01:00
Robert Sachunsky
3b56fa2a5b training: plot GT/prediction and metrics before training (commented) 2026-02-28 20:11:12 +01:00
Robert Sachunsky
e47653f684 training: move nCC metric/loss to .metrics and rename…
- `num_connected_components_regression` → `connected_components_loss`
- move from training.train to training.metrics
2026-02-28 20:11:12 +01:00
Robert Sachunsky
361d40c064 training: improve nCC metric/loss - measure localized congruence…
- instead of just comparing the number of connected components,
  calculate the GT/pred label incidence matrix and retrieve the
  share of singular values (i.e. nearly diagonal under reordering)
  over total counts as similarity score
- also, suppress the artificial class in that calculation
2026-02-28 20:11:12 +01:00
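One loose NumPy reading of the incidence-matrix score (the actual metric lives in `training.metrics`; `congruence_score` is an illustrative reconstruction that counts the pixel mass of GT/pred component pairs matching one-to-one, i.e. entries that would lie on the diagonal after reordering):

```python
import numpy as np

def congruence_score(gt_labels, pred_labels, ignore=0):
    """Share of pixels in one-to-one matched GT/pred components.

    Builds the label incidence matrix (co-occurrence counts of GT and
    predicted component labels, with the ignored/artificial class
    suppressed) and sums entries that are the single nonzero entry in
    both their row and column. (Sketch, not the actual metric code.)
    """
    mask = (gt_labels != ignore) & (pred_labels != ignore)
    gt = gt_labels[mask].ravel()
    pr = pred_labels[mask].ravel()
    incidence = np.zeros((gt.max() + 1, pr.max() + 1), dtype=np.int64)
    np.add.at(incidence, (gt, pr), 1)
    row_nnz = (incidence > 0).sum(axis=1)
    col_nnz = (incidence > 0).sum(axis=0)
    singular = ((incidence > 0)
                & (row_nnz[:, None] == 1)
                & (col_nnz[None, :] == 1))
    return incidence[singular].sum() / incidence.sum()
```

A prediction that merges two GT components yields a row/column with two nonzero entries, so that mass is excluded from the score.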
Robert Sachunsky
7e06ab2c8c training: add config param add_ncc_loss for layout/binarization…
- add `metrics.metrics_superposition` and `metrics.Superposition`
- if non-zero, mix configured loss with weighted nCC metric
2026-02-28 20:11:12 +01:00
Robert Sachunsky
c6d9dd7945 training: use mixed precision and XLA (commented; does not work, yet) 2026-02-28 20:10:53 +01:00
Robert Sachunsky
c1d8a72edc training: shuffle tf.data pipelines 2026-02-28 20:10:53 +01:00
Robert Sachunsky
1cff937e72 training: make data pipeline in 7888fa5 more efficient 2026-02-28 20:10:53 +01:00
Robert Sachunsky
f8dd5a328c training: make plotting 18607e0f more efficient…
- avoid control dependencies in model path
- store only every 3rd sample
2026-02-28 20:10:53 +01:00
Robert Sachunsky
2d5de8e595 training.models: use bilinear instead of nearest upsampling…
(to benefit from CUDA optimization)
2026-02-27 12:48:28 +01:00
Robert Sachunsky
ba954d6314 training.models: fix daa084c3 2026-02-27 12:47:59 +01:00
Robert Sachunsky
7c3aeda65e training.models: fix 9b66867c 2026-02-27 12:40:56 +01:00
Robert Sachunsky
439ca350dd training: add metric ConfusionMatrix and plot it to TensorBoard 2026-02-26 13:55:37 +01:00