Commit graph

1482 commits

Author SHA1 Message Date
Robert Sachunsky
f5f2435a38 run_marginals: drop unnecessarily passing textline_mask, mask_seps, mask_images 2026-04-16 05:13:06 +02:00
Robert Sachunsky
9309586712 split_textregion_main_vs_header → split_textregion_main_vs_head…
(and simplify)
2026-04-16 05:07:22 +02:00
Robert Sachunsky
0f82b568ba do_prediction_new_concept: aggregate confidence for all classes…
(not just text; will still have to pass that on to the writer...)
2026-04-16 05:02:20 +02:00
Robert Sachunsky
5a27e46b22 keep seps over artificial boundaries to improve col separation…
(thresholding and decoding with artificial boundary class can
 overwrite existing column separators, which in turn can contribute
 to missing column boundaries; this prioritises seps over boundaries,
 which does not impair separation of instances, as seps will separate
 text/image/etc instances just as well as artificial boundaries)
2026-04-16 04:56:38 +02:00
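The priority rule this commit describes can be sketched in NumPy (class indices here are illustrative, not the model's actual label mapping): decode with argmax, apply the artificial-boundary threshold, then restore any pixel that was originally a separator.

```python
import numpy as np

# Illustrative class indices (the model's real label mapping may differ):
ART_CLASS = 4   # artificial boundary
SEP_CLASS = 6   # separator

def decode_keeping_seps(probs, art_threshold=0.5):
    """Argmax-decode a (H, W, C) probability map, thresholding the
    artificial-boundary class as usual, but restoring separator pixels
    afterwards so seps are never overwritten by boundaries."""
    labels = probs.argmax(axis=-1)
    sep_mask = labels == SEP_CLASS
    # thresholded decoding of the artificial boundary class
    labels[probs[..., ART_CLASS] > art_threshold] = ART_CLASS
    # prioritise seps over boundaries (seps separate instances just as well)
    labels[sep_mask] = SEP_CLASS
    return labels
```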
Robert Sachunsky
9d6ff65e1d get_tables_from_model: utilise artificial bound thresholding…
(to improve separation of neighbouring tables, esp. across
 columns; since model's threshold class is particularly weak,
 also use lower threshold here)
2026-04-16 04:49:07 +02:00
Robert Sachunsky
12b1271487 layout cli: add option --halt-fail 2026-04-13 01:19:47 +02:00
Robert Sachunsky
56e6deb02c predictor: jit-compile and precompile (non-autosized) models 2026-04-13 01:17:04 +02:00
Robert Sachunsky
01c54eb2ef reduce inference batch sizes to accommodate 8 GB VRAM
(still pending a solution for flexible batch sizes)
2026-04-13 01:15:25 +02:00
Robert Sachunsky
f44c39667e predictor: disable rebatching (until we have flexible batch sizes) 2026-04-13 01:14:49 +02:00
Robert Sachunsky
219954d15b predictor: use predict_on_batch instead of predict 2026-04-13 01:14:18 +02:00
Robert Sachunsky
0d21b62aee disable autosized prediction entirely (also for _patched)…
When 338c4a0e wrapped all prediction models for automatic
image size adaptation in CUDA,
- tiling (`_patched`) was indeed faster
- whole  (`_resized`) was actually slower

But CUDA-based tiling also increases GPU memory requirements
a lot. And with the new parallel subprocess predictors, Numpy-
based tiling is not necessarily slower anymore.
2026-04-10 18:23:10 +02:00
Robert Sachunsky
ccef63f08b get_regions: always use resized/enhanced image…
(avoid a strange image-handling short-cut, which used the
 early cropped image from column classification
 instead of the normal image in 1/2-column cases;
 fixes accuracy issues of the region_1_2 model on these images)
2026-04-10 18:17:51 +02:00
Robert Sachunsky
04da66ed73 training: plot only ~ 1000 training and ~ 100 validation images 2026-03-30 13:34:05 +02:00
Robert Sachunsky
a8556f5210 run: sort parallel log messages by file name instead of prefixing…
(as follow-up to ec08004f:)

- create log queues and QueueListener separately for each job
- receive job logs sequentially
- drop log filter mechanism (prefixing log messages by file name)
- also count ratio of successful jobs
2026-03-30 13:18:40 +02:00
Robert Sachunsky
1756443605 fixup device sel 2026-03-16 15:35:07 +01:00
Robert Sachunsky
6bbdcc39ef CLI/Eynollah.setup_models/ModelZoo.load_models: add device option/kwarg
allow setting a device specifier to load models onto

either
- CPU or single GPU0, GPU1 etc
- per-model patterns, e.g. col*:CPU,page:GPU0,*:GPU1

pass through as kwargs until `ModelZoo.load_models()` sets up TF
2026-03-15 04:54:04 +01:00
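A per-model pattern spec like the example above could be resolved with glob matching; this is a hypothetical sketch (the function name and first-match-wins semantics are assumptions, only the spec syntax comes from the commit):

```python
from fnmatch import fnmatch

def resolve_device(spec, model_name, default="CPU"):
    """Resolve a device for model_name from a spec string like
    'col*:CPU,page:GPU0,*:GPU1' (comma-separated pattern:device pairs,
    first matching pattern wins); a spec without ':' applies to all."""
    if ":" not in spec:
        return spec
    for pair in spec.split(","):
        pattern, device = pair.split(":", 1)
        if fnmatch(model_name, pattern):
            return device
    return default
```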
Robert Sachunsky
67e9f84b54 do_prediction* for "col_classifier": pass array as float16 instead of float64 2026-03-15 03:20:39 +01:00
Robert Sachunsky
f54deff452 model_zoo/predictor: use one subprocess per model…
- Eynollah: instead of one `Predictor` instance as stand-in for
  entire `ModelZoo`, keep the latter but have each model in `_loaded`
  dict become an independent predictor instance
- `ModelZoo.load_models()`: instantiate `Predictor`s for each
  `model_category` and then call `Predictor.load_model()` on them
- `Predictor.load_model()`: set args/kwargs for `ModelZoo.load_model()`,
  then spawn subprocess via `.start()`, which first enters `setup()`...
- `Predictor.setup()`: call `ModelZoo.load_model()` instead of (plural)
 `.load_models()`; save to `self.model` instead of `self.model_zoo`
- `ModelZoo.load_model()`: move _all_ CUDA configuration and
  TF/Keras-specific module initialization here (to be used only by
  predictor subprocess)
- `Predictor`: drop stand-in `SingleModelPredictor` retrieved by `get()`;
  directly provide `predict()` and `output_shape` via `self.call()`
- `Predictor`: drop `model` arg from all queues - now implicit; use
  `self.name` for model name in messages
- `Predictor`: no need for requeuing other tasks (only same model now)
- `Predictor`: reduce rebatching batch sizes due to increased VRAM footprint

- `Eynollah.setup_models()`: set up loading `_patched` / `_resized`
  here instead of during `ModelZoo.load_model()`
- `ModelZoo.load_models()`: for resized/patched models, call
  `Predictor.load_model()` with kwarg instead of resp. model name suffix
- `ModelZoo.load_model()`: expect boolean kwargs `patched/resized`
  for `wrap_layout_model_patched/resized` model wrappers, respectively
2026-03-15 02:53:37 +01:00
Robert Sachunsky
c514bbc661 make switching between autosized and looped tiling easier 2026-03-14 02:16:26 +01:00
Robert Sachunsky
2f3b622cf5 predictor: rebatch tasks to increase CUDA throughput…
- depending on model type (i.e. size), configure target
  batch sizes
- after receiving a prediction task for some model,
  look up target batch size, then try to retrieve arrays
  from follow-up tasks for the same model on the task queue;
  stop when either no tasks are immediately available or
  when the combined batch size (input batch size * number of tasks)
  reaches the target
- push back tasks for other models to the queue
- rebatch: read all shared arrays, concatenate them along axis 0,
  map respective job ids they came from
- predict on new (possibly larger) batch
- split result along axis 0 into number of jobs
- send each result along with its jobid to task queue
2026-03-14 00:52:34 +01:00
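The rebatching loop described above can be sketched as follows (all names hypothetical; the pushing-back of other-model tasks is omitted for brevity): drain immediately-available tasks up to the target batch size, predict once on the concatenation, then split per job along axis 0.

```python
import numpy as np
from queue import Empty, Queue

def rebatch_and_predict(task_queue, first_task, predict, target_batch):
    """Starting from one (job_id, array) task, pull immediately-available
    tasks until the combined batch reaches target_batch, predict once,
    then split the result back into one slice per job."""
    tasks = [first_task]
    total = first_task[1].shape[0]
    while total < target_batch:
        try:
            task = task_queue.get_nowait()
        except Empty:
            break   # no more tasks immediately available
        tasks.append(task)
        total += task[1].shape[0]
    batch = np.concatenate([arr for _, arr in tasks], axis=0)
    result = predict(batch)
    # split points: cumulative batch sizes of all but the last job
    splits = np.cumsum([arr.shape[0] for _, arr in tasks])[:-1]
    return list(zip([job for job, _ in tasks],
                    np.split(result, splits, axis=0)))
```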
Robert Sachunsky
b550725cc5 wrap_layout_model_patched: simplify shape calculation 2026-03-14 00:51:22 +01:00
Robert Sachunsky
d6404dbbc2 do_prediction*: pass arrays as float16 instead of float64 to TF 2026-03-14 00:49:26 +01:00
Robert Sachunsky
135064a48e model_zoo: region model not used at runtime anymore - don't load 2026-03-14 00:48:52 +01:00
Robert Sachunsky
ec08004fb0 run: add QueueListener to pool / QueueHandler to workers…
- set up a Queue and QueueListener along with ProcessPoolExecutor,
  delegating messages from the queue to all handlers
- in forked subprocesses, instead of just inheriting handlers,
  replace them with a single QueueHandler, and make sure
  log messages get prefixed by the respective job id (img_filename)
  so concurrent messages will still be readable
- in the predictor, make sure to pass on the log level to the
  spawned subprocess, too
2026-03-14 00:43:58 +01:00
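The two sides of this setup could look roughly like the sketch below. The queue would be a `multiprocessing.Queue` created alongside the pool; the prefixing here uses a record factory for brevity, whereas the commit describes a log-filter mechanism. All names are illustrative.

```python
import logging
from logging.handlers import QueueHandler, QueueListener

def start_listener(queue, *handlers):
    """Parent side: delegate all records arriving on the queue
    to the given handlers (runs alongside the pool)."""
    listener = QueueListener(queue, *handlers)
    listener.start()
    return listener

def setup_worker_logging(queue, job_id):
    """Worker side: replace inherited handlers with a single
    QueueHandler, and prefix every message with the job id."""
    root = logging.getLogger()
    root.handlers = [QueueHandler(queue)]
    old_factory = logging.getLogRecordFactory()
    def factory(*args, **kwargs):
        record = old_factory(*args, **kwargs)
        record.msg = f"{job_id}: {record.msg}"
        return record
    logging.setLogRecordFactory(factory)
```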
Robert Sachunsky
b7aa1d24cc CLI: drop redundant negative option forms, add --num-jobs 2026-03-13 18:22:25 +01:00
Robert Sachunsky
576e120ba6 autosized prediction is only faster for _patched, not for _resized…
When 338c4a0e wrapped all prediction models for automatic
image size adaptation in CUDA,
- tiling (`_patched`) was indeed faster
- whole  (`_resized`) was actually slower

So this reverts the latter part.
2026-03-13 18:15:30 +01:00
Robert Sachunsky
6d55f297a5 run: use ProcessPoolExecutor for parallel run_single across pages…
- reintroduce ProcessPoolExecutor
  (previously for parallel deskewing within pages)
- wrap the Eynollah instance in a global, so (with forking)
  serialization can be avoided – same pattern as in core ocrd.Processor
- move timing/logging into `run_single()`, respectively
2026-03-13 10:15:51 +01:00
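The global-instance pattern mentioned above can be sketched like this (class and function names are stand-ins for the real Eynollah code): the heavy object is built once in the parent, and fork-started workers inherit it instead of unpickling it per task.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

class ExpensiveProcessor:
    """Hypothetical stand-in for the heavy Eynollah instance."""
    def __init__(self, config):
        self.config = config
    def process(self, filename):
        return f"{self.config}:{filename}"

# Module-level global: with the fork start method, workers inherit the
# already-constructed instance, so it is never serialized per task.
_processor = None

def _run_single(filename):
    return _processor.process(filename)

def run_all(filenames, config, num_jobs=2):
    global _processor
    _processor = ExpensiveProcessor(config)   # built once in the parent
    ctx = mp.get_context("fork")              # forked workers inherit it
    with ProcessPoolExecutor(max_workers=num_jobs, mp_context=ctx) as pool:
        return list(pool.map(_run_single, filenames))
```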
Robert Sachunsky
96cfddf92d split_textregion_main_vs_header: avoid zero division 2026-03-13 02:44:08 +01:00
Robert Sachunsky
4e9b062b84 separate_marginals_to_left_and_right...: simplify 2026-03-13 02:44:08 +01:00
Robert Sachunsky
ae0f194241 drop ProcessPoolExecutor for intra-page parallel subprocessing…
(interferes with inter-page parallelism, not as useful)
2026-03-13 02:44:08 +01:00
Robert Sachunsky
becf031c65 refactor to remove data-dependency from all Eynollah methods…
- `cache_images()`: only return an image dict (plus extra keys
  for file name stem and dpi) - don't set any attributes
- `imread()`: just take from passed image dict, also add `binary` key
- `resize_and_enhance_image_with_column_classifier()`:
  * `imread()` from image dict
  * set `img_bin` key for binarization result if `input_binary`
  * instead of `image_page_org_size` / `page_coord` attributes,
    set `img_page` / `coord_page` in image dict
  * instead of retval, set `img_res` in image dict
  * also set `scale_x` and `scale_y` in image dict, resp.
  * simplify
- `resize_image_with_column_classifier()`:
  * `imread()` from image dict
  * (as in `resize_and_enhance_with_column_classifier`:)
    call `calculate_width_height_by_columns_1_2` if `num_col` is
    1 or 2 here
  * instead of retval, set `img_res` in image dict
  * also set `scale_x` and `scale_y` in image dict, resp.
  * simplify
- `calculate_width_height_by_columns*()`: simplify, get confidence of
  num_col instead of entire array
- `extract_page()`: read `img_res` from image dict; simplify
- `early_page_for_num_of_column_classification()`:
  `imread()` from image dict; simplify
- `textline_contours()`: no need for `num_col_classifier` here
- `run_textline()`: no need for `num_col_classifier` here
- `get_regions_light_v()` → `get_regions()`:
  * read `img_res` from image dict
  * get shapes via `img` from image dict instead of `image_org` attr
  * use `img_page` / `coord_page` from image dict instead of attrs
  * avoid unnecessary 3-channel arrays
  * simplify
- `get_tables_from_model()`: no need for `num_col_classifier` here
- `run_graphics_and_columns_light()` → `run_graphics_and_columns()`:
  * pass through image dict instead of `img_bin` (which really was `img_res`)
  * simplify
- `run_graphics_and_columns_without_layout()`:
  * pass through image dict instead of `img_bin` (which really was `img_res`)
  * simplify
- `run_enhancement()`: pass through image dict
- `get_image_and_sclaes*()`: drop
- `run_boxes_full_layout()`:
  * pass `image_page` instead of `img_bin` (which really was `image_page`)
  * simplify
- `run()`:
  * instantiate plotter outside of loop, and independent of img files
  * move writer instantiation and overwrite checks into `run_single()`
  * add try/catch for `run_single()` w/ logging
- `reset_file_name_dir`: drop
- `run_single()`:
  * add some args/kwargs from `run()`
  * call `cache_images()` (reading image dict) here
  * instantiate writer here instead of (reused) attr in `run()`
  * set `scale_x` / `scale_y` in writer from image dict once known
    (i.e. after `run_enhancement()`)
  * don't return anything, but write PAGE result here
- `check_any_text_region_in_model_one_is_main_or_header_light()` →
  `split_textregion_main_vs_header()`
- plotter:
  * pass `name` (file stem) from image dict to all methods
  * for `write_images_into_directory()`: also `scale_x` and `scale_y`
    from image dict
- writer:
  * init with width/height
- ocrd processor:
  * adapt (just `run_single()` call)
  * drop `max_workers=1` restriction (can now run fully parallel)
- `get_textregion_contours_in_org_image_light()` →
  `get_textregion_confidences()`:
  * take shape from confmat directly instead of extra array
  * simplify
2026-03-13 02:44:08 +01:00
Robert Sachunsky
800c55b826 predictor: fix spawn vs fork / parent vs child contexts 2026-03-13 02:44:07 +01:00
Robert Sachunsky
64281768a9 run_graphics_and_columns_light: fix double 1-off error…
When the `num_col_classifier` prediction gets bypassed
by the heuristic result from `find_num_col()` (because the prediction
had too little confidence or `calculate_width_height_by_columns()`
would have become too large), do not increment `num_col` further
(it is already 1 more than the number of colseps).
2026-03-12 10:18:14 +01:00
Robert Sachunsky
46c5f52491 CLI: don't append /models_eynollah here (already in default_specs) 2026-03-11 02:40:53 +01:00
Robert Sachunsky
10214dfdda predictor: make sure all shared arrays get freed eventually 2026-03-11 02:40:53 +01:00
Robert Sachunsky
cf5caa1eca predictor: fix termination for pytests…
- rename `terminate` → `stopped`
- call `terminate()` from superclass during shutdown
- del `self.model_zoo` in the parent process after spawn,
  and in the child during shutdown
2026-03-11 02:40:53 +01:00
Robert Sachunsky
bb468bf68f predictor: mp.Value must come from spawn context, too 2026-03-11 02:27:47 +01:00
Robert Sachunsky
9f127a0783 introduce predictor subprocess for exclusive GPU processing…
- new class `Predictor(multiprocessing.Process)` as stand-in
  for EynollahModelZoo:
  * calling `load_models()` starts the subprocess (and has
    `.model_zoo.load_models()` run internally)
  * calling `get()` yields a stand-in that supports `.predict()`,
    which actually communicates with the singleton subprocess
    via task and result queues, sharing Numpy arrays via SHM
  * calling `predict()` with an empty dict (instead of an image)
    merely retrieves the respective model's output shapes (cached)
  * shared memory objects for arrays are cleared as soon as possible
  * log messages are piped through QueueHandler / QueueListener
  * exceptions are passed through the queues, and raised afterwards
- move all TF initialization to the predictor
2026-03-07 03:54:16 +01:00
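A minimal sketch of this pattern, with a doubling stand-in instead of a real model (the actual `Predictor` also pipes logs and exceptions through queues, which is omitted here):

```python
import numpy as np
import multiprocessing as mp
from multiprocessing import shared_memory

class Predictor(mp.Process):
    """Subprocess that owns the (hypothetical) model exclusively; callers
    pass arrays via shared memory and communicate over task/result queues."""
    def __init__(self):
        super().__init__(daemon=True)
        self.tasks = mp.Queue()
        self.results = mp.Queue()

    def run(self):
        # child process: this is where the real code would initialise TF
        # and load the model; inference here is just a doubling stand-in
        while True:
            msg = self.tasks.get()
            if msg is None:
                break
            name, shape, dtype = msg
            shm = shared_memory.SharedMemory(name=name)
            arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
            out = arr * 2.0           # stand-in for model.predict()
            shm.close()
            self.results.put(out)

    def predict(self, arr):
        shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
        np.ndarray(arr.shape, arr.dtype, buffer=shm.buf)[:] = arr
        self.tasks.put((shm.name, arr.shape, arr.dtype.str))
        out = self.results.get()
        shm.close()
        shm.unlink()                  # free shared memory as soon as possible
        return out

    def shutdown(self):
        self.tasks.put(None)
        self.join()
```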
Robert Sachunsky
6f4ec53f7e wrap_layout_model_resized/patched: compile call instead of predict
(so `predict()` can directly convert back to Numpy)
2026-03-07 03:52:14 +01:00
Robert Sachunsky
338c4a0edf wrap layout models for prediction (image resize or tiling) all in TF
(to avoid back and forth between CPU and GPU memory when looping
 over image patches)

- `patch_encoder`: define `Model` subclasses which take an existing
  (layout segmentation) model in the constructor, and define a new
  `call()` using the existing model in a GPU-only `tf.function`:
  * `wrap_layout_model_resized`: just `tf.image.resize()` from
    input image to model size, then predict, then resize back
  * `wrap_layout_model_patched`: ditto if smaller than model size;
    otherwise use `tf.image.extract_patches` for patching in a
    sliding-window approach, then predict patches one by one, then
    `tf.scatter_nd` to reconstruct to image size
- when compiling `tf.function` graph, make sure to use input signature
  with variable image size, but avoid retracing each new size sample
- in `EynollahModelZoo.load_model` for relevant model types,
  also wrap the loaded model
  * by `wrap_layout_model_resized` under model name + `_resized`
  * by `wrap_layout_model_patched` under model name + `_patched`
- introduce `do_prediction_new_concept_autosize`,
  replacing `do_prediction/_new_concept`,
  but using passed model's `predict` directly without
  resizing or tiling to model size
- instead of `do_prediction/_new_concept(True, ...)`,
  now call `do_prediction_new_concept_autosize`,
  but with `_patched` appended to model name
- instead of `do_prediction/_new_concept(False, ...)`,
  now call `do_prediction_new_concept_autosize`,
  but with `_resized` appended to model name
2026-03-07 03:33:44 +01:00
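For illustration, the tiling logic of the `_patched` wrapper looks roughly like this in NumPy; the commit implements it on-GPU via `tf.image.extract_patches` and `tf.scatter_nd`, and handles overlap margins, which are omitted in this sketch.

```python
import numpy as np

def predict_patched(img, model_size, predict):
    """Tile the image into model-sized windows in a sliding-window
    fashion, predict each tile, and scatter the results back to image
    size.  Assumes image dimensions divisible by model_size."""
    h, w = img.shape[:2]
    out = np.zeros(img.shape, dtype=float)
    for y in range(0, h, model_size):
        for x in range(0, w, model_size):
            out[y:y + model_size, x:x + model_size] = \
                predict(img[y:y + model_size, x:x + model_size])
    return out
```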
Robert Sachunsky
f33fd57da8 model_zoo: resolve path names coming in from caller (CLI)
(to make relative paths work)
2026-03-05 00:50:32 +01:00
Robert Sachunsky
41dccb216c use (generalized) do_prediction() instead of predict_enhancement() 2026-03-05 00:50:32 +01:00
Robert Sachunsky
341480e9a0 do_prediction: if img was too small for model, also upscale results
(i.e. resize back to match original size after prediction)
2026-03-05 00:50:32 +01:00
Robert Sachunsky
8ebbe65c17 textline_contours: remove unnecessary resize_image, simplify 2026-03-05 00:50:32 +01:00
Robert Sachunsky
3370a3aa85 do_prediction*: avoid 3-channel results, simplify further…
- `do_prediction/_new_concept`: avoid unnecessary `np.repeat`
  on results, aggregate intermediate artificial class mask and
  confidence data in extra arrays
- callers: avoid unnecessary thresholding the result arrays
- callers: adapt (no need to slice into channels)
- simplify by refactoring thresholding and skeletonization into
  function `seg_mask_label`
- `extract_text_regions*`: drop unused second result array
- `textline_contours`: avoid calculating unused unpatched prediction
2026-03-05 00:50:32 +01:00
Robert Sachunsky
ff7dc31a68 do_prediction*: rename identifiers for artificial class thresholding
- `do_prediction_new_concept` w/ patches: remove branches for
  `thresholding_for_artificial_class` (never used, wrong name)
- `do_prediction_new_concept` w/ patches: rename kwarg
  `thresholding_for_some_classes` →
  `thresholding_for_artificial_class`
- `do_prediction_new_concept`: introduce kwarg `artificial_class`
  (for baked constant 4)
- `do_prediction`: introduce kwarg `artificial_class`
  (for baked constant 2)
- `do_prediction/_new_concept`: rename kwargs
  `thresholding_for..._in_light_version` →
  `thresholding_for...`
- `do_prediction`: rename kwarg
  `threshold_art_class_textline` →
  `threshold_art_class`
- `do_prediction_new_concept`: rename kwarg
  `threshold_art_class_layout` →
  `threshold_art_class`
2026-03-02 13:08:11 +01:00
Robert Sachunsky
b9cf68b51a training: fix b6d2440c 2026-03-01 20:00:05 +01:00
Robert Sachunsky
686f1d34aa do_prediction*: simplify (esp. indexing/slicing) 2026-03-01 04:37:20 +01:00
Robert Sachunsky
3b56fa2a5b training: plot GT/prediction and metrics before training (commented) 2026-02-28 20:11:12 +01:00
Robert Sachunsky
e47653f684 training: move nCC metric/loss to .metrics and rename…
- `num_connected_components_regression` → `connected_components_loss`
- move from training.train to training.metrics
2026-02-28 20:11:12 +01:00