Commit graph

1442 commits

Author SHA1 Message Date
Robert Sachunsky
0dfc9d911f run_boxes_no_full_layout: also map to fl labels here…
(because -mbro assumes the label set from -fl)
2026-04-20 18:20:58 +02:00
Robert Sachunsky
0015f2675b with -slro, also extract and apply page (Border) mask 2026-04-20 18:20:58 +02:00
Robert Sachunsky
569b96d1a9 find_number_of_columns_in_document: pass correct label_seps…
- in fl: 6
- non-fl: 3 (now fixed)
2026-04-20 18:20:58 +02:00
Robert Sachunsky
f28a9c9e0b add confidence for all region types, prepare for textlines…
- pass on probabilities from predicted class everywhere
- rename `confidence_matrix` → `confidence_regions` / `regions_confidence`
- rename `get_textregion_confidences()` → `get_region_confidences()`
- add same for tables, textlines and regionsfl (full layout model)
- aggregate per-region confidence lists for image, table, drop-capital,
  left marginal and right marginal regions
- add in writer
- simplify/re-indent some
- try to replace more number literals with class label identifiers
2026-04-20 18:20:58 +02:00
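The per-region confidence aggregation described above can be sketched in a few lines; this is a minimal illustration, assuming mean pooling over a per-pixel confidence map (the function name mirrors the commit's rename, but the signature is hypothetical):

```python
import numpy as np

def get_region_confidences(conf_map, region_masks):
    """Collapse a per-pixel confidence map into one mean confidence per
    region (image, table, drop-capital, left/right marginal, ...)."""
    return [float(conf_map[mask].mean()) if mask.any() else 0.0
            for mask in region_masks]
```

The resulting per-region list is what a writer can then attach to each region element.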
Robert Sachunsky
1164b97917 extract_text_regions_new: fix heading thresholding…
- re-introduce the `heading` threshold boosting that was broken
  during refactoring (light version and do_prediction)
- also return confidence for full layout prediction
2026-04-20 18:20:58 +02:00
Robert Sachunsky
20dc5c3188 also cover drop-capital in (heuristic) reading order 2026-04-20 18:20:58 +02:00
Robert Sachunsky
92e94753c7 decoding of dropcaps in -fl: ensure consistency w/ early layout…
1. use connected component analysis to get unique segments
   in early prediction result
2. for each drop-capital segment in full prediction result,
   find matching early segment
3. when they have high overlap, assign drop-capital label
   to the entire early segment
2026-04-20 18:20:58 +02:00
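The three steps above can be sketched in plain NumPy, with a naive BFS standing in for the connected-component analysis; the label value, names, and overlap threshold are illustrative, not the project's actual constants:

```python
import numpy as np
from collections import deque

def label_components(mask):
    """Naive 4-connected component labelling (stand-in for a CC analysis)."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for sy, sx in zip(*np.nonzero(mask)):
        if labels[sy, sx]:
            continue
        current += 1
        queue = deque([(sy, sx)])
        labels[sy, sx] = current
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = current
                    queue.append((ny, nx))
    return labels, current

def assign_drop_capitals(early_seg, drop_mask, drop_label=4, min_overlap=0.5):
    """For each drop-capital segment in the full prediction, find the
    matching early segment; on high overlap, relabel the whole segment."""
    out = early_seg.copy()
    early_cc, _ = label_components(early_seg > 0)   # step 1: unique segments
    drop_cc, n = label_components(drop_mask)
    for i in range(1, n + 1):
        component = drop_cc == i
        hits = early_cc[component]
        hits = hits[hits > 0]
        if hits.size == 0:
            continue
        best = np.bincount(hits).argmax()           # step 2: matching segment
        if (hits == best).sum() / component.sum() >= min_overlap:
            out[early_cc == best] = drop_label      # step 3: relabel entirely
    return out
```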
Robert Sachunsky
29b42fdfaa decoding of drop-capitals in full layout: also allow replacing img…
- rename `putt_bb_of_drop_capitals_of_model_in_patches_in_layout`
  → `fill_bb_of_drop_capitals`
- also allow image (besides text) label in early layout prediction
  result when checking if entire bbox can be filled (as opposed to
  just drop-capital | image | background mask)
- simplify
2026-04-16 18:37:27 +02:00
Robert Sachunsky
6e0aed35f4 run_boxes_*: simplify, document class label mappings, start using
identifier constants instead of literals for labels
2026-04-16 18:37:27 +02:00
Robert Sachunsky
f29e876a7c return_boxes_of_images_by_order_of_reading_new: sep label differs w/o -fl…
fix bug where in non-full mode, the wrong class label was assumed
for separator regions (3 in non- vs 6 in full layout mode):

- pass in separator mask instead of full segmentation map
- rename for clarity:
  - `regions_without_separators` → `text_mask` (already binary)
  - `regions_with_separators` → `sep_mask` (now just binary)
2026-04-16 05:16:23 +02:00
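The fix above boils down to binarizing the separator mask once, where the active label set is known, so downstream code stays label-agnostic. A sketch (the label values 6 and 3 are from the commit, the names are illustrative):

```python
import numpy as np

SEP_LABEL_FULL = 6    # separator label with -fl (full layout mode)
SEP_LABEL_LIGHT = 3   # separator label without -fl

def make_sep_mask(regions, full_layout):
    """Derive a binary separator mask up front, so callers like
    return_boxes_of_images_by_order_of_reading_new need not know
    which label set is in use."""
    label = SEP_LABEL_FULL if full_layout else SEP_LABEL_LIGHT
    return regions == label
```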
Robert Sachunsky
f5f2435a38 run_marginals: drop unnecessarily passing textline_mask, mask_seps, mask_images 2026-04-16 05:13:06 +02:00
Robert Sachunsky
9309586712 split_textregion_main_vs_header → split_textregion_main_vs_head…
(and simplify)
2026-04-16 05:07:22 +02:00
Robert Sachunsky
0f82b568ba do_prediction_new_concept: aggregate confidence for all classes…
(not just text; will still have to pass that on to the writer...)
2026-04-16 05:02:20 +02:00
Robert Sachunsky
5a27e46b22 keep seps over artificial boundaries to improve col separation…
(thresholding and decoding with artificial boundary class can
 overwrite existing column separators, which in turn can contribute
 to missing column boundaries; this prioritises seps over boundaries,
 which does not impair separation of instances, as seps will separate
 text/image/etc instances just as well as artificial boundaries)
2026-04-16 04:56:38 +02:00
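The prioritisation described above amounts to re-imposing separator pixels after boundary-aware decoding; a minimal sketch (label values illustrative):

```python
import numpy as np

def keep_seps_over_boundaries(decoded, sep_mask, sep_label=6):
    """Wherever a column separator was detected, restore it even if
    artificial-boundary thresholding/decoding overwrote that pixel;
    separators split instances just as well as boundaries do."""
    return np.where(sep_mask, sep_label, decoded)
```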
Robert Sachunsky
9d6ff65e1d get_tables_from_model: utilise artificial bound thresholding…
(to improve separation of neighbouring tables, esp. across
 columns; since model's threshold class is particularly weak,
 also use lower threshold here)
2026-04-16 04:49:07 +02:00
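Using a lower threshold for a weak boundary class, as described above, can be sketched as letting the artificial-boundary class win over `table` already at low probability; class indices and the threshold here are illustrative:

```python
import numpy as np

def table_mask_with_boundaries(probs, table_cls, bound_cls, bound_thresh=0.2):
    """Argmax decoding, but relabel table pixels as boundary whenever the
    (particularly weak) boundary class clears a reduced threshold, to
    separate neighbouring tables, esp. across columns."""
    labels = probs.argmax(axis=-1)
    weak_bound = probs[..., bound_cls] >= bound_thresh
    labels[(labels == table_cls) & weak_bound] = bound_cls
    return labels == table_cls
```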
Robert Sachunsky
12b1271487 layout cli: add option --halt-fail 2026-04-13 01:19:47 +02:00
Robert Sachunsky
56e6deb02c predictor: jit-compile and precompile (non-autosized) models 2026-04-13 01:17:04 +02:00
Robert Sachunsky
01c54eb2ef reduce inference batch sizes to accommodate 8 GB VRAM
(still pending a solution for flexible batch sizes)
2026-04-13 01:15:25 +02:00
Robert Sachunsky
f44c39667e predictor: disable rebatching (until we have flexible batch sizes) 2026-04-13 01:14:49 +02:00
Robert Sachunsky
219954d15b predictor: use predict_on_batch instead of predict 2026-04-13 01:14:18 +02:00
Robert Sachunsky
0d21b62aee disable autosized prediction entirely (also for _patched)…
When 338c4a0e wrapped all prediction models for automatic
image size adaptation in CUDA,
- tiling (`_patched`) was indeed faster
- whole  (`_resized`) was actually slower

But CUDA-based tiling also increases GPU memory requirements
a lot. And with the new parallel subprocess predictors, Numpy-
based tiling is not necessarily slower anymore.
2026-04-10 18:23:10 +02:00
Robert Sachunsky
ccef63f08b get_regions: always use resized/enhanced image…
(avoid a strange image-handling shortcut that used the early
 cropped image intended for column classification instead of
 the normal image in 1/2-column cases; fixes accuracy issues
 of the region_1_2 model on these images)
2026-04-10 18:17:51 +02:00
Robert Sachunsky
04da66ed73 training: plot only ~ 1000 training and ~ 100 validation images 2026-03-30 13:34:05 +02:00
Robert Sachunsky
a8556f5210 run: sort parallel log messages by file name instead of prefixing…
(as follow-up to ec08004f:)

- create log queues and QueueListener separately for each job
- receive job logs sequentially
- drop log filter mechanism (prefixing log messages by file name)
- also count ratio of successful jobs
2026-03-30 13:18:40 +02:00
Robert Sachunsky
1756443605 fixup device sel 2026-03-16 15:35:07 +01:00
Robert Sachunsky
6bbdcc39ef CLI/Eynollah.setup_models/ModelZoo.load_models: add device option/kwarg
allow setting device specifier to load models into

either
- CPU or single GPU0, GPU1 etc
- per-model patterns, e.g. col*:CPU,page:GPU0,*:GPU1

pass through as kwargs until `ModelZoo.load_models()` sets up TF
2026-03-15 04:54:04 +01:00
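The per-model pattern syntax above can be resolved with simple glob matching; a sketch assuming first-match-wins semantics (which the commit does not spell out) and an illustrative function name:

```python
import fnmatch

def resolve_device(model_name, spec, default="CPU"):
    """Map a model name to a device from a pattern spec such as
    "col*:CPU,page:GPU0,*:GPU1". The first matching pattern wins."""
    for entry in spec.split(","):
        pattern, _, device = entry.partition(":")
        if fnmatch.fnmatch(model_name, pattern):
            return device
    return default
```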
Robert Sachunsky
67e9f84b54 do_prediction* for "col_classifier": pass array as float16 instead of float64 2026-03-15 03:20:39 +01:00
Robert Sachunsky
f54deff452 model_zoo/predictor: use one subprocess per model…
- Eynollah: instead of one `Predictor` instance as stand-in for
  entire `ModelZoo`, keep the latter but have each model in `_loaded`
  dict become an independent predictor instance
- `ModelZoo.load_models()`: instantiate `Predictor`s for each
  `model_category` and then call `Predictor.load_model()` on them
- `Predictor.load_model()`: set args/kwargs for `ModelZoo.load_model()`,
  then spawn subprocess via `.start()`, which first enters `setup()`...
- `Predictor.setup()`: call `ModelZoo.load_model()` instead of (plural)
  `.load_models()`; save to `self.model` instead of `self.model_zoo`
- `ModelZoo.load_model()`: move _all_ CUDA configuration and
  TF/Keras-specific module initialization here (to be used only by
  predictor subprocess)
- `Predictor`: drop stand-in `SingleModelPredictor` retrieved by `get()`;
  directly provide `predict()` and `output_shape` via `self.call()`
- `Predictor`: drop `model` arg from all queues - now implicit; use
  `self.name` for model name in messages
- `Predictor`: no need for requeuing other tasks (only same model now)
- `Predictor`: reduce rebatching batch sizes due to increased VRAM footprint

- `Eynollah.setup_models()`: set up loading `_patched` / `_resized`
  here instead of during `ModelZoo.load_model()`
- `ModelZoo.load_models()`: for resized/patched models, call
  `Predictor.load_model()` with kwarg instead of resp. model name suffix
- `ModelZoo.load_model()`: expect boolean kwargs `patched/resized`
  for `wrap_layout_model_patched/resized` model wrappers, respectively
2026-03-15 02:53:37 +01:00
Robert Sachunsky
c514bbc661 make switching between autosized and looped tiling easier 2026-03-14 02:16:26 +01:00
Robert Sachunsky
2f3b622cf5 predictor: rebatch tasks to increase CUDA throughput…
- depending on model type (i.e. size), configure target
  batch sizes
- after receiving a prediction task for some model,
  look up target batch size, then try to retrieve arrays
  from follow-up tasks for the same model on the task queue;
  stop when either no tasks are immediately available or
  when the combined batch size (input batch size * number of tasks)
  reaches the target
- push back tasks for other models to the queue
- rebatch: read all shared arrays, concatenate them along axis 0,
  map respective job ids they came from
- predict on new (possibly larger) batch
- split result along axis 0 into number of jobs
- send each result along with its jobid to task queue
2026-03-14 00:52:34 +01:00
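The rebatching loop described above can be sketched as follows; queue handling, shared memory, and the target-size lookup are omitted, and all names are illustrative:

```python
import numpy as np

def rebatch_and_predict(predict_fn, tasks, target_batch):
    """Greedily merge queued (job_id, batch) tasks for the same model until
    the combined batch size reaches target_batch, run one prediction, then
    split the result back per job along axis 0."""
    taken, total = [], 0
    while tasks and total < target_batch:
        job_id, batch = tasks.pop(0)
        taken.append((job_id, batch))
        total += len(batch)
    merged = np.concatenate([b for _, b in taken], axis=0)  # one big batch
    result = predict_fn(merged)
    out, offset = {}, 0
    for job_id, batch in taken:
        out[job_id] = result[offset:offset + len(batch)]    # per-job slice
        offset += len(batch)
    return out
```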
Robert Sachunsky
b550725cc5 wrap_layout_model_patched: simplify shape calculation 2026-03-14 00:51:22 +01:00
Robert Sachunsky
d6404dbbc2 do_prediction*: pass arrays as float16 instead of float64 to TF 2026-03-14 00:49:26 +01:00
Robert Sachunsky
135064a48e model_zoo: region model not used at runtime anymore - don't load 2026-03-14 00:48:52 +01:00
Robert Sachunsky
ec08004fb0 run: add QueueListener to pool / QueueHandler to workers…
- set up a Queue and QueueListener along with ProcessPoolExecutor,
  delegating messages from the queue to all handlers
- in forked subprocesses, instead of just inheriting handlers,
  replace them with a single QueueHandler, and make sure
  log messages get prefixed by the respective job id (img_filename)
  so concurrent messages will still be readable
- in the predictor, make sure to pass on the log level to the
  spawned subprocess, too
2026-03-14 00:43:58 +01:00
Robert Sachunsky
b7aa1d24cc CLI: drop redundant negative option forms, add --num-jobs 2026-03-13 18:22:25 +01:00
Robert Sachunsky
576e120ba6 autosized prediction is only faster for _patched, not for _resized…
When 338c4a0e wrapped all prediction models for automatic
image size adaptation in CUDA,
- tiling (`_patched`) was indeed faster
- whole  (`_resized`) was actually slower

So this reverts the latter part.
2026-03-13 18:15:30 +01:00
Robert Sachunsky
6d55f297a5 run: use ProcessPoolExecutor for parallel run_single across pages…
- reintroduce ProcessPoolExecutor
  (previously for parallel deskewing within pages)
- wrap Eynollah instance into global, so (with forking)
  serialization can be avoided – same pattern as in core ocrd.Processor
- move timing/logging into `run_single()`, respectively
2026-03-13 10:15:51 +01:00
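The "wrap the instance into a global" pattern mentioned above avoids pickling a heavy object for every task: each worker builds its own instance once, via the pool initializer. A sketch of the idea (the `Engine` class and all names are illustrative, not the project's API):

```python
from concurrent.futures import ProcessPoolExecutor

class Engine:                      # stand-in for the heavy Eynollah instance
    def __init__(self, scale):
        self.scale = scale
    def process(self, x):
        return x * self.scale

_engine = None                     # per-process global, set once per worker

def init_worker(scale):
    global _engine
    _engine = Engine(scale)        # built inside the worker, never pickled

def run_single(x):
    return _engine.process(x)

def run_all(items, scale, jobs=2):
    with ProcessPoolExecutor(max_workers=jobs, initializer=init_worker,
                             initargs=(scale,)) as pool:
        return list(pool.map(run_single, items))
```

This is the same pattern the commit attributes to core ocrd.Processor.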
Robert Sachunsky
96cfddf92d split_textregion_main_vs_header: avoid zero division 2026-03-13 02:44:08 +01:00
Robert Sachunsky
4e9b062b84 separate_marginals_to_left_and_right...: simplify 2026-03-13 02:44:08 +01:00
Robert Sachunsky
ae0f194241 drop ProcessPoolExecutor for intra-page parallel subprocessing…
(interferes with inter-page parallelism, not as useful)
2026-03-13 02:44:08 +01:00
Robert Sachunsky
becf031c65 refactor to remove data-dependency from all Eynollah methods…
- `cache_images()`: only return an image dict (plus extra keys
  for file name stem and dpi) - don't set any attributes
- `imread()`: just take from passed image dict, also add `binary` key
- `resize_and_enhance_image_with_column_classifier()`:
  * `imread()` from image dict
  * set `img_bin` key for binarization result if `input_binary`
  * instead of `image_page_org_size` / `page_coord` attributes,
    set `img_page` / `coord_page` in image dict
  * instead of retval, set `img_res` in image dict
  * also set `scale_x` and `scale_y` in image dict, resp.
  * simplify
- `resize_image_with_column_classifier()`:
  * `imread()` from image dict
  * (as in `resize_and_enhance_image_with_column_classifier`:)
    call `calculate_width_height_by_columns_1_2` if `num_col` is
    1 or 2 here
  * instead of retval, set `img_res` in image dict
  * also set `scale_x` and `scale_y` in image dict, resp.
  * simplify
- `calculate_width_height_by_columns*()`: simplify, get confidence of
  num_col instead of entire array
- `extract_page()`: read `img_res` from image dict; simplify
- `early_page_for_num_of_column_classification()`:
  `imread()` from image dict; simplify
- `textline_contours()`: no need for `num_col_classifier` here
- `run_textline()`: no need for `num_col_classifier` here
- `get_regions_light_v()` → `get_regions()`:
  * read `img_res` from image dict
  * get shapes via `img` from image dict instead of `image_org` attr
  * use `img_page` / `coord_page` from image dict instead of attrs
  * avoid unnecessary 3-channel arrays
  * simplify
- `get_tables_from_model()`: no need for `num_col_classifier` here
- `run_graphics_and_columns_light()` → `run_graphics_and_columns()`:
  * pass through image dict instead of `img_bin` (which really was `img_res`)
  * simplify
- `run_graphics_and_columns_without_layout()`:
  * pass through image dict instead of `img_bin` (which really was `img_res`)
  * simplify
- `run_enhancement()`: pass through image dict
- `get_image_and_sclaes*()`: drop
- `run_boxes_full_layout()`:
  * pass `image_page` instead of `img_bin` (which really was `image_page`)
  * simplify
- `run()`:
  * instantiate plotter outside of loop, and independent of img files
  * move writer instantiation and overwrite checks into `run_single()`
  * add try/catch for `run_single()` w/ logging
- `reset_file_name_dir`: drop
- `run_single()`:
  * add some args/kwargs from `run()`
  * call `cache_images()` (reading image dict) here
  * instantiate writer here instead of (reused) attr in `run()`
  * set `scale_x` / `scale_y` in writer from image dict once known
    (i.e. after `run_enhancement()`)
  * don't return anything, but write PAGE result here
- `check_any_text_region_in_model_one_is_main_or_header_light()` →
  `split_textregion_main_vs_header()`
- plotter:
  * pass `name` (file stem) from image dict to all methods
  * for `write_images_into_directory()`: also `scale_x` and `scale_y`
    from image dict
- writer:
  * init with width/height
- ocrd processor:
  * adapt (just `run_single()` call)
  * drop `max_workers=1` restriction (can now run fully parallel)
- `get_textregion_contours_in_org_image_light()` →
  `get_textregion_confidences()`:
  * take shape from confmat directly instead of extra array
  * simplify
2026-03-13 02:44:08 +01:00
Robert Sachunsky
800c55b826 predictor: fix spawn vs fork / parent vs child contexts 2026-03-13 02:44:07 +01:00
Robert Sachunsky
64281768a9 run_graphics_and_columns_light: fix double 1-off error…
When the `num_col_classifier` prediction gets bypassed
by the heuristic result from `find_num_col()` (because the
prediction had too little confidence, or the result of
`calculate_width_height_by_columns()` would have become
too large), do not increment `num_col` further
(it is already 1 more than the number of colseps).
2026-03-12 10:18:14 +01:00
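In other words, both paths must convert from separator count to column count exactly once; a sketch of the corrected logic (names and the confidence gate are illustrative):

```python
def final_num_col(num_col_pred, confidence, colseps_heuristic, min_conf=0.5):
    """num_col is one more than the number of column separators; when
    falling back to the heuristic separator count, convert once and do
    not increment again (the double 1-off fixed above)."""
    if confidence >= min_conf:
        return num_col_pred          # classifier already yields columns
    return colseps_heuristic + 1     # heuristic yields separators
```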
Robert Sachunsky
46c5f52491 CLI: don't append /models_eynollah here (already in default_specs) 2026-03-11 02:40:53 +01:00
Robert Sachunsky
10214dfdda predictor: make sure all shared arrays get freed eventually 2026-03-11 02:40:53 +01:00
Robert Sachunsky
cf5caa1eca predictor: fix termination for pytests…
- rename `terminate` → `stopped`
- call `terminate()` from superclass during shutdown
- del `self.model_zoo` in the parent process after spawn,
  and in the child during shutdown
2026-03-11 02:40:53 +01:00
Robert Sachunsky
bb468bf68f predictor: mp.Value must come from spawn context, too 2026-03-11 02:27:47 +01:00
Robert Sachunsky
9f127a0783 introduce predictor subprocess for exclusive GPU processing…
- new class `Predictor(multiprocessing.Process)` as stand-in
  for EynollahModelZoo:
  * calling `load_models()` starts the subprocess (and has
    `.model_zoo.load_models()` run internally)
  * calling `get()` yields a stand-in that supports `.predict()`,
    which actually communicates with the singleton subprocess
    via task and result queues, sharing Numpy arrays via SHM
  * calling `predict()` with an empty dict (instead of an image)
    merely retrieves the respective model's output shapes (cached)
  * shared memory objects for arrays are cleared as soon as possible
  * log messages are piped through QueueHandler / QueueListener
  * exceptions are passed through the queues, and raised afterwards
- move all TF initialization to the predictor
2026-03-07 03:54:16 +01:00
Robert Sachunsky
6f4ec53f7e wrap_layout_model_resized/patched: compile call instead of predict
(so `predict()` can directly convert back to Numpy)
2026-03-07 03:52:14 +01:00
Robert Sachunsky
338c4a0edf wrap layout models for prediction (image resize or tiling) all in TF
(to avoid back and forth between CPU and GPU memory when looping
 over image patches)

- `patch_encoder`: define `Model` subclasses which take an existing
  (layout segmentation) model in the constructor, and define a new
  `call()` using the existing model in a GPU-only `tf.function`:
  * `wrap_layout_model_resized`: just `tf.image.resize()` from
    input image to model size, then predict, then resize back
  * `wrap_layout_model_patched`: ditto if smaller than model size;
    otherwise use `tf.image.extract_patches` for patching in a
    sliding-window approach, then predict patches one by one, then
    `tf.scatter_nd` to reconstruct to image size
- when compiling `tf.function` graph, make sure to use input signature
  with variable image size, but avoid retracing each new size sample
- in `EynollahModelZoo.load_model` for relevant model types,
  also wrap the loaded model
  * by `wrap_layout_model_resized` under model name + `_resized`
  * by `wrap_layout_model_patched` under model name + `_patched`
- introduce `do_prediction_new_concept_autosize`,
  replacing `do_prediction/_new_concept`,
  but using passed model's `predict` directly without
  resizing or tiling to model size
- instead of `do_prediction/_new_concept(True, ...)`,
  now call `do_prediction_new_concept_autosize`,
  but with `_patched` appended to model name
- instead of `do_prediction/_new_concept(False, ...)`,
  now call `do_prediction_new_concept_autosize`,
  but with `_resized` appended to model name
2026-03-07 03:33:44 +01:00
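The sliding-window tiling in `wrap_layout_model_patched` can be illustrated in NumPy; the commit keeps all of this on-GPU via `tf.image.extract_patches` and `tf.scatter_nd`, whereas this sketch is CPU-side and non-overlapping for brevity, with an illustrative function name:

```python
import numpy as np

def predict_patched(predict_fn, image, patch=224, stride=224):
    """Split the image into patch-size tiles, predict each, and scatter
    the per-tile results back into an image-size output."""
    h, w, _ = image.shape
    out = None
    for y in range(0, h, stride):
        for x in range(0, w, stride):
            tile = image[y:y+patch, x:x+patch]
            pred = predict_fn(tile)            # per-tile prediction
            if out is None:                    # allocate once shape is known
                out = np.zeros((h, w) + pred.shape[2:], dtype=pred.dtype)
            out[y:y+tile.shape[0], x:x+tile.shape[1]] = \
                pred[:tile.shape[0], :tile.shape[1]]
    return out
```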