eynollah

mirror of https://github.com/qurator-spk/eynollah.git synced 2026-08-03 01:12:46 +02:00

Author	SHA1	Message	Date
Robert Sachunsky	7e8b9311d3	Revert "test_model_zoo: fix calls" This reverts commit `5a98f55be3`.	2026-05-19 03:32:37 +02:00
Robert Sachunsky	a1449da1d1	Revert "fix model loading in mb_ro and ocr" This reverts commit `218a95e6a0`.	2026-05-19 03:32:19 +02:00
kba	1df32eba87	CD: base docker image: typo {,v}3.13.0	2026-05-11 13:41:30 +02:00
kba	d7337a3080	CD: base docker image on versioned ocrd/core-cuda-tf2:v3.13.0	2026-05-11 13:38:36 +02:00
kba	e612db2bb1	📦 v0.8.0	2026-05-11 13:16:30 +02:00
kba	6cfbd93ac7	📝 changelog	2026-05-11 13:14:56 +02:00
kba	c7104c2852	Merge branch 'prepare-release-v0.8.0'	2026-05-11 13:12:19 +02:00
kba	5a98f55be3	test_model_zoo: fix calls	2026-05-11 12:22:24 +02:00
kba	218a95e6a0	fix model loading in mb_ro and ocr	2026-05-11 12:19:20 +02:00
kba	2035b07b55	Merge remote-tracking branch 'bertsky/ro-fixes-final' into prepare-release-v0.8.0 # Conflicts: # requirements-ocr.txt	2026-05-11 09:46:17 +02:00
Robert Sachunsky	db87aa995d	reqs for OCR: relax `ad5f2272` (depending on Python version)	2026-05-11 03:15:54 +02:00
Robert Sachunsky	e183937c5d	separate_lines_new2: fix coord overflow by clipping, simplify… - found positive and negative peaks, and even more so their relative offsets, may overflow in the cropped image, causing fake textlines; avoid that by clipping to the valid y coordinates - calculation for number of tiles: sometimes one less tile is needed by making the previous last tile half-full on the right side - add some (commented) plotting - simplify (a lot, but only partially)	2026-05-11 03:09:02 +02:00
Robert Sachunsky	130f0aee42	do_work_of_slopes_curved: improve on `d257869d`… - relative images now need larger relative min_area (i.e. compensation factors) - do not attempt (even) single-line skew estimation (via linear regression) if there is no (large enough) contour at all - avoid re-computing `mask_parent` - add some (commented) plotting	2026-05-11 03:03:04 +02:00
kba	ce5d6bc43c	try to accommodate outdated Python versions unsupported by current transformers	2026-05-09 18:03:40 +02:00
kba	03f3f9af17	update model zoo and docs to link to v0_8_0 model release on zenodo	2026-05-09 17:58:59 +02:00
Robert Sachunsky	a61fb09ec5	CI: drop py3.8 (u/a for new req `transformers >= 5`)	2026-05-09 04:14:49 +02:00
Robert Sachunsky	4406a0299e	update CLI test for binarization… - update expected log messages	2026-05-09 04:12:19 +02:00
Robert Sachunsky	4cd398bd0d	standalone binarization: update, simplify… - re-use Eynollah base class, drop copied code - simplify `run()` and `run_single()` - delegate to `do_prediction()` instead of custom (old) tiling loop - drop `predict()` - add `--device` option to CLI as well	2026-05-09 04:12:02 +02:00
Robert Sachunsky	29abae0144	update CLI test for enhancer… - update expected log messages - force `-ncu 3`, because otherwise the example images would not be deemed in need of enhancement	2026-05-09 02:59:52 +02:00
Robert Sachunsky	c1b6a61301	standalone enhancer: make this work (at all)… - re-use Eynollah base class, drop copied code - write usable `run()` and `run_single()` - delegate to `resize_image_with_column_classifier()` for column classifier, resizing and enhancement, instead of `resize_and_enhance_image_with_column_classifier()` (which does _not_ actually enhance) - drop unused `predict_enhancement()` - add defaults to `num_col` options (always numeric) - add `--device` option to CLI as well	2026-05-09 02:55:01 +02:00
Robert Sachunsky	d63ce5538c	resize_image_with_column_classifier(): apply num_col bounds here too use rules from `resize_and_enhance_image_with_column_classifier()` and apply them to `resize_image_with_column_classifier()` as well (to be used by enhancer CLI)	2026-05-09 02:53:04 +02:00
Robert Sachunsky	6df2144c0f	fix 2 typos in previous commits… - `becf031c65` - `cefe596f8b`	2026-05-09 02:31:22 +02:00
Robert Sachunsky	daf0c90d6e	Merge pull request #8 from bertsky/ro-fixes-training-reload training: reload models	2026-05-08 18:46:43 +02:00
Robert Sachunsky	395decd6d6	Merge pull request #7 from qurator-spk/ro-fixes-training-reload-additions Ro fixes training reload additions	2026-05-08 18:45:28 +02:00
Robert Sachunsky	3a9d72d3fc	Merge pull request #6 from qurator-spk/update-cd Deploy versioned docker images and update transformers	2026-05-08 18:44:49 +02:00
Robert Sachunsky	ea8f985ff1	apply cropping only after textline and early layout… (because old models seem to fare better that way, despite training documentation)	2026-05-08 18:41:47 +02:00
Robert Sachunsky	58afdf5e87	do_prediction*(): ensure always returns dtype=uint8	2026-05-08 17:36:31 +02:00
Robert Sachunsky	68a26a5c3f	do_prediction*(): smooth window transitions with sigmoid… instead of hard cut-offs between overlapping window tiles, apply sigmoid attenuation to slide from one to the next (apply all postprocessing in the end)	2026-05-08 05:18:00 +02:00
Robert Sachunsky	cefe596f8b	do_prediction*(): avoid unnecessary tiles, simplify… - calculation for number of tiles: sometimes one less tile is needed by making the previous last tile half-full on the right side - calculation of window margins: fix case if dimension extends to full image shape - simplify (identifiers, slicing etc)	2026-05-08 00:55:18 +02:00
kba	a0bf1b51f4	makefile to reload models	2026-05-07 19:30:29 +02:00
kba	34a9d458ce	training deps: use sacred fork w/o pkg_resources, pin tf/tf_keras, protobuf packages to work with tensorflow_addons	2026-05-07 18:09:27 +02:00
kba	2747385f89	remove unused deprecation-warning-causing biopython dependency	2026-05-07 17:15:15 +02:00
Robert Sachunsky	d8c83d6137	make_valid(): avoid oversimplification, improve parameter search	2026-05-05 15:00:16 +02:00
Robert Sachunsky	45868e99cd	get_slopes_and_deskew_new_light2: ignore tiny contour areas	2026-05-04 15:55:00 +02:00
Robert Sachunsky	934ac90e92	get_slopes_and_deskew_new_light2: avoid +/- 90° cancellation… in `estimate_skew_contours()`, distinguish between angle stats scattering around <45° vs >45°: in the latter case, use modulo 180° for averages - to avoid cancelling out +90° with -90°	2026-05-04 15:52:07 +02:00
Robert Sachunsky	29bb55ceff	return_deskew_slop: no >90° search unless for full page, simplify	2026-05-01 00:27:00 +02:00
Robert Sachunsky	d7a3f4cec6	training: add cfg param `reload_weights` for building but loading… - introduce `config_params` key `reload_weights` - add respective section for all model types: - build fresh model from code - load existing weights from `dir_of_start_model` - save to `dir_output` under same basename as existing model (but without optimizer and metrics; which does not work currently) - exit immediately (i.e. no actual training) - reorder so reload_weights is after compilation but before data loading	2026-04-30 16:54:26 +02:00
Robert Sachunsky	cbb3be0e01	add diagnostic plotting for prediction masking (commented)	2026-04-30 16:12:00 +02:00
Robert Sachunsky	33c055389d	bold `run_single` refactoring (predict segmentation on cropped img)… - move `extract_page()` to the start (right after enhancement), so early layout and textline model prediction sees cropped image - `extract_page()`: also return page mask - `get_early_layout()`: * use cropped image * also run optional table prediction here, map table label and confidence already (so no need to pass these arrays everywhere) * suppress all non-text type regions in textline mask * also return text+table mask (so no need to reconstruct it everywhere) - apply page mask to textline mask and early layout result (i.e. suppress areas beyond border contour) - `run_graphics_and_columns()`: * rename → `run_columns()` * no table prediction here * no page extraction here * no page cropping+masking here * no textline mask suppression here - `run_graphics_and_columns_without_layout()`: drop (not needed anymore) - `run_marginals()` vs. `get_marginals()`: extract `text_mask` internally from early layout - early page cropping for col-classifier: also use cropped image in input binarization mode - early page cropping for col-classifier: get external contours instead of indiscriminate tree - writer: skip layout mode now also uses cropped coordinates (so drop kwarg for it)	2026-04-30 16:12:00 +02:00
Robert Sachunsky	7e7cc6a801	do_order_of_regions(): use region mask instead of textline mask… for local (within-box) ordering of region contours, use the same text mask (merely eroded) as for the contour extraction itself: the text+table+drop mask from early+full layout prediction, rather than the textline mask, because the latter may be empty in some boxes and is unlikely to be more useful than the region mask itself	2026-04-30 16:11:59 +02:00
Robert Sachunsky	63df9be4db	find_number_of_columns_in_document(): pass in (reuse) masks	2026-04-30 16:11:59 +02:00
Robert Sachunsky	da9e00cfe5	consistently handle textline mask with respect to drop-capital mask… - suppress drop-capital in textline mask for textline contours - elevate drop-capital in textline mask for reading order boxes	2026-04-30 16:11:59 +02:00
Robert Sachunsky	2641171fb1	return_boxes_...order_of_reading...: avoid negative slices… fix rare bug when horizontal separators are detected by the very top (of a major vertical part of the page), causing box intervals to become negative	2026-04-30 16:11:59 +02:00
Robert Sachunsky	6a92f0d49c	make get_deskewed_masks() unconditional, call only when needed	2026-04-30 16:11:59 +02:00
Robert Sachunsky	52eb4c9a0a	move label definition and deskewing cancellation up	2026-04-30 16:11:59 +02:00
Robert Sachunsky	fa882e1dbe	move `run_boxes_order()` call to RO section of `run_single()`	2026-04-30 16:11:59 +02:00
Robert Sachunsky	d88bd485ff	get_slopes*(): does not need passing boxes separately	2026-04-30 16:11:59 +02:00
Robert Sachunsky	869646cbf5	`get_full_layout()` does not need the textline mask	2026-04-30 16:11:59 +02:00
Robert Sachunsky	b5bc161a4c	extract_page(): get external contours instead of indiscriminate tree	2026-04-30 16:11:59 +02:00
Robert Sachunsky	287bebde0d	get_marginals(): fix height factor for mask resizing	2026-04-30 16:11:59 +02:00


1527 commits
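Commit `934ac90e92` (`get_slopes_and_deskew_new_light2`) describes how angles are averaged in `estimate_skew_contours()`. When the angle estimates scatter around values above 45°, the averaging is done modulo 180° so that +90° and -90° do not cancel out. The following is a minimal, hypothetical sketch of that idea in plain Python, not eynollah's actual code. It assumes angles in degrees within (-90, 90]. The function name `mean_skew_angle` and the median-based scatter test are illustrative assumptions.

```python
import statistics


def mean_skew_angle(angles):
    """Average skew angles (degrees, each in (-90, 90]).

    Hypothetical illustration of the idea in commit 934ac90e92;
    the name and the median-based test are assumptions.
    """
    if not angles:
        raise ValueError("need at least one angle")
    # Decide whether the estimates scatter near 0 deg (<45) or near +/-90 deg (>45).
    if statistics.median(abs(a) for a in angles) < 45:
        # No wrap-around problem near 0 deg: a plain mean is fine.
        return statistics.fmean(angles)
    # Near +/-90 deg, +89 and -89 are almost the same orientation, but a
    # plain mean would cancel them to ~0. Average modulo 180 instead ...
    mean = statistics.fmean(a % 180 for a in angles)
    # ... and map the result back into (-90, 90].
    return mean - 180 if mean > 90 else mean


print(mean_skew_angle([89, -89]))   # near-vertical estimates stay near 90
print(mean_skew_angle([-88, -86]))  # wraps back to a negative angle
```

With a plain arithmetic mean, `[89, -89]` would average to 0°, which is the cancellation the commit message warns about.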
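Commit `68a26a5c3f` (`do_prediction*()`) replaces hard cut-offs between overlapping window tiles with sigmoid attenuation. Each tile fades into the next, and all postprocessing happens only at the end. The sketch below illustrates this cross-fade on a 1-D row of soft scores in plain Python. It is an assumption-laden stand-in for the real 2-D tiling code. The names `sigmoid_ramp` and `blend_tiles` and the `steepness` parameter are hypothetical.

```python
import math


def sigmoid_ramp(n, steepness=10.0):
    """Weights rising smoothly from ~0 to ~1 over n samples (hypothetical helper)."""
    if n == 1:
        return [0.5]
    return [1.0 / (1.0 + math.exp(-steepness * (i / (n - 1) - 0.5)))
            for i in range(n)]


def blend_tiles(left, right, overlap):
    """Merge two 1-D tile predictions whose last/first `overlap` samples coincide.

    Blends soft scores; thresholding/argmax would be applied afterwards.
    """
    if not 0 < overlap <= min(len(left), len(right)):
        raise ValueError("invalid overlap")
    w = sigmoid_ramp(overlap)
    # Inside the overlap, fade the left tile out and the right tile in.
    mixed = [(1 - wi) * a + wi * b
             for wi, a, b in zip(w, left[-overlap:], right[:overlap])]
    return left[:-overlap] + mixed + right[overlap:]


print(blend_tiles([1.0] * 4, [0.0] * 4, 2))
```

Blending the soft scores before any thresholding avoids visible seams at tile borders, where a hard cut-off would switch abruptly from one tile's prediction to the other's.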