eynollah

mirror of https://github.com/qurator-spk/eynollah.git synced 2026-08-03 01:12:46 +02:00

Author	SHA1	Message	Date
kba	15ddb7750e	📦 v0.9.2	2026-07-28 15:40:46 +02:00
kba	027d6aaa26	📝 changelog	2026-07-28 13:59:53 +02:00
Sai Asish Y	42d383920d	return_deskew_slop: pass refined angles, not sigma_des, on landscape main page Signed-off-by: Sai Asish Y <say.apm35@gmail.com>	2026-07-26 18:29:07 -07:00
kba	88ca39dedb	📦 v0.9.1	2026-07-20 20:08:51 +02:00
kba	b10fb9927d	📝 changelog	2026-07-20 20:07:41 +02:00
kba	17028a2193	require core v3.13.2 for ONNX base image	2026-07-20 19:57:56 +02:00
kba	c94ff96057	📦 v0.9.0	2026-07-20 19:15:54 +02:00
Robert Sachunsky	b0b77ce2c7	📝 changelog	2026-07-20 19:14:42 +02:00
kba	21abde833d	Merge branch 'fix-0.8-modelzoo-and-predictor'	2026-07-20 19:12:44 +02:00
Robert Sachunsky	5e3fde31d9	calculate_width_height_by_columns: do allow highest enlargement when confident	2026-07-19 03:22:56 +02:00
Robert Sachunsky	d0a55a1fcb	reading order: drop hsep/head elongation mechanism (too many FP)	2026-07-19 03:21:38 +02:00
Robert Sachunsky	4f7c5675fc	ocr: ensure forked logging handlers also work under pytest	2026-07-18 18:02:40 +02:00
Robert Sachunsky	79a9bb0128	cnn-rnn-ocr: increase default batch size and VRAM limit	2026-07-18 00:59:57 +02:00
Robert Sachunsky	be8b161607	cnn-rnn-ocr: batch flipped line candidates together with rest	2026-07-18 00:37:19 +02:00
Robert Sachunsky	7e776612a4	cnn-rnn-ocr: if `dir_in_bin==dir_in`, then split PNG and rest… (supports common case that binarized images have same stem, but different file name extension)	2026-07-18 00:35:50 +02:00
Robert Sachunsky	6840b67961	ocr: run `dir_in` mode in parallel (like layout), too… add CLI options `--num-jobs` and `--halt-fail`; separate `Eynollah_ocr.run_single()` to be scheduled; use ProcessPoolExecutor w/ forking and QueueListener; also skip if input XML file is missing; log processing times per job and overall	2026-07-18 00:24:08 +02:00
Robert Sachunsky	1c8ac38d31	OCR w/o `overwrite`: skip one file, not the entire run	2026-07-17 19:19:20 +02:00
Robert Sachunsky	aace571368	cnn-rnn-ocr: fix	2026-07-17 19:15:54 +02:00
Robert Sachunsky	5d129dc8c1	cnn-rnn-ocr: fix typo causing rare failures	2026-07-17 19:02:42 +02:00
Robert Sachunsky	4b9fa543ae	processor: fix typo (empty `model_overrides`)	2026-07-17 16:23:03 +02:00
Robert Sachunsky	5939845d1d	utils.contour.make_valid: be more robust (avoiding MultiPolygon)	2026-07-17 15:43:47 +02:00
Robert Sachunsky	c1b276fea1	processor: pass on more Eynollah parameters… `device` selection; `model_overrides` (as dict; including relative path resolution), e.g. `{ "binarization": { "": "models_inference_layout_v0_9_1/models_eynollah/eynollah-binarization_20210425.onnx" } }`; `skip_layout_and_reading_order`; `num_col_upper`; `num_col_lower`; `binarize` (for `input_binary`, which is a misnomer)	2026-07-17 15:37:39 +02:00
Robert Sachunsky	f579d12866	processor: resolve `models` path as processor resource	2026-07-17 12:56:19 +02:00
Robert Sachunsky	0a9b3097f1	processor: avoid writing XML twice (once by writer, once by OCR-D)	2026-07-17 12:53:13 +02:00
kba	d2755d1e93	update references to new v0_9_1 model release	2026-07-15 20:05:36 +02:00
kba	1440d454cc	models: bump version	2026-07-15 17:18:33 +02:00
kba	32c5e9ae76	add missing models, remove microsoft	2026-07-15 17:03:04 +02:00
kba	03915b24a5	models: symlinks for the model packages for zenodo upload	2026-07-15 16:02:49 +02:00
kba	909ccfd38b	Merge branch 'fix-0.8-modelzoo-and-predictor' of https://github.com/bertsky/eynollah into fix-0.8-modelzoo-and-predictor	2026-07-15 15:37:11 +02:00
Robert Sachunsky	9804d736ac	CI: avoid CUDA dependencies here	2026-07-15 13:59:43 +02:00
Robert Sachunsky	4d97e3bf7f	training: fix typos found by ruff	2026-07-15 12:46:33 +02:00
Robert Sachunsky	0ab6e19f33	configure ruff search path	2026-07-15 12:46:13 +02:00
Robert Sachunsky	b89e1b4296	ocrd-tool.json: differentiate inference-all and inference-layout	2026-07-15 04:24:33 +02:00
Robert Sachunsky	25865372d0	dependencies for `[OCR]`: add TF, avoid newer Torch (pulling CUDA 13)	2026-07-15 04:11:29 +02:00
Robert Sachunsky	efe4aa8b0c	mbreorder: fix init (wrong way to load model)	2026-07-15 04:10:47 +02:00
Robert Sachunsky	839d7b9a7e	ModelZoo: avoid Azure EP (displacing CPU)	2026-07-15 04:09:33 +02:00
kba	c5713e010e	deps: OCR requires explicit dep on tensorflow/keras now	2026-07-14 21:04:06 +02:00
kba	b4165114ea	update zenodo links to v0_9_0 models	2026-07-14 21:00:09 +02:00
kba	c43e219858	model packaging/uploading	2026-07-14 20:51:46 +02:00
Robert Sachunsky	83fca95914	Merge pull request #9 from qurator-spk/integrating_trocr_and_torch_ensembling_and_updating_characters_list-refactor Integrating trocr and torch ensembling and updating characters list refactor	2026-07-14 16:23:12 +02:00
Robert Sachunsky	1591d2091c	update docs	2026-07-14 16:22:47 +02:00
Robert Sachunsky	5dc9a3456c	Dockerfile: do not install OCR (as Torch and ONNX clash over CUDA)	2026-07-14 15:58:00 +02:00
Robert Sachunsky	12be983487	cnn-rnn-ocr inference: switch back to beam search, only run in TF… training.models.CTCDecoder: prefer beam search over greedy (because it is more accurate); training reload makefile: skip ONNX and TF-Serving conversion for cnn-rnn-ocr models (because these would not work); training reload makefile: default to onnx and tf conversions for all models (tf for training and onnx for inference) instead of tf-serving export	2026-07-12 03:49:41 +02:00
Robert Sachunsky	c680dae2d1	ModelZoo for ONNX backend: allow setting execution providers via env	2026-07-10 17:22:50 +02:00
kba	39e054e718	train: remove tf-specifics and weird __getitem__ def from transformer-ocr setup	2026-07-09 19:23:21 +02:00
kba	affddd6c85	train: make preprocess_imgs_ocr work for transformer-ocr	2026-07-09 19:12:09 +02:00
kba	b2f3a8f2d8	Merge branch 'fix-0.8-modelzoo-and-predictor-kba0709' into integrating_trocr_and_torch_ensembling_and_updating_characters_list-refactor (Conflicts: train/requirements.txt)	2026-07-09 17:32:23 +02:00
kba	b1f2f43051	upgrade tf2onnx fork dep, remove spurious tf-data dep	2026-07-09 17:30:48 +02:00
kba	492fcbacb7	switch to fork of tf2onnx	2026-07-09 15:17:11 +02:00
kba	b9ba43b444	training: require tf2onnx and pin ml_dtypes >= 0.5	2026-07-09 14:37:51 +02:00


1660 commits
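
Commit 6840b67961 in the log above mentions scheduling per-file jobs with a `ProcessPoolExecutor` using forking, with log records collected through a `QueueListener`. The following is a minimal sketch of that general pattern using only the Python standard library; it is not taken from the eynollah sources. The names `run_single`, `run_all`, `_init_worker`, and the `handler` parameter are hypothetical stand-ins for illustration, and the `fork` start method assumes a POSIX system.

```python
import logging
import logging.handlers
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor, as_completed


def _init_worker(log_queue):
    # Runs once in every worker: replace inherited handlers so that all
    # log records are sent to the parent process through the shared queue.
    root = logging.getLogger()
    root.handlers[:] = [logging.handlers.QueueHandler(log_queue)]
    root.setLevel(logging.INFO)


def run_single(name):
    # Hypothetical per-file job (stand-in for real processing).
    logging.getLogger("job").info("processing %s", name)
    if name.startswith("bad"):
        raise ValueError(f"cannot process {name}")
    return name.upper()


def run_all(names, num_jobs=2, handler=None):
    # Assumption: POSIX platform where the "fork" start method exists.
    ctx = mp.get_context("fork")
    log_queue = ctx.Queue()
    # The listener thread in the parent forwards queued records to a
    # single real handler, so output from workers is not interleaved.
    listener = logging.handlers.QueueListener(
        log_queue, handler or logging.StreamHandler())
    listener.start()
    results, failures = {}, {}
    try:
        with ProcessPoolExecutor(max_workers=num_jobs, mp_context=ctx,
                                 initializer=_init_worker,
                                 initargs=(log_queue,)) as pool:
            futures = {pool.submit(run_single, n): n for n in names}
            for fut in as_completed(futures):
                name = futures[fut]
                try:
                    results[name] = fut.result()
                except Exception as exc:
                    # One failing job is recorded; the others keep running.
                    failures[name] = exc
    finally:
        listener.stop()
    return results, failures
```

Calling `run_all(["a.png", "bad.png", "c.png"])` in this sketch returns the results of the two successful jobs and a mapping from `"bad.png"` to the exception it raised, while every worker's log lines pass through the one handler in the parent. How the real `--halt-fail` option interacts with failures is not described in the log, so it is left out here.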