eynollah

mirror of https://github.com/qurator-spk/eynollah.git synced 2026-08-03 09:22:32 +02:00

Author	SHA1	Message	Date
vahidrezanezhad	1b95f8f38d	threshold for textline ocr + new ocr model	2025-07-25 13:18:38 +02:00
Clemens Neudecker	2996fc8b30	Merge pull request #166 from qurator-spk/updating_readme_for_eynollah_use_cases-cli Updating readme for eynollah use cases cli	2025-07-24 15:30:57 +02:00
vahidrezanezhad	fd0595f920	Update Makefile	2025-07-24 13:52:38 +02:00
vahidrezanezhad	da141bb42e	resolving tests error	2025-07-23 16:44:17 +02:00
vahidrezanezhad	6b8893b188	Merge pull request #167 from qurator-spk/ocrd-fixes Ocrd fixes	2025-07-22 14:46:25 +02:00
vahidrezanezhad	daa597dbaa	should merged text for the whole page be written in xml?	2025-07-21 14:50:05 +02:00
vahidrezanezhad	673e67a847	update model names	2025-07-21 10:54:20 +02:00
vahidrezanezhad	fee40049cd	ocr model renamed - image text font for ocr result is now using Charis-7.000 font (downloaded from here https://software.sil.org/charis/download/)	2025-07-16 14:00:12 +02:00
vahidrezanezhad	04fead348f	ocr: make sure that image height or width is not zero	2025-07-03 15:24:52 +02:00
vahidrezanezhad	53dd4b26a9	decorated with confidence value for cnnrnn ocr model	2025-07-03 11:50:47 +02:00
vahidrezanezhad	1b222594d6	Update README.md: how to train model using docker image	2025-06-25 18:33:55 +02:00
vahidrezanezhad	f5a1d1a255	docker file to train model with desired cuda and cudnn	2025-06-25 18:24:16 +02:00
kba	b7b218ff11	OCR-D processor: same behavior as standalone wrt light_version/textline_light	2025-06-12 15:30:17 +02:00
vahidrezanezhad	c194a20c9c	Fixed duplicate textline_light assignments (true and false) in the OCR-D framework for the Eynollah light version, which caused rectangles to be used instead of contours for textlines	2025-06-12 15:27:22 +02:00
kba	32889ef1e0	adapt binarization CLI according to #156	2025-06-12 13:57:41 +02:00
vahidrezanezhad	9b4e78c55c	Fixed duplicate textline_light assignments (true and false) in the OCR-D framework for the Eynollah light version, which caused rectangles to be used instead of contours for textlines	2025-06-11 18:57:08 +02:00
Clemens Neudecker	0e7de52f5e	Merge pull request #24 from johnlockejrr/unifying-training-models Unifying training models	2025-06-03 09:00:56 +02:00
vahidrezanezhad	eb91000490	layout visualization updated	2025-06-02 18:23:34 +02:00
vahidrezanezhad	f79af201ab	Fix: Resolved OCR bug when text region type is undefined	2025-06-02 18:21:33 +02:00
vahidrezanezhad	e26c4ab9b4	image enhancer updated	2025-06-01 22:44:50 +02:00
vahidrezanezhad	9342b76038	saving enhanced image in org or scaled resolution	2025-06-01 22:10:13 +02:00
vahidrezanezhad	3b475915c7	image enhancer is integrated	2025-06-01 15:53:04 +02:00
vahidrezanezhad	df903aa1b4	Parametrize OCR for handling curved lines	2025-05-31 01:09:14 +02:00
vahidrezanezhad	1e7cecfcf9	updating ocr	2025-05-28 01:17:21 +02:00
vahidrezanezhad	03f52e7a46	updating ocr	2025-05-27 23:45:22 +02:00
vahidrezanezhad	31d9fa0c80	strings alignment function is added + new changes needed for prediction with both bin and rgb inputs is implemented	2025-05-25 21:44:36 +02:00
vahidrezanezhad	b18691f96a	rnn ocr for all layout textregion types	2025-05-25 03:33:54 +02:00
vahidrezanezhad	ba3420b2d8	Drop capitals are written separately and are not attached to their corresponding text line. The OCR use case also supports single-image input.	2025-05-25 01:12:58 +02:00
vahidrezanezhad	25e3a2a99f	visualizing ro for single xml file	2025-05-23 18:30:51 +02:00
vahidrezanezhad	0250a6d3d0	enhancing ocr	2025-05-23 18:06:53 +02:00
vahidrezanezhad	089029cec7	commit `21ec4fb` is picked + rnn ocr at the same time with segmentation + enhancement of mb reading order	2025-05-23 15:55:03 +02:00
vahidrezanezhad	ee2c7e9013	enhancing curved lines OCR	2025-05-21 17:42:44 +02:00
vahidrezanezhad	14b70c2556	Implement hyphenated textline merging in OCR engine and a bug fixed for curved textline OCR	2025-05-21 14:39:31 +02:00
vahidrezanezhad	3ad621e956	ocr for curved lines	2025-05-20 19:01:52 +02:00
vahidrezanezhad	44ff51f5c1	mb reading order now can be done faster. Text regions are clustered using dilation, and mb reading order needs to be implemented for fewer regions	2025-05-20 16:51:08 +02:00
vahidrezanezhad	5016039cd7	enhancing marginal detection for light version	2025-05-18 02:48:05 +02:00
vahidrezanezhad	f9390c71e7	updating inference for mb reading order	2025-05-17 02:18:27 +02:00
vahidrezanezhad	1cbc669d36	marginals detection enhanced for light version	2025-05-15 15:33:50 +02:00
vahidrezanezhad	1b229ba7ae	enhancement for vertical textlines	2025-05-15 00:45:22 +02:00
cneud	7a22e51f5d	resolve some comments from review	2025-05-14 21:56:03 +02:00
vahidrezanezhad	ed46615f00	enhance ocr for vertical textlines	2025-05-14 18:34:58 +02:00
johnlockejrr	25abc0fabc	Update gt_gen_utils.py: safely keep the full basename without extension	2025-05-14 03:34:51 -07:00
vahidrezanezhad	88e0315321	Accurately writing text line contours into xml file when the deskewing exceeds 45 degrees and the text line is in light mode	2025-05-13 15:53:05 +02:00
vahidrezanezhad	54088c6b04	The initial attempt at reading heavily deskewed or vertically aligned lines.	2025-05-13 14:40:57 +02:00
vahidrezanezhad	4a7728bb34	visualizing layout from eynollah page-xml output	2025-05-12 22:39:47 +02:00
vahidrezanezhad	4ddc84dee8	visualizing textline detection from eynollah page-xml output	2025-05-12 18:31:40 +02:00
vahidrezanezhad	c12b09a868	I have tried to address issues #163 and #161. The changes have also improved marginal detection and enhanced the isolation of headers.	2025-05-12 00:10:18 +02:00
johnlockejrr	3a9fc0efde	Update utils.py: changed unsafe basename extraction `file_name = i.split('.')[0]` to `file_name = os.path.splitext(i)[0]` and `filename = n[i].split('.')[0]` to `filename = os.path.splitext(n[i])[0]`, because `Vat.sam.2_206.jpg` became `Vat` instead of `Vat.sam.2_206`	2025-05-11 06:09:17 -07:00
johnlockejrr	6fa766d6a5	Update utils.py	2025-05-11 05:31:34 -07:00
vahidrezanezhad	21ec4fbfb5	The text region coordinates are now correctly written into the XML output when using the skip layout and reading order option	2025-05-07 14:04:01 +02:00

1010 commits
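
The basename fix described in commit 3a9fc0efde can be illustrated with a small standalone sketch. The helper names `basename_split` and `basename_splitext` are hypothetical and do not come from the repository; only the two expressions they wrap (`split('.')[0]` and `os.path.splitext(...)[0]`) are taken from the commit message.

```python
import os


def basename_split(name: str) -> str:
    """Old approach from the commit message: cut at the FIRST dot.

    Hypothetical wrapper; it truncates names that contain extra dots.
    """
    return name.split('.')[0]


def basename_splitext(name: str) -> str:
    """Replacement from the commit message: strip only the final extension.

    Hypothetical wrapper around os.path.splitext.
    """
    return os.path.splitext(name)[0]


if __name__ == "__main__":
    # File name quoted in the commit message.
    name = "Vat.sam.2_206.jpg"
    print(basename_split(name))     # Vat
    print(basename_splitext(name))  # Vat.sam.2_206
```

For a name with a single dot, such as `page.jpg`, both approaches give the same result, so the difference only shows up for file names that contain dots before the extension.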