eynollah

mirror of https://github.com/qurator-spk/eynollah.git synced 2026-03-02 13:22:00 +01:00

Author	SHA1	Message	Date
vahidrezanezhad	0711166524	changed the drop capitals bounding box to contour ratio threshold	2025-09-01 11:37:22 +02:00
vahidrezanezhad	e15640aa8a	new page extraction model integration	2025-09-15 13:36:58 +02:00
vahidrezanezhad	9b9d21d8ac	eynollah ocr: support using either a specific model name or a models directory (default model)	2025-08-28 11:30:59 +02:00
vahidrezanezhad	41365645ef	Marginals are divided into left and right, and written from top to bottom.	2025-08-26 22:38:03 +02:00
vahidrezanezhad	7741502876	reading order on given layout	2025-08-18 02:31:13 +02:00
vahidrezanezhad	268aa141d7	avoiding float in range	2025-08-12 12:50:15 +02:00
vahidrezanezhad	52d9cc9baf	deskewing with faster multiprocessing	2025-08-08 11:32:02 +02:00
vahidrezanezhad	322b04145f	use the latest ocr model with balanced fraktur-antiqua training dataset	2025-08-05 14:22:22 +02:00
vahidrezanezhad	1b95f8f38d	threshold for textline ocr + new ocr model	2025-07-25 13:18:38 +02:00
vahidrezanezhad	daa597dbaa	should merged text for the whole page be written in xml?	2025-07-21 14:50:05 +02:00
vahidrezanezhad	673e67a847	update model names	2025-07-21 10:54:20 +02:00
vahidrezanezhad	fee40049cd	ocr model renamed - image text font for ocr result is now using Charis-7.000 font (downloaded from here https://software.sil.org/charis/download/)	2025-07-16 14:00:12 +02:00
vahidrezanezhad	04fead348f	ocr: make sure that image height or width is not zero	2025-07-03 15:24:52 +02:00
vahidrezanezhad	53dd4b26a9	decorated with confidence value for cnnrnn ocr model	2025-07-03 11:50:47 +02:00
vahidrezanezhad	f79af201ab	Fix: Resolved OCR bug when text region type is undefined	2025-06-02 18:21:33 +02:00
vahidrezanezhad	e26c4ab9b4	image enhancer updated	2025-06-01 22:44:50 +02:00
vahidrezanezhad	9342b76038	saving enhanced image in org or scaled resolution	2025-06-01 22:10:13 +02:00
vahidrezanezhad	3b475915c7	image enhancer is integrated	2025-06-01 15:53:04 +02:00
vahidrezanezhad	df903aa1b4	Parametrize OCR for handling curved lines	2025-05-31 01:09:14 +02:00
vahidrezanezhad	1e7cecfcf9	updating ocr	2025-05-28 01:17:21 +02:00
vahidrezanezhad	03f52e7a46	updating ocr	2025-05-27 23:45:22 +02:00
vahidrezanezhad	31d9fa0c80	strings alignment function is added + new changes needed for prediction with both bin and rgb inputs are implemented	2025-05-25 21:44:36 +02:00
vahidrezanezhad	b18691f96a	rnn ocr for all layout textregion types	2025-05-25 03:33:54 +02:00
vahidrezanezhad	ba3420b2d8	Drop capitals are written separately and are not attached to their corresponding text line. The OCR use case also supports single-image input.	2025-05-25 01:12:58 +02:00
vahidrezanezhad	0250a6d3d0	enhancing ocr	2025-05-23 18:06:53 +02:00
vahidrezanezhad	089029cec7	commit `21ec4fb` is picked + rnn ocr at the same time with segmentation + enhancement of mb reading order	2025-05-23 15:55:03 +02:00
vahidrezanezhad	ee2c7e9013	enhancing curved lines OCR	2025-05-21 17:42:44 +02:00
vahidrezanezhad	14b70c2556	Implement hyphenated textline merging in OCR engine and a bug fixed for curved textline OCR	2025-05-21 14:39:31 +02:00
vahidrezanezhad	3ad621e956	ocr for curved lines	2025-05-20 19:01:52 +02:00
vahidrezanezhad	44ff51f5c1	mb reading order now can be done faster. Text regions are clustered using dilation, and mb reading order needs to be implemented for fewer regions	2025-05-20 16:51:08 +02:00
vahidrezanezhad	5016039cd7	enhancing marginal detection for light version	2025-05-18 02:48:05 +02:00
vahidrezanezhad	1cbc669d36	marginals detection enhanced for light version	2025-05-15 15:33:50 +02:00
vahidrezanezhad	1b229ba7ae	enhancement for vertical textlines	2025-05-15 00:45:22 +02:00
vahidrezanezhad	ed46615f00	enhance ocr for vertical textlines	2025-05-14 18:34:58 +02:00
vahidrezanezhad	88e0315321	Accurately writing text line contours into xml file when the deskewing exceeds 45 degrees and the text line is in light mode	2025-05-13 15:53:05 +02:00
vahidrezanezhad	54088c6b04	The initial attempt at reading heavily deskewed or vertically aligned lines.	2025-05-13 14:40:57 +02:00
vahidrezanezhad	c12b09a868	I have tried to address the issues #163 and #161 . The changes have also improved marginal detection and enhanced the isolation of headers.	2025-05-12 00:10:18 +02:00
vahidrezanezhad	89aa545049	allow adding dataset abbreviation to extracted textline images and text	2025-05-03 02:59:16 +02:00
vahidrezanezhad	48e8dd4ab3	machine based model name changed to public one	2025-05-02 12:57:26 +02:00
vahidrezanezhad	a1a004b19d	inference batch size for ocr is passed as an argument	2025-05-02 12:53:33 +02:00
vahidrezanezhad	5d8c864c08	adding space between split textline predicted text in the case of trocr	2025-05-02 01:02:32 +02:00
vahidrezanezhad	184af46664	displaying detected text on an image is provided for trocr case	2025-05-02 00:30:36 +02:00
Robert Sachunsky	21615a986d	OCR-D processor: expose reading_order_machine_based	2025-08-13 14:14:37 +02:00
kba	b7b218ff11	OCR-D processor: same behavior as standalone wrt light_version/textline_light	2025-06-12 15:30:17 +02:00
vahidrezanezhad	c194a20c9c	Fixed duplicate textline_light assignments (true and false) in the OCR-D framework for the Eynollah light version, which caused rectangles to be used instead of contours for textlines	2025-06-12 15:27:22 +02:00
vahidrezanezhad	e2da7a6239	Fix model name to return the correct machine-based model name	2025-04-30 16:06:29 +02:00
vahidrezanezhad	b227736094	Fix OCR text cleaning to correctly handle 'U', 'K', and 'N' starting sentence; update text line splitting size	2025-04-30 16:04:34 +02:00
vahidrezanezhad	4cb4414740	Resolve remaining issue with #158 and resolving #124	2025-04-30 16:01:52 +02:00
vahidrezanezhad	208bde706f	resolving issue #158	2025-04-30 13:55:09 +02:00
vahidrezanezhad	a22df11ebb	Restoring the contour in the original image caused an error due to an empty tuple. This issue has been resolved, and as expected, the confidence score for this contour is set to zero	2025-04-14 00:42:08 +02:00

161 commits