kba
|
8080bd823c
|
📦 v0.4.0
|
1 month ago |
vahidrezanezhad
|
e2907f67e0
|
'from PIL.Image import Image' causes an error when using Image.new(), and since Image is already imported, this line can be safely commented out.
|
1 month ago |
Robert Sachunsky
|
4339444e47
|
binarization CLI: fix option checks, simplify to asserts, fix dir_in mode
|
1 month ago |
Robert Sachunsky
|
91a340f619
|
CLI: simplify option checks to asserts (also avoid stack trace)
|
1 month ago |
Robert Sachunsky
|
e0a7fde537
|
logger: fix type hint
|
1 month ago |
Robert Sachunsky
|
108ce1f5a1
|
Merge remote-tracking branch 'origin/main' into v3-api-release-foreal
(bad-ass difficult diff diffing)
|
1 month ago |
vahidrezanezhad
|
2e3a29f66b
|
In light mode: To determine whether a main region is a header, I adjusted the ratio to achieve better results.
|
1 month ago |
vahidrezanezhad
|
38a2d60fa2
|
The confidence value for text regions is set to zero when the light version is not used. This is done to let the pipeline run through. It will be updated to return the correct value in upcoming commits.
|
1 month ago |
vahidrezanezhad
|
6b52da227c
|
decorating eynollah with textregion confidence score #135
|
1 month ago |
Robert Sachunsky
|
559d001eef
|
another fix to avoid frequent warnings
|
1 month ago |
Robert Sachunsky
|
dd478279a4
|
CLI: also --overwrite in single-image mode
|
1 month ago |
Robert Sachunsky
|
8159e6336a
|
fix typo (preventing log messages)
|
1 month ago |
Robert Sachunsky
|
2919538382
|
minor fixes to avoid frequent warnings
|
1 month ago |
Robert Sachunsky
|
dcf2ed5e22
|
run: also write out XML in single filename mode
|
1 month ago |
Robert Sachunsky
|
fe77171d45
|
run_single: reduce indentation
|
1 month ago |
Robert Sachunsky
|
79003a083c
|
CLI: ValueError instead of print+exit
|
1 month ago |
Robert Sachunsky
|
e17d34fafa
|
factor run_single() out of run(), simplify kwargs
|
1 month ago |
Robert Sachunsky
|
1a0a1cb00b
|
remove session methods and redundant model loaders
|
1 month ago |
Robert Sachunsky
|
dd51f900b9
|
OCR-D: init Eynollah in 'setup', re-use instance for each page via non-public API
|
1 month ago |
Robert Sachunsky
|
ffeb4a343d
|
Eynollah: remove useless 'pcgts' attr
|
1 month ago |
vahidrezanezhad
|
91b2201b07
|
CNN-RNN OCR: width of input textline image cannot be zero!
|
1 month ago |
Robert Sachunsky
|
515b4023f6
|
sbb_binarize: fix missing reference
|
1 month ago |
vahidrezanezhad
|
4de441eaaa
|
OCR prediction can now integrate results from both RGB and binarized images, or be performed on each individually.
|
1 month ago |
vahidrezanezhad
|
b1da0a3327
|
In OCR, the predicted text is now drawn on the image, and the results are saved in a specified directory. This makes it easier to review the predicted output
|
1 month ago |
Robert Sachunsky
|
c01609ff4e
|
allow even more empty imports for optional dependencies
|
1 month ago |
Robert Sachunsky
|
46618f4229
|
allow more empty imports for optional dependencies
|
1 month ago |
Robert Sachunsky
|
4be89910a2
|
CLI: fix arg vs kwarg from merge
|
2 months ago |
Robert Sachunsky
|
9d61acf173
|
simplify
|
2 months ago |
Robert Sachunsky
|
a1068ff2eb
|
OCR-D: move sbb-binarize to ocrd-tool.json, update to v3
|
2 months ago |
Robert Sachunsky
|
c794d4d29f
|
OCR-D: fix typo light_mode→light_version
|
2 months ago |
Robert Sachunsky
|
4338259ca1
|
OCR-D: ensure page image gets replaced in result as well if not the original file
|
2 months ago |
Robert Sachunsky
|
55969b0173
|
OCR-D: add docstring
|
2 months ago |
Robert Sachunsky
|
6d02e90570
|
OCR-D: restrict max_workers=1
|
2 months ago |
Robert Sachunsky
|
efd3fa6775
|
allow empty imports for optional dependencies
|
2 months ago |
Robert Sachunsky
|
238132e260
|
use 'image_filename' for pseudo-iteration outside 'dir_in' mode
|
2 months ago |
Robert Sachunsky
|
af4e2a4ffc
|
do not require 'dir_out' outside 'dir_in' mode
|
2 months ago |
Robert Sachunsky
|
ea136e3ddd
|
'overwrite' check: only in 'dir_in' mode
|
2 months ago |
Robert Sachunsky
|
1f4a17b60d
|
Merge remote-tracking branch 'origin/machine_based_reading_order_integration' into v3-api
|
2 months ago |
Robert Sachunsky
|
edf924c2cb
|
ocrd-tool: add dockerhub
|
2 months ago |
vahidrezanezhad
|
9b04688ebc
|
The rotate_image function has been updated. Additionally, the reading order is now correct in the case of the light version, provided that slope_deskew exceeds the slope_threshold.
|
2 months ago |
vahidrezanezhad
|
cf40f9ecc5
|
The rotate_image function produces exactly the same rotation as imutils. Therefore, there is no need to retain the remove-imutils-1 branch.
|
2 months ago |
vahidrezanezhad
|
f756b08c9b
|
Revert "replace usages of `imutils` with opencv equivalents"
|
2 months ago |
vahidrezanezhad
|
52c605185a
|
Merge pull request #146 from qurator-spk/remove-imutils-1
replace usages of `imutils` with opencv equivalents
|
2 months ago |
vahidrezanezhad
|
6f36c7177f
|
For OCR, the splitting ratio of text lines is adjusted
|
2 months ago |
cneud
|
181c0c584f
|
bbox rotation with opencv
|
2 months ago |
cneud
|
eaff9e3537
|
Merge branch 'main' into remove-imutils-1
|
2 months ago |
vahidrezanezhad
|
7df0427b04
|
In the context of OCR, if Page-XML files already contain text, the new predicted text will replace the existing text.
|
2 months ago |
vahidrezanezhad
|
370d44a66b
|
Slope deskew in the light version is set to zero because the reading order becomes incorrect when the slope_deskew value exceeds the slope_threshold. This issue needs to be addressed. Additionally, the order of textlines within a text region in the light version was reversed, and this has been corrected.
|
2 months ago |
vahidrezanezhad
|
d3a4c06e7f
|
This commit enables the export of cropped text line images along with their corresponding texts from a Page-XML file. These exported text line images and texts can be utilized for training a text line-based OCR model.
|
2 months ago |
vahidrezanezhad
|
c8b8529951
|
For the CNN-RNN OCR model, long text lines are split into two segments
|
2 months ago |