Commit Graph

627 Commits (284c2aed7fb8d70f770fad5c8e3b40e52f35e40f)

Author SHA1 Message
cneud 284c2aed7f unresolved references
cneud 881f6f37c1 rename `ocrd-sbb-binarize` to `ocrd-eynollah-binarize`
cneud 3f1e140da1 pep8: class names should use CapWords
cneud 62e8d78e73 few minor fixes
cneud b4d168cae3 pep8-e265: comment should start with `# `
cneud 87ae6d11a9 pep8: whitespaces around operators
cneud fa7bb63481 pep8-e302: two blank lines between functions
cneud caf0fbe90f pep8-e302: two blank lines between functions
cneud 0e8ea64ba4 don't use equality operators to compare with None
cneud 2722a9a464 simplify chained comparisons
cneud b9030f5203 replace multi-step list initialization with list literal
cneud 5326104d26 remove unnecessary backslash
cneud badf80039f remove redundant parentheses
Clemens Neudecker 005b6988f4 Merge pull request from qurator-spk/machine_based_reading_order_integration: Machine based reading order integration
vahidrezanezhad d3a4c06e7f This commit enables the export of cropped text line images along with their corresponding texts from a Page-XML file. These exported text line images and texts can be utilized for training a text line-based OCR model.
vahidrezanezhad c8b8529951 For the CNN-RNN OCR model, long text lines are split into two segments
vahidrezanezhad aa72ca3006 Resolved an issue in the OCR-D framework where dir_out received a None value
vahidrezanezhad a4f1f35125 Resolving test failure
kba 54040c1db4 Merge remote-tracking branch 'bertsky/machine_based_reading_order_integration_fixes' into machine_based_reading_order_integration
vahidrezanezhad 7110bd971f resolved an error for light version in the case that slope_deskew is smaller than slope_threshold
vahidrezanezhad 25116a2c79 resolved 2 errors
vahidrezanezhad 33fda2f8be changing cnn ocr model name
Robert Sachunsky 335aa273a1 simplify, wrap extremely long lines
Robert Sachunsky cfc65128b1 reduce redundancy/indentation
Robert Sachunsky 01376af905 do_order_of_regions_with_model: simplify
vahidrezanezhad 92bfac4b41 Provide OCR as an option to process a directory of XML files, incorporating layout and text line coordinates.
vahidrezanezhad fbeef79d50 adding scatter_nd inference
Robert Sachunsky 0ae28f7d3e switch from stdlib to loky.ProcessPoolExecutor, ensure shutdown
vahidrezanezhad f93c6c288d function of patch-wise inference with scatter_nd is added
vahidrezanezhad 0e8c561618 debugging issues
Robert Sachunsky e9c0d716f6 CI: install optional dependencies, too
Robert Sachunsky dcaf796283 change polarity of orientation angle (PAGE schema required cw=positive)
Robert Sachunsky b4b0890294 add option to overwrite output xml, but skip by default if file exists
Robert Sachunsky b9ca7a6191 log num_cols-dependent resizing
Robert Sachunsky 9270ea4550 annotate region angles in PAGE
Robert Sachunsky 3b70b11ea6 avoid deskewing patches if binary-empty
Robert Sachunsky 7e9ee90e6e switch from (ad-hoc) mp.Pool to (attribute) concurrent.futures.ProcessPoolExecutor
Robert Sachunsky 68456ea002 do_work_of_slopes_new*, do_back_rotation_and_get_cnt_back, do_work_of_contours_in_image: use mp.Pool, simplify
Robert Sachunsky 25e967397d exit early if no text regions found (to avoid segfault)
Robert Sachunsky 21efea8711 no del on function argument
Robert Sachunsky 5e0c1da711 simplify
Robert Sachunsky 54cb15056b do_image_rotation / return_deskew_slop: avoid code duplication, simplify via mp.Pool
Robert Sachunsky 6fe02df973 do_image_rotation: fix f93fa12 (do return results)
Robert Sachunsky d68017037c do_prediction: trigger GC to avoid CUDA OOM
Robert Sachunsky ad748d0039 do_prediction: avoid code duplication
Robert Sachunsky c3163caefd avoid indentation
Robert Sachunsky 055463d23a avoid indentation
Robert Sachunsky aaea2ef463 simplify
Robert Sachunsky 3d88b207fc run: log instead of print
Robert Sachunsky a520bd1f77 wrap extremely long lines