65 Commits (e4798c6811c8069f7d3081d7376df846298805a9)

Author SHA1 Message
Clemens Neudecker e4798c6811 replace 'PrintSpace' with 'Border'
Clemens Neudecker 36adbe29d8 replace 'PrintSpace' with 'Border'
Konstantin Baierer 05deb03ec8 use make_file_id and assert_file_grp_cardinality
Gerber, Mike 8b01d9e671 🐛 sbb_textline_detection: Set pcGtsId
Newest OCR-D workspace validation requires that the pcGtsId of a
PAGE-XML file matches its METS mets:file/ID. Fix this by setting
it correctly.
Mike Gerber 3593506e72 🔧 ocrd-tool.json: Update description, steps and categories
Fixes .
Lucas Sulzbach ead1eae114 ocrd-tool.json: Make description OCR-D compliant
vahidrezanezhad f94944ee80 change scaling
b-vr103 b9caa8e12c resolve 2020-02-17-bug-sbb_textline_detector
b-vr103 1446d7c662 getting robust and doing sth for verticals
b-vr103 3941f2f17d getting robust and doing sth for verticals
Gerber, Mike f90b3cfa86 🔊 sbb_textline_detector: In OCR-D interface, warn if overwriting existing segmentation
Gerber, Mike 11c0e9cee5 🐛 sbb_textline_detector: Do not print PAGE output to stdout
ocrd-sbb-textline-detector uses ocrd_page's parse() to parse XML input,
which writes the XML to stdout by default.

Fix this by silencing it using parse()'s silence=True.
wrznr 4fc57d7756 Assign page id
wrznr 9e9163e852 Simplify the iteration over files in the input file group
Mike Gerber 6e0decb5ec Merge pull request from kba/rename-tool
Rename ocrd_sbb.. to ocrd-sbb... in ocrd_cli.py, ht @bertsky
Gerber, Mike 5fb30a7a1f Revert "Merge branch 'master' of https://github.com/qurator-spk/sbb_textline_detector"
This reverts commit 417b9235d5, reversing
changes made to a74974b7b6.
Konstantin Baierer cf6381c148 Rename ocrd_sbb.. to ocrd-sbb... in ocrd_cli.py, ht @bertsky
Clemens Neudecker 51e241fd84 Merge pull request from cneud/cneud-fix-typos
Fix typos
Clemens Neudecker 12c07f389d Merge pull request from cneud/cneud-fix-docstring
fix docstring
Clemens Neudecker 29870f26e1 Merge pull request from cneud/cneud-PAGE2019
PAGE2019
Konstantin Baierer b6ca1a7c53 kebab-case snake_case executable, fix
Clemens Neudecker 6c0bfba686 fix typos
Clemens Neudecker c8bc468628 fix docstring
Clemens Neudecker e696a068cb Fix typos
Clemens Neudecker d90dad48fd PAGE2019
Rezanezhad, Vahid 19116091f9 Update config_params.json
Gerber, Mike af5cbe9052 🐛 sbb_textline_detector: Fix making the output file id
Rezanezhad, Vahid 2112bb18c6 fixed the bug: local variable 't4' referenced before assignment
Rezanezhad, Vahid a11f6740cb Update main.py - robust deskewing and better page extraction
Rezanezhad, Vahid 0182b7087f remove multiprocessing bug
Gerber, Mike 8fa7179560 🐛 sbb_textline_detector: Disable multiprocessing to fix race condition
Lines were sorted in the wrong regions. Work around this by disabling
multiprocessing until a proper fix is done.
Gerber, Mike 4aed06a325 sbb_textline_detection: Preserve input PAGE info by merging segmentation results
ocrd_sbb_textline_detection used the output XML by main.py as is, and
– by doing this – threw away any input data from the input PAGE,
including the critical pc:AlternativeImage and the less important
pc:MetadataItem.

Fix this by merging the segmentation results into a file created from
the input file.

Also add a pc:MetadataItem processingStep about the segmentation
operation.
Gerber, Mike 4fb3e70ef6 🧹 sbb_textline_detector: Do not create empty/space-only TextEquivs (again)
Gerber, Mike bf41a29e7b 🐛 sbb_textline_detector: Do not hardcode Created/LastChange elements
Gerber, Mike fbd21cdb81 🧹 sbb_textline_detector: Do not create empty/space-only TextEquivs (again)
Rezanezhad, Vahid 2d6dd92b31 Update main.py
Rezanezhad, Vahid 9f97f34255 Update main.py
Rezanezhad, Vahid 8c954a6c7a Update main.py
Rezanezhad, Vahid 6714481556 Update main.py
Rezanezhad, Vahid 719824f19d Update main.py
Gerber, Mike f94511a1d8 Merge branch 'master' of code.dev.sbb.berlin:qurator/mono-repo
Gerber, Mike 4f28cd905a 🧹 sbb_textline_detector: Do not create empty/space-only TextEquivs
ocrd_tesserocr or ocrd_cis complain about already existing text if
empty/space-only TextEquivs elements exist after segmentation. Also, it
does not make sense to create them in a segmentation step.

Fix by removing the code generating the elements.
Rezanezhad, Vahid 00929ab391 Update main.py
Gerber, Mike f0dd955606 Merge branch 'master' of code.dev.sbb.berlin:qurator/mono-repo
Gerber, Mike 2528573b4f sbb_textline_detector: Allow PAGE input in OCR-D interface
Previous OCR-D processors may output PAGE files instead of image files.
Resolve image files from PAGE files if necessary.
Rezanezhad, Vahid d8e04e3de4 memory leakage is removed. New deskewing method is integrated.
Rezanezhad, Vahid 47d972b459 Update main.py
Gerber, Mike 103cfa0565 Merge branch 'master' of code.dev.sbb.berlin:qurator/mono-repo
Gerber, Mike 7884ab93c6 🧹 sbb_textline_detector: Destroy Keras session at the end of a run() to free up memory
Gerber, Mike 5d440857e7 🧹 sbb_textline_detector: Delete textline session/model after using it