48 Commits (cf36092dc9eb16d1c2eb269b56e084c13aa94f69)

Author SHA1 Message Date
Clemens Neudecker 51e241fd84
Merge pull request from cneud/cneud-fix-typos
Fix typos
Clemens Neudecker 12c07f389d
Merge pull request from cneud/cneud-fix-docstring
fix docstring
Clemens Neudecker 29870f26e1
Merge pull request from cneud/cneud-PAGE2019
PAGE2019
Konstantin Baierer b6ca1a7c53 kebab-case snake_case executable, fix
Clemens Neudecker 6c0bfba686
fix typos
Clemens Neudecker c8bc468628
fix docstring
Clemens Neudecker e696a068cb
Fix typos
Clemens Neudecker d90dad48fd
PAGE2019
Rezanezhad, Vahid 19116091f9 Update config_params.json
Gerber, Mike af5cbe9052 🐛 sbb_textline_detector: Fix making the output file id
Rezanezhad, Vahid 2112bb18c6 fixed the bug: local variable 't4' referenced before assignment
Rezanezhad, Vahid a11f6740cb Update main.py - robust deskewing and better page extraction
Rezanezhad, Vahid 0182b7087f remove multiprocessing bug
Gerber, Mike 8fa7179560 🐛 sbb_textline_detector: Disable multiprocessing to fix race condition
Lines were sorted in the wrong regions. Work around this by disabling
multiprocessing until a proper fix is done.
Gerber, Mike 4aed06a325 sbb_textline_detection: Preserve input PAGE info by merging segmentation results
ocrd_sbb_textline_detection used the output XML by main.py as is, and
– by doing this – threw away any input data from the input PAGE,
including the critical pc:AlternativeImage and the less important
pc:MetadataItem.

Fix this by merging the segmentation results into a file created from
the input file.

Also add a pc:MetadataItem processingStep about the segmentation
operation.
Gerber, Mike 4fb3e70ef6 🧹 sbb_textline_detector: Do not create empty/space-only TextEquivs (again)
Gerber, Mike bf41a29e7b 🐛 sbb_textline_detector: Do not hardcode Created/LastChange elements
Gerber, Mike fbd21cdb81 🧹 sbb_textline_detector: Do not create empty/space-only TextEquivs (again)
Rezanezhad, Vahid 2d6dd92b31 Update main.py
Rezanezhad, Vahid 9f97f34255 Update main.py
Rezanezhad, Vahid 8c954a6c7a Update main.py
Rezanezhad, Vahid 6714481556 Update main.py
Rezanezhad, Vahid 719824f19d Update main.py
Gerber, Mike f94511a1d8 Merge branch 'master' of code.dev.sbb.berlin:qurator/mono-repo
Gerber, Mike 4f28cd905a 🧹 sbb_textline_detector: Do not create empty/space-only TextEquivs
ocrd_tesserocr or ocrd_cis complain about already existing text if
empty/space-only TextEquivs elements exist after segmentation. Also, it
does not make sense to create them in a segmentation step.

Fix by removing the code generating the elements.
Rezanezhad, Vahid 00929ab391 Update main.py
Gerber, Mike f0dd955606 Merge branch 'master' of code.dev.sbb.berlin:qurator/mono-repo
Gerber, Mike 2528573b4f sbb_textline_detector: Allow PAGE input in OCR-D interface
Previous OCR-D processors may output PAGE files instead of image files.
Resolve image files from PAGE files if necessary.
Rezanezhad, Vahid d8e04e3de4 memory leakage is removed. New deskewing method is integrated.
Rezanezhad, Vahid 47d972b459 Update main.py
Gerber, Mike 103cfa0565 Merge branch 'master' of code.dev.sbb.berlin:qurator/mono-repo
Gerber, Mike 7884ab93c6 🧹 sbb_textline_detector: Destroy Keras session at the end of a run() to free up memory
Gerber, Mike 5d440857e7 🧹 sbb_textline_detector: Delete textline session/model after using it
cneud 4201fa7d0f sbb_textline_detector: typo (polugons --> polygons)
Gerber, Mike 9b2c415125 🐛 sbb_textline_detector: Use the correct image filename in the output PAGE
Rezanezhad, Vahid 1702472401 Update main.py
Rezanezhad, Vahid ca9f47eb20 Update main.py
Rezanezhad, Vahid 419beed836 Update main.py
Gerber, Mike 2199bf0d8c 🧹 sbb_textline_detector: Remove extra .xml suffix from METS file id
Gerber, Mike b4bef6460c 🐛 sbb_textline_detector: Use the correct image filename in the output PAGE
Gerber, Mike 1c7d45d3d0 ♻ sbb_textline_detector: Remove redundant and wrongly named parameter dir_of_image
Gerber, Mike d5a020fb3a 🧹 sbb_textline_detector: Remove debug print()s
Gerber, Mike b960d00018 🚧 sbb_textline_detector: XXX image_dir is probably a file, not dir
Gerber, Mike 5fd04677f9 🐛 sbb_textline_detector: Fix filenames of created OCR-D file group
Gerber, Mike 0c915c75de sbb_textline_detector: Add an OCR-D interface
Gerber, Mike 561a6f8a90 ⚙ sbb_textline_detector: Use click instead of argparse
Gerber, Mike 91fb2e01a6 📝 sbb_textline_detector: Fix help for input filename
Gerber, Mike 599bbf1c86 🧹 sbb_textline_detector: Use same structure as the other projects
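
Commit 4f28cd905a explains that empty or space-only TextEquiv elements left behind by a segmentation step make later OCR processors complain about already existing text. The commit itself fixes this by removing the code that generated the elements. As an illustration only, the sketch below shows a different approach, a post-processing filter over an already written PAGE file, using just the Python standard library. The function names and the sample document are hypothetical and are not taken from the repository.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return an element's tag without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def strip_empty_text_equivs(root):
    """Remove TextEquiv elements whose Unicode text is empty or whitespace.

    Hypothetical helper for illustration; returns the number of
    elements removed.
    """
    removed = 0
    # ElementTree has no parent pointers, so walk every element as a
    # potential parent and inspect its direct children.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) != "TextEquiv":
                continue
            text = "".join(
                u.text or "" for u in child if local_name(u.tag) == "Unicode"
            )
            if not text.strip():
                parent.remove(child)
                removed += 1
    return removed


# Made-up sample PAGE document: one space-only line, one real line,
# and one empty region-level TextEquiv.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page>
    <TextRegion id="r1">
      <TextLine id="l1"><TextEquiv><Unicode> </Unicode></TextEquiv></TextLine>
      <TextLine id="l2"><TextEquiv><Unicode>Berlin</Unicode></TextEquiv></TextLine>
      <TextEquiv><Unicode></Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""

root = ET.fromstring(SAMPLE)
removed = strip_empty_text_equivs(root)
remaining = [e for e in root.iter() if local_name(e.tag) == "TextEquiv"]
print(removed, len(remaining))
```

Matching on the local tag name keeps the filter independent of the PAGE schema version, which may matter here since the log also mentions a move to PAGE2019.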