92 Commits (4fc57d7756f809e52e189b51d6b585b235d0ce7f)
 

Author SHA1 Message Date
Rezanezhad, Vahid 2112bb18c6 fixed the bug: local variable 't4' referenced before assignment 5 years ago
Rezanezhad, Vahid a11f6740cb Update main.py - robust deskewing and better page extraction 5 years ago
Rezanezhad, Vahid 0182b7087f remove multiprocessing bug 5 years ago
Gerber, Mike 8fa7179560 🐛 sbb_textline_detector: Disable multiprocessing to fix race condition
Lines were sorted in the wrong regions. Work around this by disabling
multiprocessing until a proper fix is done.
5 years ago
Gerber, Mike 4aed06a325 sbb_textline_detection: Preserve input PAGE info by merging segmentation results
ocrd_sbb_textline_detection used the output XML by main.py as is, and
– by doing this – threw away any input data from the input PAGE,
including the critical pc:AlternativeImage and the less important
pc:MetadataItem.

Fix this by merging the segmentation results into a file created from
the input file.

Also add a pc:MetadataItem processingStep about the segmentation
operation.
5 years ago
Gerber, Mike 4fb3e70ef6 🧹 sbb_textline_detector: Do not create empty/space-only TextEquivs (again) 5 years ago
Gerber, Mike bf41a29e7b 🐛 sbb_textline_detector: Do not hardcode Created/LastChange elements 5 years ago
Gerber, Mike fbd21cdb81 🧹 sbb_textline_detector: Do not create empty/space-only TextEquivs (again) 5 years ago
Rezanezhad, Vahid 2d6dd92b31 Update main.py 5 years ago
Rezanezhad, Vahid 9f97f34255 Update main.py 5 years ago
Rezanezhad, Vahid 8c954a6c7a Update main.py 5 years ago
Rezanezhad, Vahid 6714481556 Update main.py 5 years ago
Rezanezhad, Vahid 719824f19d Update main.py 5 years ago
Gerber, Mike f94511a1d8 Merge branch 'master' of code.dev.sbb.berlin:qurator/mono-repo 5 years ago
Gerber, Mike 4f28cd905a 🧹 sbb_textline_detector: Do not create empty/space-only TextEquivs
ocrd_tesserocr or ocrd_cis complain about already existing text if
empty/space-only TextEquivs elements exist after segmentation. Also, it
does not make sense to create them in a segmentation step.

Fix by removing the code generating the elements.
5 years ago
Rezanezhad, Vahid 00929ab391 Update main.py 5 years ago
Gerber, Mike f0dd955606 Merge branch 'master' of code.dev.sbb.berlin:qurator/mono-repo 5 years ago
Gerber, Mike 2528573b4f sbb_textline_detector: Allow PAGE input in OCR-D interface
Previous OCR-D processors may output PAGE files instead of image files.
Resolve image files from PAGE files if necessary.
5 years ago
Rezanezhad, Vahid d8e04e3de4 memory leakage is removed. New deskewing method is integrated. 5 years ago
Rezanezhad, Vahid 47d972b459 Update main.py 5 years ago
Kai Labusch 0b7bc8d93e add missing requirement 5 years ago
Kai Labusch 9eda874985 add missing requirement 5 years ago
Gerber, Mike 103cfa0565 Merge branch 'master' of code.dev.sbb.berlin:qurator/mono-repo 5 years ago
Gerber, Mike 7884ab93c6 🧹 sbb_textline_detector: Destroy Keras session at the end of a run() to free up memory 5 years ago
Gerber, Mike 5d440857e7 🧹 sbb_textline_detector: Delete textline session/model after using it 5 years ago
cneud 4201fa7d0f sbb_textline_detector: typo (polugons --> polygons) 5 years ago
Gerber, Mike 9b2c415125 🐛 sbb_textline_detector: Use the correct image filename in the output PAGE 5 years ago
Rezanezhad, Vahid 1702472401 Update main.py 5 years ago
Rezanezhad, Vahid ca9f47eb20 Update main.py 5 years ago
Rezanezhad, Vahid 419beed836 Update main.py 5 years ago
Gerber, Mike c4d0d98ebf 🐛 sbb_textline_detector: Install *.json 5 years ago
Gerber, Mike 2199bf0d8c 🧹 sbb_textline_detector: Remove extra .xml suffix from METS file id 5 years ago
Gerber, Mike b4bef6460c 🐛 sbb_textline_detector: Use the correct image filename in the output PAGE 5 years ago
Gerber, Mike 1c7d45d3d0 ♻ sbb_textline_detector: Remove redundant and wrongly named parameter dir_of_image 5 years ago
Gerber, Mike d5a020fb3a 🧹 sbb_textline_detector: Remove debug print()s 5 years ago
Gerber, Mike b960d00018 🚧 sbb_textline_detector: XXX image_dir is probably a file, not dir 5 years ago
Gerber, Mike 5fd04677f9 🐛 sbb_textline_detector: Fix filenames of created OCR-D file group 5 years ago
Gerber, Mike 0c915c75de sbb_textline_detector: Add an OCR-D interface 5 years ago
Gerber, Mike 561a6f8a90 ⚙ sbb_textline_detector: Use click instead of argparse 5 years ago
Gerber, Mike 91fb2e01a6 📝 sbb_textline_detector: Fix help for input filename 5 years ago
Gerber, Mike 599bbf1c86 🧹 sbb_textline_detector: Use same structure as the other projects 5 years ago
Gerber, Mike b85a9dc256 🧹 sbb_textline_docker: Rename to sbb_textline_detector 5 years ago