106 Commits (252a125ed26fd2af50429c7f18ccbadcf45d67c1)

Author SHA1 Message
Gerber, Mike 1af18c629e 🧹 Validate imagefilename again
Gerber, Mike de49aa715b ⬆ Update to OCR-D 1.0.0
Gerber, Mike 7025d960b4 Use ocrd_olena for binarization
Gerber, Mike 3687d6d7b4 🧹 Do not remove line confidences anymore
Gerber, Mike 6454d20998 Use sbb_textline_detector to segment lines
Gerber, Mike bdab016e2c Use GT4HistOCR_2000000 model from qurator-data for Tesseract
Gerber, Mike 47dd5d3b62 🎨 Move XML schemata to a better path
Gerber, Mike af2034400a 🎨 Add extra newlines to separate steps
Gerber, Mike 1863439d92 💩 Remove extra Pillow dependency workarounds
Gerber, Mike e5cd5b937e Run pip3 list for easier checking
Gerber, Mike bd24624bd7 ⬆ Do not downgrade to PAGE 2018 anymore
Gerber, Mike 0b2b66a0b4 🔧 Allow setting LOG_LEVEL
Gerber, Mike f19bba45b8 💩 Remove mysterious TEMP directory for now
Gerber, Mike 68902f923d 📜 Downgrading to PAGE 2018 is not the last step anymore
Gerber, Mike 6c0d7e0aee 💩 Do not fix PAGE image references for now
Gerber, Mike 343a3fbf82 🔧 Evaluate both Tesseract and Calamari results
Gerber, Mike 0bc06c2fad Run Calamari OCR
Gerber, Mike daed87566e 🚑 Don't install typegroups classifier for now
Gerber, Mike d8f3438ac5 🚑 Don't check pixel density
Gerber, Mike 85ff80d548 Use dinglehopper's new OCR-D interface
Gerber, Mike d5aa273b44 🚧 Use ocr-eval aka dinglehopper
Gerber, Mike be5750f4e1 As a last step, downgrade to PAGE 2018 to support PAGE Viewer
Gerber, Mike cf2b4de2a0 🧹 Validate again after fixing image references
Gerber, Mike 21e00932be 🐛 Use a valid filegrp USE for fontident
Gerber, Mike ade39a278c 🎨 Align file groups
Gerber, Mike 3fee2d4fe6 📌 Use my ocrd_typegroups_classifier fix for passing down the page id
Gerber, Mike 44772f1923 🚧 Work around problems with ocrd-tesserocr producing TextEquiv/@conf
Gerber, Mike 8b67866aac Validate PAGE XML after OCR
Gerber, Mike 0d7fd21446 Validate workspace after each step
Gerber, Mike de841746e3 Use PAGE 2019
Gerber, Mike ff0570e151 Use frk for now
Gerber, Mike cc81afa1a5 🧹 No need to clean up after tesserocr
Gerber, Mike 89a2893e4e I do not care for the multiple mets:agents elements
Gerber, Mike 0e63fa1756 ⁉ PyTessApi seems to use both engine modes
Gerber, Mike e3a1afbc93 📝 Document the functions
Gerber, Mike f3e37dd16c Do not hardcode path to typegroups model binary
Gerber, Mike 8d66469621 Binarize images before segmenting
Gerber, Mike 5e1ece4877 Use ocrd-tesserocr-segment-*
Gerber, Mike e30f03699c TODO Binarization
Gerber, Mike 0d5b5b1b17 XXX does ocrd_tesserocr use the LSTM engine?
Gerber, Mike 16f2f16dbe XXX <error>INCONSISTENCY in TextRegion ID 'dummy'
Gerber, Mike 89abc507e0 XXX ocrd-ocropy-segment throws an exception for buerger_gedichte_1778.ocrd
Gerber, Mike ad3a7c2b95 XXX remove_filegrp link to OCR-D issue
Gerber, Mike f94230c587 Set log level to DEBUG again
Gerber, Mike 2b2c39d6d4 Add a global LOG_LEVEL option
Gerber, Mike fbc3b8ca4f Fix image references
Gerber, Mike b6c490e18b Add a PAGE fix XML step
Gerber, Mike d98ce2d2d4 Add a PAGE validation step
Gerber, Mike 10c4068a99 XXX Global -l DEBUG
Gerber, Mike f8f44e990d Clean up after ocrd-ocropy-segment's mess
Gerber, Mike 243ddea674 Use ocrd-ocropy-segment instead of non-functional ocrd-tesserocr-segment-line
Gerber, Mike 9bd3853c78 Add OCR step
Gerber, Mike a64b9cf5c8 XXX Multiple calls create multiple identical mets:agent elements
Gerber, Mike c207859bcd Refactor: Extract functions for the steps
Gerber, Mike a2d547b857 Reformat to use shorter lines
Gerber, Mike b5f9dcb7f3 Initial commit