Commit Graph

69 Commits (c8039db686da42bb358b24efe603a905fa2ec7b6)

Author SHA1 Message
Gerber, Mike c8039db686 🎨 Put validate options into a variable
Gerber, Mike 5ece7f1b0a 🧹 Remove remnants of ocrd-ocropy-segment
Gerber, Mike 135489eaeb 🧹 Remove page_downgrade_to_2018
Gerber, Mike 423d9c2ed6 🚧 do_validate: Skip dimension checking
Gerber, Mike 948e9074df ⬆ Update to ocrd_calamari 0.0.4
Gerber, Mike 1ef850992c 🎨 Use same style of specifying parameters for all processors
Gerber, Mike b468d688f2 🧹 Remove font identification for now
Gerber, Mike 07555e8270 🎨 Use new OCR-D JSON string parameters
Gerber, Mike 9c31d604e9 ⬆ Update ocrd-sbb-textline-detector command
Gerber, Mike fd56731464 🚧 Do not check PAGE coordinates for now
Gerber, Mike 87a2bce93c ⬆ Update calamari-models URL + path
Gerber, Mike d166077a55 Update to sbb_textline_detector with the fixed AlternativeImage support (= merged PAGE results)
Gerber, Mike de47a3e5b1 🔥 Remove now unused page_fix_image_references()
Gerber, Mike 1af18c629e 🧹 Validate imagefilename again
Gerber, Mike de49aa715b ⬆ Update to OCR-D 1.0.0
Gerber, Mike 7025d960b4 Use ocrd_olena for binarization
Gerber, Mike 3687d6d7b4 🧹 Do not remove line confidences anymore
Gerber, Mike 6454d20998 Use sbb_textline_detector to segment lines
Gerber, Mike bdab016e2c Use GT4HistOCR_2000000 model from qurator-data for Tesseract
Gerber, Mike 47dd5d3b62 🎨 Move XML schemata to a better path
Gerber, Mike af2034400a 🎨 Add extra newlines to separate steps
Gerber, Mike 1863439d92 💩 Remove extra Pillow dependency workarounds
Gerber, Mike e5cd5b937e Run pip3 list for easier checking
Gerber, Mike bd24624bd7 ⬆ Do not downgrade to PAGE 2018 anymore
Gerber, Mike 0b2b66a0b4 🔧 Allow setting LOG_LEVEL
Gerber, Mike f19bba45b8 💩 Remove mysterious TEMP directory for now
Gerber, Mike 68902f923d 📜 Downgrading to PAGE 2018 is not the last step anymore
Gerber, Mike 6c0d7e0aee 💩 Do not fix PAGE image references for now
Gerber, Mike 343a3fbf82 🔧 Evaluate both Tesseract and Calamari results
Gerber, Mike 0bc06c2fad Run Calamari OCR
Gerber, Mike daed87566e 🚑 Don't install typegroups classifier for now
Gerber, Mike d8f3438ac5 🚑 Don't check pixel density
Gerber, Mike 85ff80d548 Use dinglehopper's new OCR-D interface
Gerber, Mike d5aa273b44 🚧 Use ocr-eval aka dinglehopper
Gerber, Mike be5750f4e1 As a last step, downgrade to PAGE 2018 to support PAGE Viewer
Gerber, Mike cf2b4de2a0 🧹 Validate again after fixing image references
Gerber, Mike 21e00932be 🐛 Use a valid filegrp USE for fontident
Gerber, Mike ade39a278c 🎨 Align file groups
Gerber, Mike 3fee2d4fe6 📌 Use my ocrd_typegroups_classifier fix for passing down the page id
Gerber, Mike 44772f1923 🚧 Work around problems with ocrd-tesserocr producing TextEquiv/@conf
Gerber, Mike 8b67866aac Validate PAGE XML after OCR
Gerber, Mike 0d7fd21446 Validate workspace after each step
Gerber, Mike de841746e3 Use PAGE 2019
Gerber, Mike ff0570e151 Use frk for now
Gerber, Mike cc81afa1a5 🧹 No need to clean up after tesserocr
Gerber, Mike 89a2893e4e I do not care for the multiple mets:agents elements
Gerber, Mike 0e63fa1756 ⁉ PyTessApi seems to use both engine modes
Gerber, Mike e3a1afbc93 📝 Document the functions
Gerber, Mike f3e37dd16c Do not hardcode path to typegroups model binary
Gerber, Mike 8d66469621 Binarize images before segmenting