105 Commits (a470b0f24e8bf63b66eae9b669a59e409110ed8a)

Author SHA1 Message
Gerber, Mike 42d3e8c9e7 🐛 ocrd-galley: Fix ocrd-eynollah-segment example
Gerber, Mike b0ae5b9c6a ocrd-galley: Add support for ocrd-eynollah-segment
Gerber, Mike baddafa1ea ⬆️ Update ocrd_fileformat + use in default workflow to convert to ALTO
Gerber, Mike 53d752f58d 🐛 Fix model path for ocrd_calamari 1.0
Gerber, Mike c1163405c4 🎨 s/pip3/pip/g
Gerber, Mike 9e90ef08cd 📝 Add an example ALTO transformation to default workflow (Closes GH-34)
Gerber, Mike c23930c8df Add an example for ocrd-cis-ocropy-segment
Gerber, Mike 0841af5491 🚧 Prepare supporting ocrd-sbb-binarize
ocrd-sbb-binarize seems to work but its input does not work with
ocrd-sbb-textline-detector:

https://github.com/qurator-spk/sbb_binarization/issues/8
https://github.com/qurator-spk/sbb_textline_detection/issues/47
Gerber, Mike 568cf60b4c ⚙️ Consistently set LOG_LEVEL to INFO by default
Gerber, Mike 17c6b15a1b 🐛 (Better) Handle missing pip3 in the main script
Gerber, Mike fc853d4d13 🐛 Handle missing pip3 in the main script
Gerber, Mike 0b1da9a5db 🧹 Update Calamari model path
Gerber, Mike d1a2bfe669 🐛 Deal with ocrd_olena >= 1.2.0 using one output file group only
Gerber, Mike 1a308a5522 🧹 Use OCR-D's -P, remove now redundant validation and remove now unnecessary functions
Gerber, Mike efd955c04f 🧹 Modernize my_ocrd_workflow and use OCR-D's new --overwrite
Gerber, Mike c5ae23d2ef Validate before even starting, to find data problems
Gerber, Mike a5b4e06a09 Allow skipping validation
Gerber, Mike 78f632a523 Support --input-file-grp/-I command line parameter
Gerber, Mike 58282c9e95 Include glyph output
Gerber, Mike 11a30892c5 🔍 Only do pip3 list when LOG_LEVEL >= DEBUG
Gerber, Mike 9f111ca362 🧹 Do not validate OCR results twice
Gerber, Mike 8ca25f3c56 🎨 Expose OCR textequiv_level as an environment variable
Gerber, Mike 979c7044a8 Make OCR-D-IMG-BIN output group explicit
Gerber, Mike 28bb482ceb Produce word results
Gerber, Mike 6ae85063c5 📝 Document do_validate() options better
Gerber, Mike 2cf68f149d ♻ Extract a main() function for the main stuff
Gerber, Mike be0a0c353a 📝 Document the two remaining un-documented functions
Gerber, Mike 848dd143fd 🎨 Use long command lines again
Gerber, Mike 6b83d5ae1e 🧹 Update/move some XXXs/TODOs
Gerber, Mike 5a55598d0c 🧹 Remove image reference fixing remnants - jpageviewer now has --resolve-dir
Gerber, Mike 44979e7fa2 🧹 do_linesegmentation_sbb: It's now clear that sbb segmentation works with RGB images
Gerber, Mike 460b6c34d1 ✏ Fix typo in $ocrd_olena_binarize_parameters
Gerber, Mike 71d54c6978 🔧 Set up logging level using /etc/ocrd_logging.py instead of "-l"
Gerber, Mike 1a538dce1a 🧹 Remove superfluous mets.xml options
Gerber, Mike c192bfdbfe 🧹 Remove workaround for TEMP/ directory bug
Gerber, Mike d7a2aac44b ♻ Remove file groups using "ocrd workspace remove-group"
Gerber, Mike c8039db686 🎨 Put validate options into a variable
Gerber, Mike 5ece7f1b0a 🧹 Remove remnants of ocrd-ocropy-segment
Gerber, Mike 135489eaeb 🧹 Remove page_downgrade_to_2018
Gerber, Mike 423d9c2ed6 🚧 do_validate: Skip dimension checking
Gerber, Mike 948e9074df ⬆ Update to ocrd_calamari 0.0.4
Gerber, Mike 1ef850992c 🎨 Use same style of specifying parameters for all processors
Gerber, Mike b468d688f2 🧹 Remove font identification for now
Gerber, Mike 07555e8270 🎨 Use new OCR-D JSON string parameters
Gerber, Mike 9c31d604e9 ⬆ Update ocrd-sbb-textline-detector command
Gerber, Mike fd56731464 🚧 Do not check PAGE coordinates for now
Gerber, Mike 87a2bce93c ⬆ Update calamari-models URL + path
Gerber, Mike d166077a55 Update to sbb_textline_detector with the fixed AlternativeImage support (= merged PAGE results)
Gerber, Mike de47a3e5b1 🔥 Remove now unused page_fix_image_references()
Gerber, Mike 1af18c629e 🧹 Validate imagefilename again