58 commits

SHA1 Message Date
d166077a55 Update to sbb_textline_detector with the fixed AlternativeImage support (= merged PAGE results) 2019-11-20 12:40:05 +01:00
de47a3e5b1 🔥 Remove now unused page_fix_image_references() 2019-11-20 12:39:02 +01:00
1af18c629e 🧹 Validate imagefilename again 2019-10-30 11:25:34 +01:00
de49aa715b ⬆ Update to OCR-D 1.0.0 2019-10-21 17:04:49 +02:00
7025d960b4 Use ocrd_olena for binarization 2019-10-21 17:04:06 +02:00
3687d6d7b4 🧹 Do not remove line confidences anymore 2019-10-11 19:17:30 +02:00
6454d20998 Use sbb_textline_detector to segment lines 2019-10-11 19:16:43 +02:00
bdab016e2c Use GT4HistOCR_2000000 model from qurator-data for Tesseract 2019-10-02 16:48:28 +02:00
47dd5d3b62 🎨 Move XML schemata to a better path 2019-09-30 18:25:54 +02:00
af2034400a 🎨 Add extra newlines to separate steps 2019-09-30 12:26:14 +02:00
1863439d92 💩 Remove extra Pillow dependency workarounds 2019-09-30 12:25:31 +02:00
e5cd5b937e Run pip3 list for easier checking 2019-09-27 13:16:14 +02:00
bd24624bd7 ⬆ Do not downgrade to PAGE 2018 anymore 2019-09-27 13:02:46 +02:00
0b2b66a0b4 🔧 Allow setting LOG_LEVEL 2019-09-27 12:09:37 +02:00
f19bba45b8 💩 Remove mysterious TEMP directory for now 2019-09-26 16:55:54 +02:00
68902f923d 📜 Downgrading to PAGE 2018 is not the last step anymore 2019-09-26 16:55:02 +02:00
6c0d7e0aee 💩 Do not fix PAGE image references for now 2019-09-26 16:46:12 +02:00
343a3fbf82 🔧 Evaluate both Tesseract and Calamari results 2019-08-21 13:07:27 +02:00
0bc06c2fad Run Calamari OCR 2019-08-21 11:54:01 +02:00
daed87566e 🚑 Don't install typegroups classifier for now 2019-08-16 18:23:15 +02:00
d8f3438ac5 🚑 Don't check pixel density 2019-08-16 18:21:59 +02:00
85ff80d548 Use dinglehopper's new OCR-D interface 2019-08-16 14:04:41 +02:00
d5aa273b44 🚧 Use ocr-eval aka dinglehopper 2019-08-13 18:13:49 +02:00
be5750f4e1 As a last step, downgrade to PAGE 2018 to support PAGE Viewer 2019-08-05 18:46:36 +02:00
cf2b4de2a0 🧹 Validate again after fixing image references 2019-08-05 17:46:20 +02:00
21e00932be 🐛 Use a valid filegrp USE for fontident 2019-08-05 17:38:24 +02:00
ade39a278c 🎨 Align file groups 2019-08-05 17:08:58 +02:00
3fee2d4fe6 📌 Use my ocrd_typegroups_classifier fix for passing down the page id 2019-08-05 17:00:54 +02:00
44772f1923 🚧 Work around problems with ocrd-tesserocr producing TextEquiv/@conf 2019-08-05 15:40:39 +02:00
8b67866aac Validate PAGE XML after OCR 2019-08-05 15:31:24 +02:00
0d7fd21446 Validate workspace after each step 2019-08-05 15:27:38 +02:00
de841746e3 Use PAGE 2019 2019-08-02 11:58:56 +02:00
ff0570e151 Use frk for now 2019-08-02 11:58:46 +02:00
cc81afa1a5 🧹 No need to clean up after tesserocr 2019-07-03 13:46:49 +02:00
89a2893e4e I do not care for the multiple mets:agents elements 2019-07-03 12:35:15 +02:00
0e63fa1756 ⁉ PyTessApi seems to use both engine modes 2019-07-03 12:30:52 +02:00
e3a1afbc93 📝 Document the functions 2019-07-03 12:22:55 +02:00
f3e37dd16c Do not hardcode path to typegroups model binary 2019-06-24 17:31:25 +02:00
8d66469621 Binarize images before segmenting 2019-06-24 12:34:08 +02:00
5e1ece4877 Use ocrd-tesserocr-segment-* 2019-06-24 12:13:49 +02:00
e30f03699c TODO Binarization 2019-06-24 12:12:12 +02:00
0d5b5b1b17 XXX does ocrd_tesserocr use the LSTM engine? 2019-06-24 12:09:35 +02:00
16f2f16dbe XXX <error>INCONSISTENCY in TextRegion ID 'dummy' 2019-06-21 12:13:20 +02:00
89abc507e0 XXX ocrd-ocropy-segment throws an exception for buerger_gedichte_1778.ocrd 2019-06-21 12:10:55 +02:00
ad3a7c2b95 XXX remove_filegrp link to OCR-D issue 2019-06-21 12:10:19 +02:00
f94230c587 Set log level to DEBUG again 2019-06-21 12:09:44 +02:00
2b2c39d6d4 Add a global LOG_LEVEL option 2019-06-19 17:48:38 +02:00
fbc3b8ca4f Fix image references 2019-06-19 17:20:05 +02:00
b6c490e18b Add a PAGE fix XML step 2019-06-19 15:03:16 +02:00
d98ce2d2d4 Add a PAGE validation step 2019-06-19 14:56:00 +02:00