Commit graph

561 commits

SHA1 Message Date
b0157ff1a2 🚧 Travis: Fix build stage 2020-08-14 18:02:42 +02:00
21c1f310b1 🚧 Travis: Fix build stage 2020-08-14 17:54:13 +02:00
73ffa01d12 🎨 Rename boxed-* to my_ocrd_workflow-* 2020-08-14 17:52:57 +02:00
a820d72526 🧹 s/base/core 2020-08-14 17:39:31 +02:00
9680dd8299 ⚙️ Install pip via get-pip.py 2020-08-14 17:17:25 +02:00
92391747a7 🧹 Remove obsolete xsd/ directory 2020-08-14 14:38:02 +02:00
02eae7b6fa Move processors into their own Docker container 2020-08-14 14:37:20 +02:00
894cbeee32 Merge branch 'test/pip-2020-resolver' 2020-08-07 19:27:21 +02:00
5806674fa0 🎉 Checkout pip 20.2's new 2020-resolver feature 2020-08-07 18:50:46 +02:00
d3b6974316 💩 I hate pip to much 2020-08-07 18:43:22 +02:00
037f64518f ⬆️ Update dinglehopper 2020-08-07 18:29:10 +02:00
c2946395e3 🐛 Fix downloading repacked tessdata_best 2020-08-06 13:08:46 +02:00
daa8095a25 ⬆️ Update ocrd_calamari to fix pcGtsId (also ocrd for good measure) 2020-08-06 12:57:16 +02:00
bf7c6abcbf ⬆️ Update sbb_textline_detector to fix pcGtsId issue 2020-08-05 20:14:10 +02:00
73a125d893 ⬆️ Update ocrd_tesserocr to fix pcGtsId issue 2020-08-05 20:13:56 +02:00
0b1da9a5db 🧹 Update Calamari model path 2020-08-05 20:13:14 +02:00
c5536d3722 💩 Increase pip default timeout 2020-08-05 20:12:49 +02:00
b75d17e42c Merge branch 'master' of github.com:mikegerber/my_ocrd_workflow 2020-08-05 16:05:08 +02:00
72693071e0 ⚙️ Get tessdata_best from git-annex/our mirror to make downloads more robust 2020-08-05 16:03:17 +02:00
6ae31395a3 ⚙️ Configure retries for apt-get/curl to make downloads more robust 2020-08-05 16:02:06 +02:00
b0b1c021a8 🧹 Update Calamari model path 2020-08-05 12:27:05 +02:00
31c36eb534 ⬆️ Update qurator_data_lib.sh to allow not unpacking a downloaded file 2020-08-05 12:01:41 +02:00
3f511bc8e3 🗒️ README: Break jpageviewer line 2020-08-05 11:16:36 +02:00
d1a2bfe669 🐛 Deal with ocrd_olena >= 1.2.0 using one output file group only 2020-07-31 14:25:35 +02:00
7111d28f9b ⬆️ ocrd_olena → 1.2.0 2020-07-31 13:52:21 +02:00
1a308a5522 🧹 Use OCR-D's -P, remove now redundant validation and remove now unnecessary functions 2020-07-30 20:55:11 +02:00
efd955c04f 🧹 Modernize my_ocrd_workflow and use OCR-D's new --overwrite 2020-07-30 20:20:52 +02:00
3af3c6dd00 ⬆️ Update qurator_data_lib.sh 2020-07-30 19:23:45 +02:00
0f8b2d82d5 🧹 Travis: Comment out transfer.sh output 2020-07-30 19:12:08 +02:00
032f58e4b8 Merge branch 'master' of https://github.com/mikegerber/my_ocrd_workflow 2020-07-30 18:04:12 +02:00
131e862762 💩 Travis: Skip validation until https://github.com/OCR-D/ocrd_olena/issues/60 is fixed 2020-07-30 18:03:24 +02:00
7fe2ce84b5 🐛 Update sbb_textline_detector to fix Keras/TF issue 2020-07-29 16:46:00 +02:00
ef3a8a69e0 ⬆️ Update ocrd_olena 2020-07-29 16:45:21 +02:00
f7b43bbefa ppn2ocr: Support TIFF in the BEST group 2020-06-23 19:03:58 +02:00
4e37a52899 Merge branch 'master' of github.com:mikegerber/my_ocrd_workflow 2020-06-23 15:17:03 +02:00
bb703152db 🐛 ppn2ocr: Verify oai.sbb.berlin's certificate again 2020-06-23 15:15:21 +02:00
    Now that oai.sbb.berlin's certificate chain is fixed, remove the
    workaround again.

    Fixes GH#15.
c5ae23d2ef Validate before even starting, to find data problems 2020-06-19 19:27:32 +02:00
f7b0b4121d ⁉️ Check dependencies using pipdeptree to triage Travis build fail 2020-06-19 17:47:35 +02:00
c334b1e7ac 🧹 Move check-FULLTEXT-Page-dimensions-vs-BEST-dimensions.py code to mono-repo/experiments 2020-06-19 16:01:07 +02:00
af4557fb33 Merge branch 'master' of https://github.com/mikegerber/my_ocrd_workflow 2020-06-18 15:46:46 +02:00
0aa541fa18 📓 README: Reference howto/*proxy*.md instead of duplicating the proxy settings 2020-06-18 14:46:01 +02:00
3f4ec30349 🧹 .gitignore __pychache__/*.pyc 2020-06-18 10:51:43 +02:00
f98a1ec2c8 🐛 run: XXX Work around podman vs docker uid behaviour 2020-06-18 10:50:24 +02:00
746fb768da 🚧 Add a script that checks FULLTEXT dimensions against BEST dimensions 2020-06-18 10:49:31 +02:00
d2c316285c 🧹 ppn2ocr: Remove obsolete show_help() 2020-06-17 16:44:17 +02:00
f5b2eed8a6 🐛 ppn2ocr: Work around oai.sbb.berlin certificate problem 2020-06-09 11:19:25 +02:00
    oai.sbb.berlin does not have a valid certificate:

        % curl https://oai.sbb.berlin
        curl: (60) SSL certificate problem: unable to get local issuer certificate
        More details here: https://curl.haxx.se/docs/sslcerts.html

        curl failed to verify the legitimacy of the server and therefore could not
        establish a secure connection to it. To learn more about this situation and
        how to fix it, please visit the web page mentioned above.

    Work around this by setting verify=False.
448bf9e256 🐛 ppn2ocr: Remove LOCAL file group too 2020-06-04 19:55:00 +02:00
4e19e2a655 💄 ppn2ocr: Add a proper CLI interface 2020-06-03 15:53:45 +02:00
70eb73e4c7 🧹 ppn2ocr: (Re)Move TODOs 2020-06-03 15:34:00 +02:00
6c74672916 🚧 ppn2ocr: Update README to use the correct path to requirements-ppn2ocr.txt 2020-06-03 11:18:36 +02:00