b75d17e42c
Merge branch 'master' of github.com:mikegerber/my_ocrd_workflow
2020-08-05 16:05:08 +02:00
72693071e0
⚙️ Get tessdata_best from git-annex/our mirror to make downloads more robust
2020-08-05 16:03:17 +02:00
6ae31395a3
⚙️ Configure retries for apt-get/curl to make downloads more robust
2020-08-05 16:02:06 +02:00
b0b1c021a8
🧹 Update Calamari model path
2020-08-05 12:27:05 +02:00
31c36eb534
⬆️ Update qurator_data_lib.sh to allow not unpacking a downloaded file
2020-08-05 12:01:41 +02:00
3f511bc8e3
🗒️ README: Break jpageviewer line
2020-08-05 11:16:36 +02:00
d1a2bfe669
🐛 Deal with ocrd_olena >= 1.2.0 using one output file group only
2020-07-31 14:25:35 +02:00
7111d28f9b
⬆️ ocrd_olena → 1.2.0
2020-07-31 13:52:21 +02:00
1a308a5522
🧹 Use OCR-D's -P, remove now redundant validation and remove now unnecessary functions
2020-07-30 20:55:11 +02:00
efd955c04f
🧹 Modernize my_ocrd_workflow and use OCR-D's new --overwrite
2020-07-30 20:20:52 +02:00
3af3c6dd00
⬆️ Update qurator_data_lib.sh
2020-07-30 19:23:45 +02:00
0f8b2d82d5
🧹 Travis: Comment out transfer.sh output
2020-07-30 19:12:08 +02:00
032f58e4b8
Merge branch 'master' of https://github.com/mikegerber/my_ocrd_workflow
2020-07-30 18:04:12 +02:00
131e862762
💩 Travis: Skip validation until https://github.com/OCR-D/ocrd_olena/issues/60 is fixed
2020-07-30 18:03:24 +02:00
7fe2ce84b5
🐛 Update sbb_textline_detector to fix Keras/TF issue
2020-07-29 16:46:00 +02:00
ef3a8a69e0
⬆️ Update ocrd_olena
2020-07-29 16:45:21 +02:00
f7b43bbefa
✨ ppn2ocr: Support TIFF in the BEST group
2020-06-23 19:03:58 +02:00
4e37a52899
Merge branch 'master' of github.com:mikegerber/my_ocrd_workflow
2020-06-23 15:17:03 +02:00
bb703152db
🐛 ppn2ocr: Verify oai.sbb.berlin's certificate again
Now that oai.sbb.berlin's certificate chain is fixed, remove the
workaround again.
Fixes GH#15.
2020-06-23 15:15:21 +02:00
c5ae23d2ef
✨ Validate before even starting, to find data problems
2020-06-19 19:27:32 +02:00
f7b0b4121d
⁉️ Check dependencies using pipdeptree to triage Travis build fail
2020-06-19 17:47:35 +02:00
c334b1e7ac
🧹 Move check-FULLTEXT-Page-dimensions-vs-BEST-dimensions.py code to mono-repo/experiments
2020-06-19 16:01:07 +02:00
af4557fb33
Merge branch 'master' of https://github.com/mikegerber/my_ocrd_workflow
2020-06-18 15:46:46 +02:00
0aa541fa18
📓 README: Reference howto/*proxy*.md instead of duplicating the proxy settings
2020-06-18 14:46:01 +02:00
3f4ec30349
🧹 .gitignore __pycache__/*.pyc
2020-06-18 10:51:43 +02:00
f98a1ec2c8
🐛 run: XXX Work around podman vs docker uid behaviour
2020-06-18 10:50:24 +02:00
746fb768da
🚧 Add a script that checks FULLTEXT dimensions against BEST dimensions
2020-06-18 10:49:31 +02:00
d2c316285c
🧹 ppn2ocr: Remove obsolete show_help()
2020-06-17 16:44:17 +02:00
f5b2eed8a6
🐛 ppn2ocr: Work around oai.sbb.berlin certificate problem
oai.sbb.berlin does not have a valid certificate:
% curl https://oai.sbb.berlin
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
Work around this by setting verify=False.
2020-06-09 11:19:25 +02:00
448bf9e256
🐛 ppn2ocr: Remove LOCAL file group too
2020-06-04 19:55:00 +02:00
4e19e2a655
💄 ppn2ocr: Add a proper CLI interface
2020-06-03 15:53:45 +02:00
70eb73e4c7
🧹 ppn2ocr: (Re)Move TODOs
2020-06-03 15:34:00 +02:00
6c74672916
🚧 ppn2ocr: Update README to use the correct path to requirements-ppn2ocr.txt
2020-06-03 11:18:36 +02:00
9b7437601f
🚧 ppn2ocr: Update README
2020-06-03 11:17:18 +02:00
3e0b7436a5
🚧 ppn2ocr: Add requirements-ppn2ocr.txt
2020-06-03 11:14:44 +02:00
05dbffeb7a
🚧 ppn2ocr: Do not call workflow for now
2020-06-03 10:12:36 +02:00
10f5198fa6
🚧 ppn2ocr: s/contain/encapsulate
2020-06-03 10:11:23 +02:00
f893b339c5
🚧 ppn2ocr: Properly remove the PRESENTATION file group
2020-06-03 10:10:54 +02:00
014e70fe35
🚧 ppn2ocr: Actually run the workflow
2020-06-02 19:25:31 +02:00
74cb361723
🚧 ppn2ocr: Extract a function to contain the IIIF hack
2020-06-02 19:18:06 +02:00
c7c8934e89
🚧 ppn2ocr: Convert to Python + fumble in IIIF URLs
2020-06-02 19:06:31 +02:00
7c5cbc7244
📝 ppn2ocr: Add to README, including proxy configuration
2020-05-22 17:23:49 +02:00
1585247482
✨ ppn2ocr: Make PPN a command line parameter
2020-05-22 17:15:50 +02:00
2a4b204fbe
🎨 ppn2ocr: Extract a function to make a workspace
2020-05-22 16:53:20 +02:00
18d4ab0ba1
✨ ppn2ocr: Use a better example document
2020-05-22 16:45:19 +02:00
8024064697
🐛 ppn2ocr: Fix file:/ links to use file:///, and remove unavailable LOCAL file group
2020-05-22 16:09:00 +02:00
612d44b074
🚧 zdb2ocr: Add TODOs from notes.md
2020-05-22 13:49:34 +02:00
9303f4b4df
🚧 zdb2ocr: Produce OCR of ZEFYS newspapers (WIP)
2020-05-22 13:43:11 +02:00
3b60b26c53
🐛 ppn2ocr: Do not set no_proxy here
2020-05-18 21:03:06 +02:00
5675047047
🧹 ppn2ocr: We already use run-docker-hub
2020-05-14 16:25:34 +02:00