From 7c5cbc7244ec73c19880ce4d647d002eff66a948 Mon Sep 17 00:00:00 2001
From: "Gerber, Mike"
Date: Fri, 22 May 2020 17:23:49 +0200
Subject: [PATCH] =?UTF-8?q?=F0=9F=93=9D=20ppn2ocr:=20Add=20to=20README,=20?=
 =?UTF-8?q?including=20proxy=20configuration?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 README.md | 20 ++++++++++++++++++++
 ppn2ocr   |  3 ---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index cb01bd6..74cf4a1 100644
--- a/README.md
+++ b/README.md
@@ -55,3 +55,23 @@ available:
 ~~~
 firefox OCR-D-OCR-CALAMARI-EVAL/OCR-D-OCR-CALAMARI-EVAL_00000024.html
 ~~~
+
+ppn2ocr
+-------
+The `ppn2ocr` script produces OCR output for a given document in the State
+Library Berlin (SBB)'s digitized collection. The document must be specified by its
+PPN, for example:
+~~~
+./ppn2ocr PPN77164308X
+~~~
+
+This produces a workspace directory `PPN77164308X` with the OCR results in it;
+the results are viewable as explained above.
+
+ppn2ocr requires a working Docker setup and properly set up environment
+variables for the proxy configuration. At SBB, this means:
+~~~
+export HTTP_PROXY=http://http-proxy.sbb.spk-berlin.de:3128/
+export HTTPS_PROXY=$HTTP_PROXY; export http_proxy=$HTTP_PROXY; export https_proxy=$HTTP_PROXY
+export no_proxy=localhost,digital.staatsbibliothek-berlin.de,content.staatsbibliothek-berlin.de
+~~~
diff --git a/ppn2ocr b/ppn2ocr
index 6d64540..7d32611 100755
--- a/ppn2ocr
+++ b/ppn2ocr
@@ -67,9 +67,6 @@ $self_dir/run-docker-hub -I PRESENTATION --skip-validation
 
 
 # TODO
-# * README: Users must configure their proxy properly via environment variables;
-# This includes setting no_proxy (e.g. for use at SBB).
-
 # my_ocrd_workflow
 # ----------------
 # * Need option to add volumes e.g. /srv/digisam_images