From 3416a155ece8ae45c0ccabc665db8680a86602ea Mon Sep 17 00:00:00 2001
From: "Gerber, Mike"
Date: Wed, 5 Feb 2020 17:39:37 +0100
Subject: [PATCH] =?UTF-8?q?=F0=9F=93=9D=20README:=20Provide=20a=20complete?=
 =?UTF-8?q?=20example=20using=20real=20data=20and=20other=20processors?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

See #33.
---
 .gitignore |  1 +
 Makefile   |  4 ++++
 README.md  | 27 ++++++++++++++++-----------
 3 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/.gitignore b/.gitignore
index 42c4957..4061f82 100644
--- a/.gitignore
+++ b/.gitignore
@@ -107,5 +107,6 @@ venv.bak/
 /calamari
 /calamari_models
 /gt4histocr-calamari
+/actevedef_718448162*
 /repo
 /test/assets
diff --git a/Makefile b/Makefile
index 5a37869..c3e85ab 100644
--- a/Makefile
+++ b/Makefile
@@ -44,6 +44,10 @@ gt4histocr-calamari:
 	tar xfv model.tar.xz && \
 	rm model.tar.xz
 
+# Example data
+actevedef_718448162:
+	wget https://qurator-data.de/examples/actevedef_718448162.zip && \
+	unzip actevedef_718448162.zip
 
 
 # pip install calamari
diff --git a/README.md b/README.md
index 6f80434..2f6947b 100644
--- a/README.md
+++ b/README.md
@@ -46,18 +46,23 @@ ls gt4histocr-calamari
 ```
 
 ## Example Usage
+Before using `ocrd-calamari-recognize` get some example data and model, and
+prepare the document for OCR:
+```
+# Download model and example data
+make gt4histocr-calamari
+make actevedef_718448162
+
+# Create binarized images and line segmentation using other OCR-D projects
+ocrd-olena-binarize -p '{ "impl": "sauvola-ms-split" }' -I OCR-D-IMG -O OCR-D-IMG-BINPAGE,OCR-D-IMG-BIN
+ocrd-tesserocr-segment-region -I OCR-D-IMG-BINPAGE -O OCR-D-SEG-REGION
+ocrd-tesserocr-segment-line -I OCR-D-SEG-REGION -O OCR-D-SEG-LINE
+```
 
-~~~
-ocrd-calamari-recognize -p test-parameters.json -m mets.xml -I OCR-D-SEG-LINE -O OCR-D-OCR-CALAMARI
-~~~
-
-With `test-parameters.json`:
-~~~
-{
-    "checkpoint": "/path/to/for/example/gt4histocr-calamari/*.ckpt.json",
-    "textequiv_level": "line"
-}
-~~~
+Finally recognize the text using ocrd_calamari and the downloaded model:
+```
+ocrd-calamari-recognize -p '{ "checkpoint": "../gt4histocr-calamari/*.ckpt.json" }' -I OCR-D-SEG-LINE -O OCR-D-OCR-CALAMARI
+```
 
 You may want to have a look at the [ocrd-tool.json](ocrd_calamari/ocrd-tool.json)
 descriptions for additional parameters and default values.