From 0f3857d8d37aa4f67a53312cdeb88cc4764ba018 Mon Sep 17 00:00:00 2001
From: "Gerber, Mike"
Date: Wed, 21 Oct 2020 16:51:53 +0200
Subject: [PATCH] =?UTF-8?q?=F0=9F=93=9D=20Document=20OCR-D=20parameters=20?=
 =?UTF-8?q?and=20restructure=20README=20a=20bit?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 README.md | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 8c39217..a7f8c22 100644
--- a/README.md
+++ b/README.md
@@ -60,6 +60,15 @@ dinglehopper some-document.gt.page.xml some-document.ocr.alto.xml
 
 This generates `report.html` and `report.json`.
 
+### dinglehopper-extract
+The tool `dinglehopper-extract` extracts the text of the given input file on
+stdout, for example:
+
+~~~
+dinglehopper-extract --textequiv-level line OCR-D-GT-PAGE/00000024.page.xml
+~~~
+
+### OCR-D
 As a OCR-D processor:
 ~~~
 ocrd-dinglehopper -I OCR-D-GT-PAGE,OCR-D-OCR-TESS -O OCR-D-OCR-TESS-EVAL
@@ -69,18 +78,18 @@ This generates HTML and JSON reports in the `OCR-D-OCR-TESS-EVAL` filegroup.
 
 ![dinglehopper displaying metrics and character differences](.screenshots/dinglehopper.png?raw=true)
 
-You may also want to disable metrics and the green-red color scheme by
-parameter:
+The OCR-D processor has these parameters:
 
+| Parameter                 | Meaning                                                             |
+| ------------------------- | ------------------------------------------------------------------- |
+| `-P metrics false`        | Disable metrics and the green-red color scheme (default: enabled)   |
+| `-P textequiv_level line` | (PAGE) Extract text from TextLine level (default: TextRegion level) |
+
+For example:
 ~~~
 ocrd-dinglehopper -I ABBYY-FULLTEXT,OCR-D-OCR-CALAMARI -O OCR-D-OCR-COMPARE-ABBYY-CALAMARI -P metrics false
 ~~~
 
-The tool `dinglehopper-extract` extracts the text of the given input file on
-stdout, for example:
-
-`dinglehopper-extract OCR-D-GT-PAGE/00000024.page.xml`
-
 Developer information
 ---------------------
 *Please refer to [README-DEV.md](README-DEV.md).*