From 0a572df0ba5ef61e1c162d10a30c9ce3a65f47d5 Mon Sep 17 00:00:00 2001
From: "Gerber, Mike"
Date: Mon, 3 Feb 2020 15:31:36 +0100
Subject: [PATCH] 📝 README: Add information about the new glyph and word segmentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 README.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/README.md b/README.md
index aea4d41..18041e8 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,14 @@ This offers a OCR-D compliant workspace processor for some of the functionality
 This processor only operates on the text line level and so needs a line
 segmentation (and by extension a binarized image) as its input.
 
+In addition to the line text it also outputs glyph segmentation including
+per-glyph confidence values and per-glyph alternative predictions as provided
+by the Calamari OCR engine. Note that while Calamari does not provide word
+segmentation, this processor produces word segmentation inferred from Unicode
+text segmentation and the glyph positions. The provided glyph and word
+segmentation can be used for text extraction and highlighting, but is probably
+not useful for further image-based processing.
+
 ## Installation
 
 ### From PyPI
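The README text in this patch says that word segmentation is inferred from the text and the glyph positions. A minimal Python sketch of that idea follows, under stated assumptions: the `Glyph` and `Word` classes and the `segment_words` function are hypothetical illustrations, not ocrd_calamari's actual API; words are approximated as runs of non-whitespace glyphs, a simplified stand-in for full Unicode text segmentation (UAX #29); and only horizontal glyph extents are modeled.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass
class Glyph:
    """Hypothetical glyph record: one recognized character and its x extent."""
    char: str
    x0: int
    x1: int


@dataclass
class Word:
    """Hypothetical word record: text plus the union of its glyphs' extents."""
    text: str
    x0: int
    x1: int


def segment_words(glyphs: List[Glyph]) -> List[Word]:
    """Group glyphs into words and derive each word's extent from its glyphs.

    Assumption: a word is a run of non-whitespace glyphs. This approximates,
    but does not implement, Unicode word segmentation (UAX #29).
    """
    words: List[Word] = []
    # Consecutive glyphs are grouped by whether they are whitespace.
    for is_space, run in groupby(glyphs, key=lambda g: g.char.isspace()):
        if is_space:
            continue  # whitespace only separates words; it is not emitted
        run = list(run)
        words.append(
            Word(
                text="".join(g.char for g in run),
                x0=min(g.x0 for g in run),  # leftmost glyph edge
                x1=max(g.x1 for g in run),  # rightmost glyph edge
            )
        )
    return words
```

For example, glyphs for `"OCR works"` placed 10 units apart and 8 units wide yield two words, `OCR` spanning 0 to 28 and `works` spanning 40 to 88. Deriving word boxes purely from glyph boxes is also why such a segmentation suits text extraction and highlighting better than further image-based processing.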