From 3d0a59e0d7a6fce93da2d8ac9a9efe642d147baa Mon Sep 17 00:00:00 2001 From: Robert Sachunsky Date: Fri, 29 Nov 2019 18:45:10 +0100 Subject: [PATCH] update README --- README.md | 39 ++++++++++++++++++++++++++++++++++----- 1 file changed, 34 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 3d056f5..3b51455 100644 --- a/README.md +++ b/README.md @@ -1,20 +1,49 @@ # ocrd_repair_inconsistencies -Automatically re-order lines, words and glyphs to become textually consistent with their parents. + Automatically re-order lines, words and glyphs to become textually consistent with their parents. + +## Introduction PAGE-XML elements with textual annotation are re-ordered by their centroid coordinates -in top-to-bottom/left-to-right fashion iff such re-ordering fixes the inconsistency -between their appropriately concatenated `TextEquiv` texts with their parent's `TextEquiv` text. +iff such re-ordering fixes the inconsistency between their appropriately concatenated +`TextEquiv` texts with their parent's `TextEquiv` text. + +If `TextEquiv` is missing, skip the respective elements. + +Where available, respect the annotated visual order: +- For regions vs lines, sort in `top-to-bottom` fashion, unless another `textLineOrder` is annotated. + (Both `left-to-right` and `right-to-left` will be skipped currently.) +- For lines vs words and words vs glyphs, sort in `left-to-right` fashion, unless another `readingDirection` is annotated. + (Both `top-to-bottom` and `bottom-to-top` will be skipped currently.) This processor does not affect `ReadingOrder` between regions, just the order of the XML elements below the region level, and only if not contradicting the annotated `textLineOrder`/`readingDirection`. We wrote this as a one-shot script to fix some files. Use with caution. +## Installation + +(In your venv, run:) + +```sh +make deps # or pip install -r requirements.txt +make install # or pip install . +``` + +## Usage + +Offers the following user interfaces: + +### [OCR-D processor](https://ocr-d.github.io/cli) CLI `ocrd-repair-inconsistencies` + +To be used with [PageXML](https://github.com/PRImA-Research-Lab/PAGE-XML) +documents in an [OCR-D](https://ocr-d.github.io) annotation workflow. + +### Example -## Example usage +Use the following script to repair `OCR-D-GT-PAGE` annotation in workspaces, +and then replace it with the output on success: -For example, use this fix script: ~~~sh #!/bin/bash set -e