mirror of
https://github.com/qurator-spk/ocrd_repair_inconsistencies.git
synced 2025-06-26 03:59:57 +02:00
update README
This commit is contained in:
parent
bb925a6a5b
commit
3d0a59e0d7
1 changed files with 34 additions and 5 deletions
39
README.md
39
README.md
|
@ -1,20 +1,49 @@
|
|||
# ocrd_repair_inconsistencies
|
||||
|
||||
Automatically re-order lines, words and glyphs to become textually consistent with their parents.
|
||||
Automatically re-order lines, words and glyphs to become textually consistent with their parents.
|
||||
|
||||
## Introduction
|
||||
|
||||
PAGE-XML elements with textual annotation are re-ordered by their centroid coordinates
|
||||
in top-to-bottom/left-to-right fashion iff such re-ordering fixes the inconsistency
|
||||
between their appropriately concatenated `TextEquiv` texts with their parent's `TextEquiv` text.
|
||||
iff such re-ordering fixes the inconsistency between their appropriately concatenated
|
||||
`TextEquiv` texts with their parent's `TextEquiv` text.
|
||||
|
||||
If `TextEquiv` is missing, skip the respective elements.
|
||||
|
||||
Where available, respect the annotated visual order:
|
||||
- For regions vs lines, sort in `top-to-bottom` fashion, unless another `textLineOrder` is annotated.
|
||||
(Both `left-to-right` and `right-to-left` will be skipped currently.)
|
||||
- For lines vs words and words vs glyphs, sort in `left-to-right` fashion, unless another `readingDirection` is annotated.
|
||||
(Both `top-to-bottom` and `bottom-to-top` will be skipped currently.)
|
||||
|
||||
This processor does not affect `ReadingOrder` between regions, just the order of the XML elements
|
||||
below the region level, and only if not contradicting the annotated `textLineOrder`/`readingDirection`.
|
||||
|
||||
We wrote this as a one-shot script to fix some files. Use with caution.
|
||||
|
||||
## Installation
|
||||
|
||||
## Example usage
|
||||
(In your venv, run:)
|
||||
|
||||
```sh
|
||||
make deps # or pip install -r requirements.txt
|
||||
make install # or pip install .
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
Offers the following user interfaces:
|
||||
|
||||
### [OCR-D processor](https://ocr-d.github.io/cli) CLI `ocrd-repair-inconsistencies`
|
||||
|
||||
To be used with [PageXML](https://github.com/PRImA-Research-Lab/PAGE-XML)
|
||||
documents in an [OCR-D](https://ocr-d.github.io) annotation workflow.
|
||||
|
||||
### Example
|
||||
|
||||
Use the following script to repair `OCR-D-GT-PAGE` annotation in workspaces,
|
||||
and then replace it with the output on success:
|
||||
|
||||
For example, use this fix script:
|
||||
~~~sh
|
||||
#!/bin/bash
|
||||
set -e
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue