diff --git a/README.md b/README.md
index ba20b4e..e6e8071 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,12 @@
 # ocrd_repair_inconsistencies
 
-Automatically fix PAGE-XML order inconsistencies in regions, lines and words.
-Children elements are only reordered if reordering by coordinates
-top-to-bottom/left-to-right fixes the appropriately concatenated `TextEquiv`
-texts of the children to match the parent's `TextEquiv` text. This processor
-does not change reading order, just the order of the XML elements in the file.
+Automatically re-order lines, words and glyphs to become textually consistent with their parents.
+
+PAGE-XML elements with textual annotation are re-ordered by their centroid coordinates
+in top-to-bottom/left-to-right fashion iff such re-ordering fixes the inconsistency
+between their appropriately concatenated `TextEquiv` texts with their parent's `TextEquiv` text.
+
+This processor does not affect `ReadingOrder` between regions, just the order of the XML elements below the region level, and only if not contradicting the annotated `textLineOrder`/`readingDirection`.
 
 We wrote this as a one-shot script to fix some files. Use with caution.
 
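
Review note: the rule in the new README text (re-order children by centroid, but only if that makes their concatenated text match the parent's) can be illustrated with a minimal sketch. This is not the processor's actual code. `Element`, `centroid` and `repair_order` are hypothetical names, the centroid is approximated as the mean of the polygon vertices, and the `vertical` flag is an assumed stand-in for the annotated `textLineOrder`/`readingDirection`.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Element:
    """Hypothetical stand-in for a PAGE-XML line, word or glyph."""
    text: str                  # the element's TextEquiv text
    polygon: Sequence[Point]   # outline points as (x, y) pairs


def centroid(polygon: Sequence[Point]) -> Point:
    """Mean of the vertices, used here as a simple centroid approximation."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def repair_order(parent_text: str, children: Sequence[Element],
                 joiner: str, vertical: bool) -> List[Element]:
    """Return the children sorted by centroid only if that fixes the mismatch.

    vertical=True sorts top-to-bottom (e.g. lines in a region),
    vertical=False sorts left-to-right (e.g. words in a line).
    """
    def concat(elems: Sequence[Element]) -> str:
        return joiner.join(e.text for e in elems)

    if concat(children) == parent_text:
        return list(children)          # already consistent, nothing to do
    axis = 1 if vertical else 0        # index 1 is y, index 0 is x
    candidate = sorted(children, key=lambda e: centroid(e.polygon)[axis])
    if concat(candidate) == parent_text:
        return candidate               # sorting fixes the inconsistency
    return list(children)              # sorting does not help, keep the order


# Two words stored in the wrong XML order within one line.
world = Element("world", [(20, 0), (30, 0), (30, 10), (20, 10)])
hello = Element("hello", [(0, 0), (10, 0), (10, 10), (0, 10)])
fixed = repair_order("hello world", [world, hello], joiner=" ", vertical=False)
print([e.text for e in fixed])
```

Returning the original order when sorting does not resolve the mismatch mirrors the "iff" in the README wording: the sketch never re-orders speculatively.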