ocrd_repair_inconsistencies

Automatically fix PAGE-XML order inconsistencies in regions, lines and words. Child elements are reordered only if sorting them by coordinates (top-to-bottom or left-to-right) makes the appropriately concatenated TextEquiv texts of the children match the parent's TextEquiv text. This processor does not change the reading order, only the order of the XML elements in the file.
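The condition described above can be sketched in a few lines of shell. This is only an illustration, not the processor's actual code: it covers the words-in-a-line case alone, and the input format (one word per line as `x<TAB>text`, in current XML order), the single-space join, and the function names `join_texts` and `should_reorder` are all assumptions made for the example.

```bash
#!/bin/bash
# Illustrative sketch only, not the processor's actual implementation.
# Assumed (hypothetical) input: one word per line as "x<TAB>text", in the
# current XML order, where x is the left edge of the word's coordinates.

# Join the text column read from stdin with single spaces.
join_texts() {
  cut -f2 | paste -sd' ' -
}

# Succeed (exit status 0) only if the current order does NOT reproduce the
# parent's text but the left-to-right order does.
should_reorder() {
  local parent_text=$1 words=$2
  local current sorted
  current=$(printf '%s\n' "$words" | join_texts)
  sorted=$(printf '%s\n' "$words" | sort -t $'\t' -k1,1n | join_texts)
  [ "$current" != "$parent_text" ] && [ "$sorted" = "$parent_text" ]
}

# Example: the words are stored out of order, but sorting by x fixes the text.
words=$(printf '300\tfox\n100\tquick\n200\tbrown')
if should_reorder "quick brown fox" "$words"; then
  echo "reorder"
else
  echo "keep"
fi
```

If sorting by coordinates does not reproduce the parent's text either, the function fails and the children would be left untouched, which mirrors the "only reordered if" wording above.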

We wrote this as a one-shot script to fix some files. Use with caution.

Example usage

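The following fix script writes the repaired files to a temporary file group, copies them back over the originals in OCR-D-GT-PAGE, and then removes the temporary file group: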

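#!/bin/bash
# Abort on the first failing command, before anything is copied back
set -e

# Temporary output file group with a random suffix
tmp_fg=FIXED_$RANDOM

# Write the repaired PAGE-XML files to the temporary file group
ocrd_repair_inconsistencies -I OCR-D-GT-PAGE -O "$tmp_fg"

# Copy each repaired file over its original, swapping the file group prefix in the file name
for f in "$tmp_fg"/*; do
  g="OCR-D-GT-PAGE/OCR-D-GT-PAGE_${f#${tmp_fg}/${tmp_fg}_}"
  cp "$f" "$g"
done

# Remove the temporary file group from the workspace, then delete its
# directory (rmdir only succeeds if the directory is empty)
ocrd workspace remove-group -rf "$tmp_fg"
rmdir "$tmp_fg"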
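
The file name mapping inside the loop relies on the `${f#pattern}` parameter expansion, which strips the shortest matching prefix from `$f`. As a minimal sketch, the helper below isolates that step so it can be tried without a workspace; the function name `target_path` and the sample file name are hypothetical and only serve the illustration.

```bash
#!/bin/bash
# Hypothetical helper that mirrors the parameter expansion used in the loop.
# $1: name of the temporary file group, $2: path of a file inside it.
target_path() {
  local tmp_fg=$1 f=$2
  # Strip the leading "<tmp_fg>/<tmp_fg>_" and prepend the original group instead.
  printf '%s\n' "OCR-D-GT-PAGE/OCR-D-GT-PAGE_${f#"${tmp_fg}/${tmp_fg}_"}"
}

# Sample (made-up) file name in a temporary group called FIXED_12345
target_path FIXED_12345 FIXED_12345/FIXED_12345_0001.xml
# prints: OCR-D-GT-PAGE/OCR-D-GT-PAGE_0001.xml
```

The mapping only works if the files in the temporary group are named after the group (`<group>_<rest>`), as the script above assumes; a file without that prefix would keep its full path after the `OCR-D-GT-PAGE_` part.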