ocrd_repair_inconsistencies

Automatically fix PAGE-XML order inconsistencies in regions, lines and words. Child elements are reordered only if sorting them by coordinates (top-to-bottom or left-to-right) makes the appropriately concatenated TextEquiv texts of the children match the parent's TextEquiv text. This processor does not change the reading order, only the order of the XML elements in the file.
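
The condition above can be sketched in a few lines of bash. This is only an illustration under simplifying assumptions, not the processor's actual code: each child is reduced to one numeric coordinate and its text, texts are joined with single spaces, and the function names `join_texts` and `should_reorder` are hypothetical.

```shell
#!/bin/bash
# Illustrative sketch only; should_reorder and join_texts are hypothetical
# names, not part of ocrd_repair_inconsistencies.

# Join the text column (field 2) of "coordinate<TAB>text" records with spaces.
join_texts() {
  cut -f2- | paste -sd ' ' -
}

# Succeed only if the children's text in XML order does NOT match the parent
# text, but their text in coordinate order does.
# $1: parent text; stdin: child records in XML order.
should_reorder() {
  local parent_text=$1 children current sorted
  children=$(cat)
  current=$(join_texts <<<"$children")
  sorted=$(sort -t $'\t' -k1,1n <<<"$children" | join_texts)
  [[ $current != "$parent_text" && $sorted == "$parent_text" ]]
}

# Words stored out of order in the XML, with their x coordinates
if printf '120\tbrown\n10\tThe\n60\tquick\n' | should_reorder 'The quick brown'; then
  echo "reorder"
else
  echo "leave as is"
fi
```

If the children already match the parent, or if sorting does not produce the parent text, the check fails and nothing would be reordered.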

We wrote this as a one-shot script to fix some files. Use with caution.

Example usage

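For example, the following script runs the processor on the OCR-D-GT-PAGE file group and copies the fixed files back over the originals: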

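#!/bin/bash
set -e

# Temporary output file group with a random suffix
tmp_fg=FIXED_$RANDOM

# Write the repaired PAGE-XML files to the temporary file group
ocrd_repair_inconsistencies -I OCR-D-GT-PAGE -O "$tmp_fg"

# Copy each repaired file over its original in OCR-D-GT-PAGE
for f in "$tmp_fg"/*; do
  g="OCR-D-GT-PAGE/OCR-D-GT-PAGE_${f#"${tmp_fg}/${tmp_fg}_"}"
  cp "$f" "$g"
done

# Remove the temporary file group from the workspace, then its directory
ocrd workspace remove-group -rf "$tmp_fg"
rmdir "$tmp_fg"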
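
Because the script above overwrites the original files, it may be worth checking first which files the processor actually changed. The following is a minimal sketch of such a check, assuming the same `<group>/<group>_<suffix>` file naming as the script above; `list_changed` is a hypothetical helper, not part of this project or of the `ocrd` command line.

```shell
#!/bin/bash
# Hypothetical helper: print the original files whose repaired counterpart
# differs from them (or that are missing).
# $1: original file group directory, $2: temporary file group directory.
# Both are expected to be bare directory names relative to the workspace.
list_changed() {
  local orig_fg=$1 tmp_fg=$2 f g
  for f in "$tmp_fg"/*; do
    g="$orig_fg/${orig_fg}_${f#"${tmp_fg}/${tmp_fg}_"}"
    # cmp -s prints nothing and returns non-zero when the files differ
    cmp -s "$f" "$g" || echo "$g"
  done
}
```

Calling `list_changed OCR-D-GT-PAGE "$tmp_fg"` between the processor run and the copy loop would list the files that are about to be replaced with different content.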