Gerber, Mike 7d4afef3f8 | 5 years ago | |
---|---|---|
examples | 5 years ago | |
ocrd_repair_inconsistencies | 5 years ago | |
.gitignore | 5 years ago | |
README.md | 5 years ago | |
requirements.txt | 5 years ago | |
setup.py | 5 years ago |
ocrd_repair_inconsistencies
Automatically fix PAGE-XML order inconsistencies in regions, lines and words.
Child elements are only reordered if sorting them by coordinates
(top-to-bottom/left-to-right) makes the appropriately concatenated TextEquiv
texts of the children match the parent's TextEquiv text. This processor
does not change the reading order, just the order of the XML elements in the file.
We wrote this as a one-shot script to fix some files. Use with caution.
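The reordering rule described above can be illustrated with a minimal sketch. This is not the processor's actual code: it uses plain dictionaries instead of PAGE-XML objects, and the names `concat_text` and `repair_order`, the `x` coordinate field, and the choice of a space as the word separator are all assumptions made for illustration.

```python
def concat_text(children, sep):
    """Join the children's texts in their current order."""
    return sep.join(child["text"] for child in children)


def repair_order(parent_text, children, sep, sort_key):
    """Return the children sorted by coordinates, but only if that sorting
    makes the concatenated text match the parent's text. Otherwise return
    the children in their original order."""
    if concat_text(children, sep) == parent_text:
        return children  # already consistent, nothing to do
    candidate = sorted(children, key=sort_key)
    if concat_text(candidate, sep) == parent_text:
        return candidate  # sorting by coordinates fixes the inconsistency
    return children  # sorting does not help, so leave the order alone


# Hypothetical words of one text line, stored in the wrong XML order.
words = [
    {"text": "world", "x": 120},
    {"text": "Hello", "x": 10},
]

# Words in a line are compared left-to-right (assumed key: x coordinate).
fixed = repair_order("Hello world", words, " ", lambda w: w["x"])

# A case where sorting cannot fix the mismatch: the order stays untouched.
untouched = repair_order("Something else", words, " ", lambda w: w["x"])
```

In this sketch `fixed` holds the words in left-to-right order, while `untouched` keeps the original order because no reordering would reproduce the parent text.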
Example usage
For example, use this fix script:
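#!/bin/bash
set -e
# Write the repaired files to a temporary file group
tmp_fg=FIXED_$RANDOM
ocrd_repair_inconsistencies -I OCR-D-GT-PAGE -O "$tmp_fg"
# Copy each repaired file back over its original in OCR-D-GT-PAGE
for f in "$tmp_fg"/*; do
  g="OCR-D-GT-PAGE/OCR-D-GT-PAGE_${f#${tmp_fg}/${tmp_fg}_}"
  cp "$f" "$g"
done
# Remove the temporary file group from the workspace, then its empty directory
ocrd workspace remove-group -rf "$tmp_fg"
rmdir "$tmp_fg"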