mirror of https://github.com/qurator-spk/dinglehopper.git synced 2025-06-09 11:50:00 +02:00

📝 Document CER/WER and the format detection (Fixes GH-26)

Gerber, Mike 2020-09-30 17:58:05 +02:00
parent da47e41c85
commit d706ef4621
2 changed files with 10 additions and 2 deletions


@@ -105,13 +105,17 @@ def main(gt, ocr, report_prefix, metrics):
     """
     Compare the PAGE/ALTO/text document GT against the document OCR.
+    dinglehopper detects if GT/OCR are ALTO or PAGE XML documents to extract
+    their text and falls back to plain text if no ALTO or PAGE is detected.
     The files GT and OCR are usually a ground truth document and the result of
     an OCR software, but you may use dinglehopper to compare two OCR results. In
     that case, use --no-metrics to disable the then meaningless metrics and also
     change the color scheme from green/red to blue.
     The comparison report will be written to $REPORT_PREFIX.{html,json}, where
-    $REPORT_PREFIX defaults to "report".
+    $REPORT_PREFIX defaults to "report". The reports include the character error
+    rate (CER) and the word error rate (WER).
     """
     process(gt, ocr, report_prefix, metrics=metrics)
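The docstring says dinglehopper detects whether GT/OCR are ALTO or PAGE XML and otherwise falls back to plain text. A minimal sketch of one way such detection could work, assuming the format is recognized from the namespace of the XML root element. The helper `detect_format` and the namespace-prefix constants are hypothetical illustrations, not dinglehopper's actual API; its real detection logic may differ.

```python
import xml.etree.ElementTree as ET
from typing import IO, Union

# Assumption: ALTO and PAGE documents are identified by their root namespace.
# ALTO uses e.g. "http://www.loc.gov/standards/alto/ns-v3#",
# PAGE uses e.g. "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15".
ALTO_NS_PREFIX = "http://www.loc.gov/standards/alto/"
PAGE_NS_PREFIX = "http://schema.primaresearch.org/PAGE/gts/pagecontent/"


def detect_format(source: Union[str, IO]) -> str:
    """Hypothetical helper: return "ALTO", "PAGE" or "text" for a file path or file object."""
    try:
        root = ET.parse(source).getroot()
    except ET.ParseError:
        # Not well-formed XML at all: treat it as plain text.
        return "text"

    # ElementTree writes namespaced tags as "{namespace}localname".
    tag = root.tag
    namespace = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

    if namespace.startswith(ALTO_NS_PREFIX):
        return "ALTO"
    if namespace.startswith(PAGE_NS_PREFIX):
        return "PAGE"
    # Well-formed XML in some other vocabulary also falls back to plain text.
    return "text"
```

Checking the namespace rather than only the root element name keeps the sketch from misclassifying unrelated XML that happens to use an element called `alto` or `PcGts`.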
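The reports include the character error rate (CER) and the word error rate (WER). As a rough illustration, both are commonly computed as the Levenshtein (edit) distance between GT and OCR divided by the length of the GT, once over characters and once over words. The sketch below assumes that textbook definition, splits characters by Python code points and words by whitespace, and treats an empty GT as an error rate of 0 (if OCR is also empty) or infinity. These are simplifying assumptions; dinglehopper's own implementation may segment characters and words differently. The function names are hypothetical.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete x from a
                    current[j - 1] + 1,  # insert y into a
                    previous[j - 1] + (x != y),  # substitute (free if equal)
                )
            )
        previous = current
    return previous[-1]


def cer(gt: str, ocr: str) -> float:
    """Hypothetical CER: character edit distance normalized by GT length."""
    if not gt:
        return 0.0 if not ocr else float("inf")
    return levenshtein(gt, ocr) / len(gt)


def wer(gt: str, ocr: str) -> float:
    """Hypothetical WER: word edit distance normalized by the number of GT words."""
    gt_words, ocr_words = gt.split(), ocr.split()
    if not gt_words:
        return 0.0 if not ocr_words else float("inf")
    return levenshtein(gt_words, ocr_words) / len(gt_words)
```

For example, `cer("kitten", "sitting")` is 3 / 6 = 0.5, while `wer("the quick brown fox", "the quick brown fax")` is 1 / 4 = 0.25: one wrong character costs a whole word in WER.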