Mirror of https://github.com/qurator-spk/dinglehopper.git, synced 2025-06-30 22:19:57 +02:00
📝 Document CER/WER and the format detection (Fixes GH-26)
Parent: da47e41c85
Commit: d706ef4621
2 changed files with 10 additions and 2 deletions
@@ -31,13 +31,17 @@ Usage: dinglehopper [OPTIONS] GT OCR [REPORT_PREFIX]
 
 Compare the PAGE/ALTO/text document GT against the document OCR.
 
+dinglehopper detects if GT/OCR are ALTO or PAGE XML documents to extract
+their text and falls back to plain text if no ALTO or PAGE is detected.
+
 The files GT and OCR are usually a ground truth document and the result of
 an OCR software, but you may use dinglehopper to compare two OCR results.
 In that case, use --no-metrics to disable the then meaningless metrics and
 also change the color scheme from green/red to blue.
 
 The comparison report will be written to $REPORT_PREFIX.{html,json}, where
-$REPORT_PREFIX defaults to "report".
+$REPORT_PREFIX defaults to "report". The reports include the character
+error rate (CER) and the word error rate (WER).
 
 Options:
 --metrics / --no-metrics Enable/disable metrics and green/red
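The format detection described in the new help text can be illustrated with a minimal sketch. This is not dinglehopper's own code: the function name `detect_format`, its return values, and the rule of inspecting only the root element (`PcGts` for PAGE, `alto` for ALTO) are assumptions made for illustration. Input that is not well-formed XML, or that has any other root element, falls back to plain text.

```python
import xml.etree.ElementTree as ET


def detect_format(content: str) -> str:
    """Guess whether `content` is PAGE XML, ALTO XML or plain text.

    Hypothetical sketch, not dinglehopper's implementation: it only looks at
    the local name of the root element and ignores namespace versions.
    """
    try:
        root = ET.fromstring(content)
    except ET.ParseError:
        # Not well-formed XML: treat the input as plain text.
        return "text"
    # ElementTree reports namespaced tags as "{namespace}LocalName";
    # keep only the local name.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name == "PcGts":
        return "page"
    if local_name.lower() == "alto":
        return "alto"
    # Well-formed XML, but neither PAGE nor ALTO: fall back to plain text.
    return "text"


if __name__ == "__main__":
    print(detect_format('<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#"/>'))
    print(detect_format("Just some OCR text."))
```

A complete implementation would also have to extract the text from the detected format; the sketch only covers the three-way decision.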
@@ -105,13 +105,17 @@ def main(gt, ocr, report_prefix, metrics):
     """
     Compare the PAGE/ALTO/text document GT against the document OCR.
 
+    dinglehopper detects if GT/OCR are ALTO or PAGE XML documents to extract
+    their text and falls back to plain text if no ALTO or PAGE is detected.
+
     The files GT and OCR are usually a ground truth document and the result of
     an OCR software, but you may use dinglehopper to compare two OCR results. In
     that case, use --no-metrics to disable the then meaningless metrics and also
     change the color scheme from green/red to blue.
 
     The comparison report will be written to $REPORT_PREFIX.{html,json}, where
-    $REPORT_PREFIX defaults to "report".
+    $REPORT_PREFIX defaults to "report". The reports include the character error
+    rate (CER) and the word error rate (WER).
     """
     process(gt, ocr, report_prefix, metrics=metrics)
 
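Assuming the usual definitions, CER and WER are the edit (Levenshtein) distance between GT and OCR divided by the length of the GT, counted in characters or in words respectively. The following is a simplified sketch under stated assumptions: the function names are hypothetical, characters are approximated as Unicode code points and words as whitespace-separated tokens, and the handling of an empty GT is an arbitrary convention. dinglehopper's actual implementation may segment text differently.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def error_rate(gt: Sequence[str], ocr: Sequence[str]) -> float:
    """Edit distance normalised by the length of the ground truth."""
    if not gt:
        # Hypothetical convention for an empty GT: 0 if OCR is empty too.
        return 0.0 if not ocr else float("inf")
    return levenshtein(gt, ocr) / len(gt)


def cer(gt: str, ocr: str) -> float:
    # Simplification: one "character" per Unicode code point.
    return error_rate(list(gt), list(ocr))


def wer(gt: str, ocr: str) -> float:
    # Simplification: words are whitespace-separated tokens.
    return error_rate(gt.split(), ocr.split())
```

For example, `cer("Hello world", "Hallo world")` is 1/11 (one substituted character out of eleven), while `wer` for the same pair is 0.5 (one of two words differs). Normalising by the GT length is why these rates only make sense against a ground truth, which matches the advice to use `--no-metrics` when comparing two OCR results.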