mirror of https://github.com/mikegerber/ocrd_calamari.git
synced 2025-06-08 19:29:53 +02:00
📝 Reduce process() docstring again
parent b4c3b026db
commit 34013ddb02
1 changed file with 3 additions and 23 deletions
@@ -69,29 +69,9 @@ class CalamariRecognize(Processor):
         """
         Perform text recognition with Calamari on the workspace.
 
-        For each page of the input file group, open and deserialize input PAGE-XML
-        and its respective images. Then iterate over the element hierarchy down to
-        the line level.
-
-        For each textline, retrieve a segment image according to the layout annotation
-        (from an existing ``AlternativeImage``, or by cropping into the higher-level
-        images, and deskewing when applicable).
-
-        If the line element contained any previous text results or word segmentation,
-        delete it.
-
-        Convert the line image to a Numpy array and pass it to the recognizer. Aggregate
-        character results on the line level, stripping leading and trailing white space,
-        and selecting the best hypothesis for each position. Annotate the resulting
-        TextEquiv string and (average) confidence on the line segment.
-
-        If ``texequiv_level`` is ``word`` or ``glyph``, then additionally create word
-        level segments by splitting at white space characters, using the vertical
-        line coordinates and horizontal white space boundaries. In the case of ``glyph``,
-        create glyph level segments as well, adding all alternative character hypotheses
-        down to ``glyph_conf_cutoff`` confidence threshold.
-
-        Produce a new PAGE output file by serialising the resulting hierarchy.
+        If ``texequiv_level`` is ``word`` or ``glyph``, then additionally create word / glyph level segments by
+        splitting at white space characters / glyph boundaries. In the case of ``glyph``, add all alternative character
+        hypotheses down to ``glyph_conf_cutoff`` confidence threshold.
         """
         log = getLogger('processor.CalamariRecognize')
 
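The word and glyph handling that the docstring describes can be illustrated with a minimal sketch. This is not the code from ocrd_calamari: the `Glyph` class and the `split_words`, `word_text` and `alternatives` helpers are hypothetical names introduced here, and treating the cutoff as inclusive (`>=`) is an assumption. Only the parameter name `glyph_conf_cutoff` comes from the docstring.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Glyph:
    """Hypothetical per-position recognition result."""
    x_start: int
    x_end: int
    # (character, confidence) pairs, best hypothesis first
    hypotheses: list


def split_words(glyphs):
    """Group consecutive non-whitespace glyphs into words.

    Whitespace is detected on the best hypothesis of each position.
    """
    words = []
    for is_space, group in groupby(glyphs, key=lambda g: g.hypotheses[0][0].isspace()):
        if not is_space:
            words.append(list(group))
    return words


def word_text(word):
    """Join the best hypothesis of every glyph in a word."""
    return "".join(g.hypotheses[0][0] for g in word)


def alternatives(glyph, glyph_conf_cutoff):
    """Keep every hypothesis whose confidence reaches the cutoff (assumed inclusive)."""
    return [(char, conf) for char, conf in glyph.hypotheses if conf >= glyph_conf_cutoff]


# Illustrative input for the line "ab c"; positions and confidences are made up.
line = [
    Glyph(0, 10, [("a", 0.98), ("o", 0.01)]),
    Glyph(10, 20, [("b", 0.99)]),
    Glyph(20, 30, [(" ", 0.97)]),
    Glyph(30, 40, [("c", 0.90), ("e", 0.05), ("o", 0.0005)]),
]
words = split_words(line)
```

With this input, `split_words` yields two words, and the horizontal extent of each word can be read from the `x_start` of its first glyph and the `x_end` of its last one.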