WIP. Given an OCR-D workspace with document images in the OCR-D-IMG file group,
this workflow produces:

* Binarized images
* Line segmentation
* OCR text (using Calamari and Tesseract, both with GT4HistOCR models)
* (Given ground truth in OCR-D-GT-PAGE, also an OCR text evaluation report)

If you're interested in the exact processors, versions and parameters, please take a look at the [script](my_ocrd_workflow) and possibly the [Dockerfile](Dockerfile) and the [requirements](requirements.txt).

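
Before running the workflow, it can help to confirm that the workspace actually declares the input file group. The following is a minimal sketch, assuming the workspace contains a `mets.xml` whose file groups are `fileGrp` elements with a `USE` attribute; the helper name `has_file_group` is hypothetical and not part of this repository:

```shell
# Hypothetical helper (not part of this repository): succeeds if the given
# METS file declares a file group whose USE attribute equals the given name.
# Assumption: each fileGrp start tag sits on a single line; the group name is
# used as a grep pattern, so it should not contain regex metacharacters.
has_file_group() {
    mets="$1"
    group="$2"
    # Matches e.g. <mets:fileGrp USE="OCR-D-IMG"> regardless of namespace prefix
    grep -q "fileGrp[^>]*USE=\"$group\"" "$mets"
}

# Possible usage, run from inside the workspace directory:
#   has_file_group mets.xml OCR-D-IMG || echo "OCR-D-IMG file group not found" >&2
#   has_file_group mets.xml OCR-D-GT-PAGE && echo "ground truth available"
```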
Goal
----
Provide an environment to produce OCR output using OCR-D, especially [ocrd_calamari](https://github.com/OCR-D/ocrd_calamari) and [sbb_textline_detection](https://github.com/qurator-spk/sbb_textline_detection), including all dependencies in Docker.

How to use
----------
It's easiest to use it as a container. To build the container using Docker:
~~~
cd ~/devel/my_ocrd_workflow
~~~