You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

23 lines
894 B
Markdown

Train a GT4HistOCR Calamari model
=================================
`train.sh` trains a Calamari 1 model based on GT4HistOCR. Or rather 5 using
cross-validation to use for confidence voting. This repository mainly
serves as documentation of the provenance of the model published at
https://qurator-data.de/calamari-models/, not as the definitive guide to
training such a model.
Trained models
--------------
For a finished model have a look here:
https://qurator-data.de/calamari-models/
Training your own model
-----------------------
If you really want to, you can use this script to train your own. It takes
about 1 week on a Nvidia RTX 2080 GPU. Please use [requirements.txt](requirements.txt)
in that case to setup a virtualenv.
`train.sh` is able to download GT4HistOCR from the web if the `data` submodule
is not available, that is if you're not a member of the Qurator team at SBB.