ocrd_calamari

Recognize text using Calamari OCR.

Introduction

This package provides an OCR-D compliant workspace processor for some of the functionality of Calamari OCR.

This processor only operates on the text line level and so needs a line segmentation (and by extension a binarized image) as its input.
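
Since the processor depends on an existing line segmentation, it can help to confirm that an input PAGE-XML file actually contains text lines before running recognition. The following is a minimal sketch, not part of ocrd_calamari: the function name `count_text_lines` and the sample document are made up for illustration, and the check matches element names without their namespace so that it does not depend on a particular PAGE schema version.

```python
import xml.etree.ElementTree as ET


def count_text_lines(page_xml: str) -> int:
    """Count TextLine elements in a PAGE-XML document given as a string."""
    root = ET.fromstring(page_xml)
    # Compare local names only, so any PAGE schema version matches.
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "TextLine")


# Hypothetical, heavily shortened PAGE-XML document used only for illustration.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="example.png">
    <TextRegion id="r1">
      <TextLine id="l1"/>
      <TextLine id="l2"/>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    print(count_text_lines(SAMPLE))
```

A count of zero suggests that the file has not been line-segmented yet and is not suitable as input for this processor.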

Installation

From PyPI

pip install ocrd_calamari

From Repo

pip install .

Install models

Download models trained on GT4HistOCR data:

make gt4histocr-calamari
ls gt4histocr-calamari

Example Usage

ocrd-calamari-recognize -p test-parameters.json -m mets.xml -I OCR-D-SEG-LINE -O OCR-D-OCR-CALAMARI

With test-parameters.json:

{
    "checkpoint": "/path/to/some/trained/models/*.ckpt.json"
}
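
The parameter file can also be generated from a script, which makes it easy to check that the checkpoint pattern matches at least one file before starting a run. The sketch below is an illustration only: `write_parameters` is a hypothetical helper, not part of ocrd_calamari, and it assumes the `checkpoint` value behaves like a shell-style glob, as the example pattern above suggests.

```python
import glob
import json
from pathlib import Path


def write_parameters(checkpoint_glob: str, out_path: str) -> list:
    """Write the parameter file and return the files the pattern matches."""
    params = {"checkpoint": checkpoint_glob}
    Path(out_path).write_text(json.dumps(params, indent=4) + "\n", encoding="utf-8")
    # Assumption: the pattern is expanded like a shell-style glob.
    return sorted(glob.glob(checkpoint_glob))


if __name__ == "__main__":
    matches = write_parameters(
        "/path/to/some/trained/models/*.ckpt.json", "test-parameters.json"
    )
    if not matches:
        print("Warning: the checkpoint pattern matches no files")
```

An empty result usually points to a wrong model directory or to models that have not been downloaded yet.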

Development & Testing

For information regarding development and testing, please see README-DEV.md.