mirror of https://github.com/qurator-spk/neat.git
Preprocessing
The preprocessing pipeline developed at the Berlin State Library comprises the following steps:
- Textline extraction @sbb_pixelwise_segmentation
- Word segmentation @ocrd_tesserocr
- OCR @ocrd_calamari
- Tokenization
- Pretagging @sbb_ner
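
To illustrate how the last two steps hand data to each other, here is a minimal sketch in Python. It is not the code used by the tools listed above: the function names `tokenize`, `pretag`, and `run_text_stages` are hypothetical, the tokenizer is a naive regular expression, and the pretagger is a placeholder that labels every token `O` (outside any entity) where a real NER model would assign entity tags.

```python
import re
from typing import List, Tuple

# (token, tag) pair; the tag scheme here is an assumption for illustration only
TaggedToken = Tuple[str, str]


def tokenize(text: str) -> List[str]:
    """Naive tokenizer (hypothetical): runs of word characters or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def pretag(tokens: List[str]) -> List[TaggedToken]:
    """Placeholder pretagger (hypothetical): labels every token 'O'.

    A real pretagging step would call an NER model here instead.
    """
    return [(token, "O") for token in tokens]


def run_text_stages(ocr_text: str) -> List[TaggedToken]:
    """Chain the two text-level stages: tokenization, then pretagging."""
    return pretag(tokenize(ocr_text))


if __name__ == "__main__":
    # Print one token per line with its tag, tab-separated
    for token, tag in run_text_stages("Berlin, 1871."):
        print(f"{token}\t{tag}")
```

The image-level steps (textline extraction, word segmentation, OCR) are left out of the sketch because they depend on the external tools named in the list.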