# Preprocessing
The preprocessing pipeline developed at the
[Berlin State Library](http://staatsbibliothek-berlin.de/)
comprises the following steps:
- Textline extraction with [sbb_pixelwise_segmentation](https://github.com/qurator-spk/pixelwise_segmentation_SBB)
- OCR and word segmentation with [ocrd_tesserocr](https://github.com/OCR-D/ocrd_tesserocr)
- Tokenization
- Pretagging with [sbb_ner](https://github.com/qurator-spk/sbb_ner)
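
The order of the steps above can be sketched as a simple chain of functions. This is only an illustration of the data flow: the function names (`extract_textlines`, `ocr_and_segment_words`, `tokenize`, `pretag`), the `history` field, and the file name `page_0001.tif` are hypothetical placeholders, and none of them call the linked tools. In practice, each step is carried out by the tool linked in the list.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Step = Callable[[Doc], Doc]


def _record(doc: Doc, name: str) -> Doc:
    """Return a copy of the document with the step name appended to its history."""
    return {**doc, "history": doc["history"] + [name]}


# Hypothetical stand-ins for the four steps; they only record that they ran.
def extract_textlines(doc: Doc) -> Doc:
    return _record(doc, "textline extraction")


def ocr_and_segment_words(doc: Doc) -> Doc:
    return _record(doc, "OCR + word segmentation")


def tokenize(doc: Doc) -> Doc:
    return _record(doc, "tokenization")


def pretag(doc: Doc) -> Doc:
    return _record(doc, "pretagging")


PIPELINE: List[Step] = [extract_textlines, ocr_and_segment_words, tokenize, pretag]


def run_pipeline(doc: Doc, steps: List[Step] = PIPELINE) -> Doc:
    """Apply each step in order, feeding the output of one into the next."""
    return reduce(lambda current, step: step(current), steps, doc)


if __name__ == "__main__":
    # "page_0001.tif" is a made-up input name used only for illustration.
    result = run_pipeline({"image": "page_0001.tif", "history": []})
    print(result["history"])
```

Keeping the steps in a plain list makes the ordering explicit and lets a single step be swapped out without touching the others.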