mirror of https://github.com/qurator-spk/neat.git synced 2025-06-11 20:59:54 +02:00

Create Preprocessing.md

cneud 2019-11-19 23:32:29 +01:00
parent 7c89b012cb
commit 860a3c45f0

docs/Preprocessing.md (new file, 10 lines)

@@ -0,0 +1,10 @@
# Preprocessing
The preprocessing pipeline developed at the
[Berlin State Library](http://staatsbibliothek-berlin.de/)
comprises the following steps:
- Textline extraction @[sbb_pixelwise_segmentation](https://github.com/qurator-spk/pixelwise_segmentation_SBB)
- Word segmentation @[ocrd_tesserocr](https://github.com/OCR-D/ocrd_tesserocr)
- OCR @[ocrd_calamari](https://github.com/qurator-spk/ocrd_calamari)
- Tokenization
- Pretagging @[sbb_ner](https://github.com/qurator-spk/sbb_ner)
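
The steps above can be written down as an ordered data structure, which is sometimes handy for logging or for checking that a run covered every stage. The following is only a minimal sketch: the `Step` class, the `PIPELINE` list and the `describe` helper are hypothetical names introduced for illustration, it assumes the list order is the execution order, and it does not call any of the linked tools.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """One pipeline stage (hypothetical helper, not part of any linked tool)."""
    name: str
    tool: Optional[str]  # None where the list above names no tool


# Stage names and tool names are copied from the list above;
# treating the list order as execution order is an assumption.
PIPELINE: List[Step] = [
    Step("textline extraction", "sbb_pixelwise_segmentation"),
    Step("word segmentation", "ocrd_tesserocr"),
    Step("OCR", "ocrd_calamari"),
    Step("Tokenization", None),
    Step("Pretagging", "sbb_ner"),
]


def describe(pipeline: List[Step]) -> List[str]:
    """Return one numbered line per step, in list order."""
    lines = []
    for index, step in enumerate(pipeline, start=1):
        tool = step.tool if step.tool is not None else "(no tool named)"
        lines.append(f"{index}. {step.name}: {tool}")
    return lines


if __name__ == "__main__":
    print("\n".join(describe(PIPELINE)))
```

Keeping the tool as an optional field makes it explicit that the tokenization step has no tool assigned in this document.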