Create Preprocessing.md

2026-07-29 06:32:29 +02:00 · 2019-11-19 23:32:29 +01:00 · 2019-11-19 23:32:29 +01:00 · 860a3c45f0
commit 860a3c45f0
parent 7c89b012cb
1 changed file with 10 additions and 0 deletions
--- a/docs/Preprocessing.md
+++ b/docs/Preprocessing.md
@@ -0,0 +1,10 @@
+# Preprocessing
+
+The preprocessing pipeline developed at the
+[Berlin State Library](http://staatsbibliothek-berlin.de/)
+comprises the following steps:
+- Textline extraction using [sbb_pixelwise_segmentation](https://github.com/qurator-spk/pixelwise_segmentation_SBB)
+- Word segmentation using [ocrd_tesserocr](https://github.com/OCR-D/ocrd_tesserocr)
+- OCR using [ocrd_calamari](https://github.com/qurator-spk/ocrd_calamari)
+- Tokenization
+- Pretagging using [sbb_ner](https://github.com/qurator-spk/sbb_ner)
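
The step order listed in the added file can be sketched as plain data plus a small runner that threads a document through one callable per step. This is only an illustration under stated assumptions: `STEPS`, `run_pipeline`, and `describe` are hypothetical names, and nothing here calls the real tools, whose actual interfaces are not described in this commit.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Step names and tool names as listed in Preprocessing.md.
# None means the document names no tool for that step.
STEPS: List[Tuple[str, Optional[str]]] = [
    ("textline extraction", "sbb_pixelwise_segmentation"),
    ("word segmentation", "ocrd_tesserocr"),
    ("OCR", "ocrd_calamari"),
    ("tokenization", None),
    ("pretagging", "sbb_ner"),
]

# Hypothetical stage type: takes the output of the previous step,
# returns the input for the next one.
Stage = Callable[[Any], Any]


def run_pipeline(data: Any, stages: Dict[str, Stage]) -> Any:
    """Apply one stage per step, in the documented order."""
    for name, _tool in STEPS:
        if name not in stages:
            raise KeyError(f"no stage registered for step {name!r}")
        data = stages[name](data)
    return data


def describe() -> List[str]:
    """Return a numbered, human-readable summary of the steps."""
    return [
        f"{i}. {name}" + (f" ({tool})" if tool else "")
        for i, (name, tool) in enumerate(STEPS, start=1)
    ]
```

A caller would register one function per step name, for example wrappers around the respective tools, and pass the mapping to `run_pipeline`. Keeping the order in a single list means a missing stage fails loudly instead of being skipped silently.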