From 860a3c45f017154c4775610e87fc0e93835360a3 Mon Sep 17 00:00:00 2001
From: cneud <clemens.neudecker@gmail.com>
Date: Tue, 19 Nov 2019 23:32:29 +0100
Subject: [PATCH] Create Preprocessing.md

---
 docs/Preprocessing.md | 10 ++++++++++
 1 file changed, 10 insertions(+)
 create mode 100644 docs/Preprocessing.md

diff --git a/docs/Preprocessing.md b/docs/Preprocessing.md
new file mode 100644
index 0000000..8a7dc9c
--- /dev/null
+++ b/docs/Preprocessing.md
@@ -0,0 +1,10 @@
+# Preprocessing
+
+The preprocessing pipeline developed at the
+[Berlin State Library](http://staatsbibliothek-berlin.de/)
+comprises the following steps:
+- Textline extraction @[sbb_pixelwise_segmentation](https://github.com/qurator-spk/pixelwise_segmentation_SBB)
+- Word segmentation @[ocrd_tesserocr](https://github.com/OCR-D/ocrd_tesserocr)
+- OCR @[ocrd_calamari](https://github.com/qurator-spk/ocrd_calamari)
+- Tokenization
+- Pretagging @[sbb_ner](https://github.com/qurator-spk/sbb_ner)
\ No newline at end of file