From 5f6b8bc9c38301833c8b635433bcab3d016b211e Mon Sep 17 00:00:00 2001
From: Clemens Neudecker
Date: Tue, 19 Nov 2019 23:50:52 +0100
Subject: [PATCH] Update Preprocessing.md

---
 docs/Preprocessing.md | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/docs/Preprocessing.md b/docs/Preprocessing.md
index 0f38bf0..3c8521e 100644
--- a/docs/Preprocessing.md
+++ b/docs/Preprocessing.md
@@ -3,7 +3,15 @@
 
 The preprocessing pipeline that is developed at the [Berlin State Library](http://staatsbibliothek-berlin.de/) comprises the following steps:
 
-- textline extraction @[sbb_pixelwise_segmentation](https://github.com/qurator-spk/pixelwise_segmentation_SBB)
-- OCR + word segmentation @[ocrd_tesserocr](https://github.com/OCR-D/ocrd_tesserocr)
+- Layout Analysis & Textline Extraction @[sbb_pixelwise_segmentation](https://github.com/qurator-spk/pixelwise_segmentation_SBB)
+- OCR & Word Segmentation @[ocrd_tesserocr](https://github.com/OCR-D/ocrd_tesserocr)
 - Tokenization
-- Pretagging @[sbb_ner](https://github.com/qurator-spk/sbb_ner)
+- Named Entity Recognition @[sbb_ner](https://github.com/qurator-spk/sbb_ner)
+
+### Layout Analysis & Textline Extraction
+
+### OCR & Word Segmentation
+
+### Tokenization
+
+### Named Entity Recognition