From 601ae8bff75995b1661a65b3365b26e52cab46d1 Mon Sep 17 00:00:00 2001
From: vahidrezanezhad <vahid631983@gmail.com>
Date: Mon, 3 Aug 2020 13:04:29 +0200
Subject: [PATCH] Update README.md

---
 README.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/README.md b/README.md
index 130bd3f..46aed13 100644
--- a/README.md
+++ b/README.md
@@ -9,6 +9,12 @@ The goal of this project is to extract textlines of a document to feed an ocr mo
 * Layout analysis
 * Textline detection
 * Heuristic methods
+The first three stages are performed by pixel-wise segmentation. You can train your own model with this tool (https://github.com/qurator-spk/sbb_pixelwise_segmentation).
+
+## Printspace or border extraction
+From an OCR point of view, text outside the printspace region should be excluded, so the printspace region has to be detected and extracted. As mentioned briefly above, this is done by binary pixel-wise segmentation. We trained our model on a dataset of 2000 documents, about 1200 of which come from the dhSegment project (you can download the dataset here: https://github.com/dhlab-epfl/dhSegment/releases/download/v0.2/pages.zip); I annotated the rest myself using our dataset at SBB.
+Note that for page (printspace or border) extraction, you have to feed the model the whole image at once rather than in patches.
+
 
 ## Installation
 `pip install .`