From 51d90dad50fb8794512d7f7d6f49df4085539172 Mon Sep 17 00:00:00 2001
From: vahidrezanezhad
Date: Mon, 3 Aug 2020 12:45:55 +0200
Subject: [PATCH] Update README.md

---
 README.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/README.md b/README.md
index c6bd51b..321a70d 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,11 @@
 
 ## Introduction
 This tool performs printspace, region and textline detection from document image data and returns the results as [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML).
+The goal of this project is to extract textlines of a document to feed an ocr model. This is achieved by four successive stages as follows:
+* Item 1 Printspace or border extraction
+* Item 2 Layout analysis
+* Item 3 Textline detection
+* Item 4 Heuristic methods
 
 ## Installation
 `pip install .`