# Textline-Recognition

## Introduction

This tool performs textline detection on document images and returns the results as PAGE-XML.
					
						
## Installation

Run the following from the root directory of this repository:

`pip install .`
					
						
## Models

To run this tool, you also need trained models. You can download our pre-trained models here:
https://file.spk-berlin.de:8443/textline_detection/
					
						
## Usage

`sbb_textline_detector -i <image file name> -o <directory to write output xml> -m <directory of models>`
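`-i` takes a single image file. To process a whole directory, a small shell loop is enough. The following is a minimal sketch, not part of the tool. The function name `detect_dir`, the `*.png` glob and the `DETECTOR` override variable are our own assumptions. Each image gets its own output directory because this README does not say how the output XML file is named, so results from different images cannot overwrite each other.

~~~shell
# Hypothetical helper (not part of sbb_textline_detector): run the detector on
# every PNG image in a directory, writing to one output directory per image.
# DETECTOR may be set to another command for a dry run; it defaults to the real CLI.
detect_dir() {
    img_dir=$1    # directory containing the input images (*.png assumed)
    out_root=$2   # parent directory for the per-image output directories
    model_dir=$3  # directory of the downloaded models
    for img in "$img_dir"/*.png; do
        [ -e "$img" ] || continue          # the glob matched no files
        name=$(basename "$img" .png)
        mkdir -p "$out_root/$name" || return 1
        # Stop at the first image the detector fails on.
        "${DETECTOR:-sbb_textline_detector}" -i "$img" -o "$out_root/$name" -m "$model_dir" || return 1
    done
}
~~~

For example, `detect_dir ./images ./output /path/to/the/models/textline_detection` would write the results for `./images/page1.png` to `./output/page1/`.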
					
						
## Usage with OCR-D

~~~
ocrd-example-binarize -I OCR-D-IMG -O OCR-D-IMG-BIN
ocrd_sbb_textline_detector -I OCR-D-IMG-BIN -O OCR-D-SEG-LINE-SBB \
        -p '{ "model": "/path/to/the/models/textline_detection" }'
~~~
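Writing the `model` parameter inline requires the path to be valid inside a single-quoted JSON string. Alternatively, the parameters can go into a JSON file. This assumes the installed OCR-D version accepts a file path for `-p`, as the OCR-D CLI specification describes. The helper `write_params` is hypothetical, and its escaping covers only backslashes and double quotes, not other control characters.

~~~shell
# Hypothetical helper: write the OCR-D parameter JSON for the textline detector.
# Escapes backslashes and double quotes in the model path so the JSON stays valid.
write_params() {
    model_dir=$1   # path to the downloaded models
    json_file=$2   # where to write the parameter file
    escaped=$(printf '%s' "$model_dir" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{ "model": "%s" }\n' "$escaped" > "$json_file"
}
~~~

After `write_params /path/to/the/models/textline_detection params.json`, the processor call would become `ocrd_sbb_textline_detector -I OCR-D-IMG-BIN -O OCR-D-SEG-LINE-SBB -p params.json`.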
					
						
Segmentation works on raw RGB images, but it respects and retains
`AlternativeImage`s from binarization steps, so it is a good idea to run
binarization first and then perform the textline detection. The binarization
processor must add an `AlternativeImage` for the binarized image rather than
replace the original raw RGB image.
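A quick check can confirm whether a binarization processor added an `AlternativeImage` rather than replacing the image. Look for that element in the PAGE-XML it wrote. The sketch below uses grep, not an XML parser. It assumes OCR-D's convention of listing image features such as `binarized` in the `comments` attribute of `AlternativeImage`. It also assumes the element's attributes sit on a single line. The function name `has_binarized_alternative` is hypothetical.

~~~shell
# Hypothetical check: succeed if a PAGE-XML file contains an AlternativeImage
# whose comments attribute lists the "binarized" feature (OCR-D convention).
# Approximate: matches text on a single line, with an optional namespace prefix.
has_binarized_alternative() {
    page_xml=$1   # PAGE-XML file produced by the binarization step
    grep -Eq '<([A-Za-z0-9_]+:)?AlternativeImage[^>]*comments="[^"]*binarized' "$page_xml"
}
~~~

Running it on a PAGE-XML file from the binarization output file group (for example `OCR-D-IMG-BIN` above) should succeed before you start the textline detection.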