a51f0b3dcd 
								
							 
						 
						
							
							
								
								Merge pull request  #42  from b2m/test-python-cache-for-travis  
							
							
							
							
							Add travis pip caching 
							
						 
						
							2020-10-30 12:35:20 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Benjamin Rosemann 
								
							 
						 
						
							
							
							
							
								
							
							
								b10af9f138 
								
							 
						 
						
							
							
								
								Test travis pip caching  
							
							
							
						 
						
							2020-10-29 16:41:19 +01:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
								
								
							
							
							
								
							
							
								089f6d299e 
								
							 
						 
						
							
							
								
								Merge pull request  #37  from b2m/fix-sort-with-none  
							
							
							
							
							Sort textlines with missing indices 
							
						 
						
							2020-10-29 15:05:46 +01:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
								
								
							
							
							
								
							
							
								5138a1de21 
								
							 
						 
						
							
							
								
								Merge pull request  #39  from b2m/test-python-3.9  
							
							
							
							
							Add Python 3.9 to .travis.yml 
							
						 
						
							2020-10-29 13:42:24 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Benjamin Rosemann 
								
							 
						 
						
							
							
							
							
								
							
							
								c02569b41e 
								
							 
						 
						
							
							
								
								Fix f-strings for Python 3.5  
							
							
							
						 
						
							2020-10-29 12:33:54 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Benjamin Rosemann 
								
							 
						 
						
							
							
							
							
								
							
							
								7b27b2834e 
								
							 
						 
						
							
							
								
								More complex sorting for text extraction  
							
							
							
							
							When extracting text from TextEquiv nodes, we may encounter nodes without
an index or nodes that should be sorted by the conf attribute.
Therefore we added a more complex algorithm to extract a TextEquiv and
inform the user via log messages when we encounter structures that we can
handle but that may produce unexpected results. 
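
A minimal sketch of how such a selection could look, assuming a simplified `TextEquiv` record with optional `index` and `conf` fields. The class, the `pick_textequiv` helper and the exact fallback order are hypothetical illustrations, not the code from this commit:

```python
import logging
from dataclasses import dataclass
from typing import List, Optional

log = logging.getLogger("textequiv_sketch")  # hypothetical logger name


@dataclass
class TextEquiv:
    """Hypothetical, simplified stand-in for a PAGE TextEquiv node."""
    text: str
    index: Optional[int] = None
    conf: Optional[float] = None


def pick_textequiv(candidates: List[TextEquiv]) -> Optional[TextEquiv]:
    """Pick one TextEquiv, warning when the choice may be surprising."""
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]

    # Prefer the lowest index among the nodes that carry one.
    indexed = [c for c in candidates if c.index is not None]
    if indexed:
        if len(indexed) < len(candidates):
            log.warning("Some TextEquivs have no index; ignoring them.")
        return min(indexed, key=lambda c: c.index)

    # No index anywhere: fall back to the highest confidence.
    with_conf = [c for c in candidates if c.conf is not None]
    if with_conf:
        log.warning("No TextEquiv has an index; sorting by conf instead.")
        return max(with_conf, key=lambda c: c.conf)

    log.warning("No index or conf found; using the first TextEquiv.")
    return candidates[0]


chosen = pick_textequiv([TextEquiv("b", index=1), TextEquiv("a", index=0)])
fallback = pick_textequiv([TextEquiv("x", conf=0.4), TextEquiv("y", conf=0.9)])
```

In this sketch every fallback path emits a log message, so the user learns when the input was ambiguous even though a result is still returned.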
							
						 
						
							2020-10-29 10:03:40 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Benjamin Rosemann 
								
							 
						 
						
							
							
							
							
								
							
							
								6ff831dfd2 
								
							 
						 
						
							
							
								
								Sort textlines with missing indices  
							
							
							
							
							Python's built-in `sorted` function raises a TypeError when called on a
sequence that mixes `None` and integers:
```python
>>> sorted([None, 1])
TypeError: '<' not supported between instances of 'int' and 'NoneType'
```
Therefore we are using `float('inf')` instead of `None` in case of
missing textline indices. 
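
The workaround can be sketched as a sort key. The `sort_key` helper and the sample data are illustrative assumptions, not the code from this commit:

```python
def sort_key(index):
    """Map a missing index to positive infinity so it sorts last."""
    return float("inf") if index is None else index


indices = [2, None, 0, 1]
ordered = sorted(indices, key=sort_key)
print(ordered)  # [0, 1, 2, None]
```

Because `float('inf')` compares greater than every integer, entries without an index end up at the end, and Python's stable sort keeps them in their original relative order.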
							
						 
						
							2020-10-29 10:03:40 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Benjamin Rosemann 
								
							 
						 
						
							
							
							
							
								
							
							
								e77f19fefc 
								
							 
						 
						
							
							
								
								Add Python 3.9 to .travis.yml  
							
							
							
						 
						
							2020-10-29 10:02:51 +01:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
								
								
							
							
							
								
							
							
								082fc9e09a 
								
							 
						 
						
							
							
								
								Merge pull request  #38  from b2m/add-editorconfig  
							
							
							
							
							Add .editorconfig 
							
						 
						
							2020-10-28 15:16:04 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Benjamin Rosemann 
								
							 
						 
						
							
							
							
							
								
							
							
								20661487d6 
								
							 
						 
						
							
							
								
								Add .editorconfig  
							
							
							
							
							Add a proposal for a .editorconfig file (see https://editorconfig.org/).
Many editors support this format natively; others support it via
plugins.
This will close #19. 
							
						 
						
							2020-10-28 11:31:18 +01:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								6e47acda1c 
								
							 
						 
						
							
							
								
								📝  dinglehopper: Move screenshot higher  
							
							
							
						 
						
							2020-10-21 19:31:53 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								5cbe148741 
								
							 
						 
						
							
							
								
								🐛  dinglehopper: Skip pages if there is no GT nor OCR (Fixes GH-34)  
							
							
							
						 
						
							2020-10-21 19:29:45 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								e4e2777cb7 
								
							 
						 
						
							
							
								
								🐛  dinglehopper: Do try to get text when no TextEquivs exist  
							
							
							
						 
						
							2020-10-21 17:59:44 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								f14ae46870 
								
							 
						 
						
							
							
								
								Merge branch 'feat/text-extraction-levels'  
							
							
							
						 
						
							2020-10-21 17:51:44 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								1c88891a98 
								
							 
						 
						
							
							
								
								✔️  Add test data for LAREX's indexed TextEquivs (unused)  
							
							
							
						 
						
							2020-10-21 17:51:15 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								19d15e3ecc 
								
							 
						 
						
							
							
								
								🐛  dinglehopper: Honor TextEquiv index (Closes GH-33)  
							
							
							
						 
						
							2020-10-21 17:50:21 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								f626a2ebe6 
								
							 
						 
						
							
							
								
								🧹  dinglehopper: Remove warning when there is a non-TextRegion in the ReadingOrder  
							
							
							
						 
						
							2020-10-21 17:03:55 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								0f3857d8d3 
								
							 
						 
						
							
							
								
								📝  Document OCR-D parameters and restructure README a bit  
							
							
							
						 
						
							2020-10-21 16:54:23 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								8b4ee20a40 
								
							 
						 
						
							
							
								
								✨  Add a new CLI tool dinglehopper-extract to just give the extracted text  
							
							
							
						 
						
							2020-10-21 16:30:48 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								b23b75b601 
								
							 
						 
						
							
							
								
								✨  dinglehopper: Give segment ids from the extracted textequiv_level  
							
							
							
						 
						
							2020-10-21 16:04:33 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								b23e4ce30e 
								
							 
						 
						
							
							
								
								✨  dinglehopper: Add OCR-D parameter to choose TextEquiv level  
							
							
							
						 
						
							2020-10-21 14:38:19 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								9744fa2567 
								
							 
						 
						
							
							
								
								✨  dinglehopper: Add CLI option to choose TextEquiv level  
							
							
							
						 
						
							2020-10-20 19:33:39 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								75733039b8 
								
							 
						 
						
							
							
								
								🧹  dinglehopper: Do not hardcode joiner to \n  
							
							
							
						 
						
							2020-10-20 18:43:56 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								3848412349 
								
							 
						 
						
							
							
								
								✨  dinglehopper: Implement the basic text extraction from PAGE TextLines  
							
							
							
						 
						
							2020-10-20 18:40:21 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								f2367ac0c3 
								
							 
						 
						
							
							
								
								🐛  Fix OCR-D CLI for newest OCR-D  
							
							
							
							
							Now that find_files() is a generator, we can't use [0] to get the file. 
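
The underlying Python behavior can be illustrated with a stand-in generator. The `find_files` function below is a hypothetical placeholder with made-up file names, not the OCR-D API:

```python
def find_files():
    """Hypothetical stand-in that yields matches lazily, like a generator-based lookup."""
    yield "page_0001.xml"
    yield "page_0002.xml"


# Subscripting a generator fails, because generators do not support indexing.
try:
    find_files()[0]
except TypeError as exc:
    error = str(exc)

# next() retrieves the first item without materializing the rest.
first = next(find_files())

# A default avoids StopIteration when nothing matches.
nothing = next(iter([]), None)
```

Wrapping the call in `list(...)` and indexing the result would also work, at the cost of consuming the whole generator.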
							
						 
						
							2020-10-16 14:58:27 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								5ed184c8c4 
								
							 
						 
						
							
							
								
								✨  dinglehopper: Show a progressbar on --progress  
							
							
							
						 
						
							2020-10-15 16:09:54 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								4951823a29 
								
							 
						 
						
							
							
								
								🧹  dinglehopper: Disable metrics in JSON report, too  
							
							
							
						 
						
							2020-10-15 15:38:15 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								5303eea80c 
								
							 
						 
						
							
							
								
								📝  dinglehopper: Update README to use OCR-D's new and more readable -P option  
							
							
							
						 
						
							2020-10-15 15:37:51 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								82217a25bb 
								
							 
						 
						
							
							
								
								🧹  dinglehopper: Move all normalization code to extracted_text.py  
							
							
							
						 
						
							2020-10-08 17:29:25 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								009fa55c09 
								
							 
						 
						
							
							
								
								Merge branch 'master' of  https://github.com/qurator-spk/dinglehopper  
							
							
							
						 
						
							2020-10-08 17:17:40 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								c20bbbfa25 
								
							 
						 
						
							
							
								
								📝  dinglehopper: Update screenshot to include a region id tooltip  
							
							
							
						 
						
							2020-10-08 17:17:34 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
								
								
							
							
							
								
							
							
								252bf9b3e7 
								
							 
						 
						
							
							
								
								📝  dinglehopper: Fix markdown in README.md  
							
							
							
						 
						
							2020-10-08 17:14:29 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								c6c6b8efab 
								
							 
						 
						
							
							
								
								📝  dinglehopper: Add detail about the text extraction and ExtractedText  
							
							
							
						 
						
							2020-10-08 17:05:36 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								7025ea54a8 
								
							 
						 
						
							
							
								
								📝  dinglehopper: Move developer info to README-DEV.md  
							
							
							
						 
						
							2020-10-08 16:59:50 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								f50591abac 
								
							 
						 
						
							
							
								
								Merge branch 'feat/display-segment-id'  
							
							
							
						 
						
							2020-10-08 13:39:38 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								c514abfb9f 
								
							 
						 
						
							
							
								
								🧹  dinglehopper: Sanitize imports  
							
							
							
						 
						
							2020-10-08 13:33:19 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								1077dc64ce 
								
							 
						 
						
							
							
								
								➡️  dinglehopper: Move ExtractedText to its own file  
							
							
							
						 
						
							2020-10-08 13:25:20 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								9dd4ff0aae 
								
							 
						 
						
							
							
								
								✨  dinglehopper: Extract line IDs for ALTO  
							
							
							
						 
						
							2020-10-08 12:54:28 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								f3aafb6fdf 
								
							 
						 
						
							
							
								
								✨  dinglehopper: Validate ExtractedText.{segments,_text} in both directions  
							
							
							
						 
						
							2020-10-08 12:20:27 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								1f9a680fe7 
								
							 
						 
						
							
							
								
								⚙️  dinglehopper: PyCharm should use dinglehopper-github virtualenv  
							
							
							
						 
						
							2020-10-08 12:16:42 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								b14c35e147 
								
							 
						 
						
							
							
								
								🎨  dinglehopper: Use multimethod to handle str vs ExtractedText  
							
							
							
						 
						
							2020-10-08 12:15:58 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								a17ee2afec 
								
							 
						 
						
							
							
								
								🚧  dinglehopper: Guarantee NFC + rename from_text → from_str  
							
							
							
						 
						
							2020-10-08 11:25:01 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								7843824eaf 
								
							 
						 
						
							
							
								
								🚧  dinglehopper: Support str & ExtractedText in CER and distance functions  
							
							
							
						 
						
							2020-10-08 10:47:20 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								5bee55c896 
								
							 
						 
						
							
							
								
								💩  dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments  
							
							
							
						 
						
							2020-10-07 18:40:06 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								96b55f1806 
								
							 
						 
						
							
							
								
								🚧  dinglehopper: Hierarchical text representation  
							
							
							
						 
						
							2020-10-07 18:31:52 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								db6292611f 
								
							 
						 
						
							
							
								
								🧹  dinglehopper: Remove merged text extraction test code  
							
							
							
						 
						
							2020-10-07 16:07:27 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								d706ef4621 
								
							 
						 
						
							
							
								
								📝  Document CER/WER and the format detection (Fixes GH-26)  
							
							
							
						 
						
							2020-09-30 17:58:05 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								da47e41c85 
								
							 
						 
						
							
							
								
								💩  dinglehopper: Fix OCR-D CLI test by working around ocrd_cli_wrap_processor() check for arguments  
							
							
							
						 
						
							2020-09-25 14:53:19 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
								
								
							
							
							
								
							
							
								7085ee0fd8 
								
							 
						 
						
							
							
								
								Merge pull request  #29  from kba/getlogger  
							
							
							
							
							getLogger per method 
							
						 
						
							2020-09-25 13:20:58 +02:00 
							
								 
							
						 
					 
				
					
						
							
						 
						
							
							
							
							
								
							
							
								77154ef256 
								
							 
						 
						
							
							
								
								📝  dinglehopper: Document REPORT_PREFIX (Closes GH-27.)  
							
							
							
						 
						
							2020-09-24 20:58:15 +02:00