Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								75823f9bed 
								
							 
						 
						
							
							
								
								run_single: call writer.build_pagexml_no_full_layout w/ kwargs  
							
							
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								cbbb3248c7 
								
							 
						 
						
							
							
								
								writer: simplify  
							
							
							
							
							- `build_pagexml_no_full_layout`: delegate to
  `build_pagexml_full_layout` (removing redundant code) 
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								e32479765c 
								
							 
						 
						
							
							
								
								writer: simplify  
							
							
							
							
							- simplify serialization of coordinates
- re-use `serialize_lines_in_region` (drop `*_in_dropcapital` and `*_in_marginal`)
- re-use `calculate_polygon_coords` 
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								d88ca18eec 
								
							 
						 
						
							
							
								
								get/do_work_of_slopes etc.: reduce call/return signatures  
							
							
							
							
							- `get_textregion_contours_in_org_image_light`: no more need
  to also return unchanged contours here (see 41cc38c5)
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								02a347a48a 
								
							 
						 
						
							
							
								
								no more need to rm from contours_only_text_parent_d_ordered now  
							
							
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								fd43e78442 
								
							 
						 
						
							
							
								
								filter_contours_without_textline_inside: simplify  
							
							
							
							
							- np.delete in index array instead of contour lists
- yield actual resulting indices 
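
A minimal sketch of the idea in the two points above, not the code from this commit: rows are deleted from an index array, and the surviving indices are returned alongside the filtered list. The function name `drop_by_index` and its arguments are hypothetical.

```python
import numpy as np


def drop_by_index(contours, drop):
    """Remove the entries listed in `drop` and report which indices survive.

    `contours` is any sequence; `drop` is a list of integer positions.
    (Hypothetical helper for illustration only.)
    """
    # Work on a cheap integer index array instead of the contour list itself.
    indices = np.arange(len(contours))
    kept = np.delete(indices, drop)
    # Resolve the surviving indices back to the actual items once, at the end.
    return kept, [contours[i] for i in kept]
```

Calling `drop_by_index(["a", "b", "c", "d"], [1, 3])` keeps positions 0 and 2, so callers can apply the same surviving indices to any parallel list.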
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								0a80cd5dff 
								
							 
						 
						
							
							
								
								avoid unnecessary 3-channel conversions: for tables, too  
							
							
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								dfdc705375 
								
							 
						 
						
							
							
								
								do_work_of_slopes: rm unused old variant  
							
							
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								2e907875c1 
								
							 
						 
						
							
							
								
								get_text_region_boxes_by_given_contours: simplify  
							
							
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								d53f829dfd 
								
							 
						 
						
							
							
								
								filter_contours_inside_a_bigger_one: fix edge case in 81827c29 
							
							
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								18bbdb7c48 
								
							 
						 
						
							
							
								
								CI: run deps-test with OCR extra so symlink rule fires  
							
							
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								23535998f7 
								
							 
						 
						
							
							
								
								tests: symlink OCR models into layout model directory  
							
							
							
							
							(so layout with OCR options works with our split model packages) 
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								a1904fa660 
								
							 
						 
						
							
							
								
								tests: cover layout with OCR in various modes  
							
							
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								595ed02743 
								
							 
						 
						
							
							
								
								run_single: simplify; allow running TrOCR in non-fl mode, too  
							
							
							
							
							- refactor final `self.full_layout` conditional, removing copied code
- allow running `self.ocr` and `self.tr` branch in both cases (non/fl)
- when running TrOCR, use model / processor / device initialised during init
  (instead of ad-hoc loading) 
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								6e57ab3741 
								
							 
						 
						
							
							
								
								textline_contours_postprocessing: do not catch arbitrary exceptions  
							
							
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								fe603188f4 
								
							 
						 
						
							
							
								
								avoid unnecessary 3-channel conversions  
							
							
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								155b8f68b8 
								
							 
						 
						
							
							
								
								matching deskewed text region contours with predicted: improve  
							
							
							
							
							- avoid duplicate and missing mappings by using a different approach:
  instead of just minimising the center distance for the N contours
  that we expect,
  1. get all N:M distances
  2. iterate over them from small to large
  3. continue adding correspondences until both every original contour
     and every deskewed contour have at least one match
  4. where one original matches multiple deskewed contours,
     join the latter polygons to map as single contour
  5. where one deskewed contour matches multiple originals,
     split the former by intersecting with each of the latter
     (after bringing them into the same coordinate space),
     so ultimately only the respective match gets assigned 
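
Steps 1 to 3 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this commit: contours are reduced to their center points, the function name `match_centers` is hypothetical, and a pair is skipped when both of its sides are already matched (the commit message does not say whether such pairs are kept). Steps 4 and 5 (joining and splitting polygons) are not covered.

```python
import numpy as np


def match_centers(orig_centers, new_centers):
    """Greedy N:M matching of two point sets by ascending distance.

    Returns (orig_index, new_index) pairs such that every point on both
    sides takes part in at least one pair. (Hypothetical helper.)
    """
    orig = np.asarray(orig_centers, dtype=float)
    new = np.asarray(new_centers, dtype=float)
    # 1. all N:M distances
    dists = np.linalg.norm(orig[:, None, :] - new[None, :, :], axis=2)
    # 2. iterate over them from small to large (stable sort for determinism)
    order = np.argsort(dists, axis=None, kind="stable")
    pairs, seen_orig, seen_new = [], set(), set()
    for flat in order:
        i, j = (int(k) for k in np.unravel_index(flat, dists.shape))
        # Assumption: a pair that covers nothing new is skipped.
        if i in seen_orig and j in seen_new:
            continue
        pairs.append((i, j))
        seen_orig.add(i)
        seen_new.add(j)
        # 3. stop once both sides are fully covered
        if len(seen_orig) == len(orig) and len(seen_new) == len(new):
            break
    return pairs
```

With two original centers and three deskewed ones, one original ends up with two matches, which is the case step 4 then resolves by joining polygons.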
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								0e00d7868b 
								
							 
						 
						
							
							
								
								matching deskewed text region contours with predicted: improve  
							
							
							
							
							- apply same min-area filter to deskewed contours as to original ones 
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								0f33c21eb3 
								
							 
						 
						
							
							
								
								matching deskewed text region contours with predicted: improve  
							
							
							
							
							- when matching undeskewed and new contours, do not just
  pick the closest centers, respectively, but also of similar
  size (by making the contour area the 3rd dimension of the
  vector norm in the distance calculation) 
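
A small sketch of what treating the area as a third dimension could look like, assuming each contour is summarised as a `(cx, cy, area)` row. The function name is hypothetical, and the commit message does not say whether the area is scaled relative to the coordinates; here it is used unscaled.

```python
import numpy as np


def nearest_by_center_and_area(orig_feats, new_feats):
    """For each original contour, return the index of the closest new one.

    Both inputs are sequences of (cx, cy, area) rows, so the Euclidean
    norm penalises a size mismatch as well as center distance.
    (Hypothetical helper; area is left unscaled, which is an assumption.)
    """
    orig = np.asarray(orig_feats, dtype=float)
    new = np.asarray(new_feats, dtype=float)
    # Pairwise distances over all three feature dimensions.
    dists = np.linalg.norm(orig[:, None, :] - new[None, :, :], axis=2)
    return dists.argmin(axis=1)
```

For an original at `(100, 100)` with area 5000, a tiny fragment one pixel away loses against a same-sized region five pixels away, whereas a center-only distance would pick the fragment.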
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								73e5a1def8 
								
							 
						 
						
							
							
								
								matching deskewed text region contours with predicted: simplify  
							
							
							
							
							- (no need for argmax if already sorted) 
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								d774a23daa 
								
							 
						 
						
							
							
								
								matching deskewed text region contours with predicted: simplify  
							
							
							
							
							- avoid loops in favour of array processing
- improve readability and identifiers 
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								29b4527bde 
								
							 
						 
						
							
							
								
								do_order_of_regions: simplify  
							
							
							
							
							- remove duplicate code via inline def for the try-catch 
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								e674ea08f3 
								
							 
						 
						
							
							
								
								do_order_of_regions: drop redundant no/full_layout  
							
							
							
							
							(`_no_full_layout` is the same copied code as `_full_layout`;
 the latter runs just the same if passed an empty list for headings) 
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								e9bb62bd86 
								
							 
						 
						
							
							
								
								do_order_of_regions: simplify  
							
							
							
							
							- avoid loops in favour of array processing 
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								7387f5a929 
								
							 
						 
						
							
							
								
								do_order_of_regions: improve box matching, simplify  
							
							
							
							
							- when searching for boxes matching contour, be more precise:
  - avoid heuristic rules ("xmin + 80 within xrange") in favour
    of exact criteria (contour properly contained in box)
  - for fallback criterion (nearest centers), also require
    proper containment of center in box
- `order_of_regions`: remove (now) unnecessary (and insufficient)
  workaround for missing indexes (if boxes are not covering contours
  exactly) 
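
The two criteria above (proper containment of the contour, and containment of its center as a fallback) might be expressed as below. This is a sketch only: the box layout `(xmin, xmax, ymin, ymax)`, the function names, and the use of the vertex mean as the center are all assumptions, not taken from the commit.

```python
import numpy as np


def boxes_containing(contour, boxes):
    """Indices of boxes that fully contain the contour's bounding extent.

    `contour` has OpenCV-style shape (N, 1, 2) or (N, 2); `boxes` is a
    sequence of (xmin, xmax, ymin, ymax) rows (assumed layout).
    """
    pts = np.asarray(contour).reshape(-1, 2)
    xmin, ymin = pts.min(axis=0)
    xmax, ymax = pts.max(axis=0)
    b = np.asarray(boxes)
    inside = (b[:, 0] <= xmin) & (xmax <= b[:, 1]) & (b[:, 2] <= ymin) & (ymax <= b[:, 3])
    return np.flatnonzero(inside)


def boxes_containing_center(contour, boxes):
    """Fallback: indices of boxes that contain the contour's center.

    The center is approximated by the mean of the vertices (assumption).
    """
    cx, cy = np.asarray(contour).reshape(-1, 2).mean(axis=0)
    b = np.asarray(boxes)
    inside = (b[:, 0] <= cx) & (cx <= b[:, 1]) & (b[:, 2] <= cy) & (cy <= b[:, 3])
    return np.flatnonzero(inside)
```

Unlike an offset rule such as "xmin + 80 within xrange", both checks depend only on the geometry of the contour and the box.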
							
						 
						
							2025-10-09 20:14:11 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								4950e6bd78 
								
							 
						 
						
							
							
								
								order_of_regions: simplify  
							
							
							
							
							- use new `find_center_of_contours`
- avoid unused calculations
- avoid loops in favour of array processing 
							
						 
						
							2025-10-09 20:14:10 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								a1c8fd4467 
								
							 
						 
						
							
							
								
								do_order_of_regions / order_of_regions: simplify  
							
							
							
							
							- array-convert only once (before returning from `order_of_regions`)
- avoid passing `matrix_of_orders` unnecessarily between
  `order_of_regions` and `order_and_id_of_texts` 
							
						 
						
							2025-10-09 20:14:10 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								415b2cbad8 
								
							 
						 
						
							
							
								
								eynollah, drop_capitals: simplify  
							
							
							
							
							- use new `find_center_of_contours` 
							
						 
						
							2025-10-09 20:14:10 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								3f3353ec3a 
								
							 
						 
						
							
							
								
								do_order_of_regions: simplify  
							
							
							
							
							- avoid loops in favour of array processing 
							
						 
						
							2025-10-09 20:14:10 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								8c3d5eb0eb 
								
							 
						 
						
							
							
								
								separate_marginals_to_left_and_right_and_order_from_top_to_down: simplify  
							
							
							
							
							- use new `find_center_of_contours`
- avoid loops in favour of array processing
- avoid repeated sorting 
							
						 
						
							2025-10-09 20:14:10 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									kba 
								
							 
						 
						
							
							
							
							
								
							
							
								8215814a3f 
								
							 
						 
						
							
							
								
								Merge branch 'changelog-v0.5.0'  
							
							
							
						 
						
							2025-10-09 14:03:45 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									kba 
								
							 
						 
						
							
							
							
							
								
							
							
								4ffe6190d2 
								
							 
						 
						
							
							
								
								📝  changelog  
							
							
							
						 
						
							2025-10-09 14:03:26 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									vahidrezanezhad 
								
							 
						 
						
							
							
							
							
								
							
							
								8869c20c33 
								
							 
						 
						
							
							
								
								updating CHANGELOG for v0.5.0  
							
							
							
						 
						
							2025-10-09 13:54:29 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								81827c2942 
								
							 
						 
						
							
							
								
								filter_contours_inside_a_bigger_one: simplify  
							
							
							
							
							- use new `find_center_of_contours`
- avoid loops in favour of array processing
- use sets instead of `np.unique` and `np.delete` instead of list.pop 
							
						 
						
							2025-10-06 13:32:34 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Sachunsky 
								
							 
						 
						
							
							
							
							
								
							
							
								0b9d4901a6 
								
							 
						 
						
							
							
								
								contour features: avoid unused calculations, simplify, add shortcuts  
							
							
							
							
							- new function: `find_center_of_contours`
- simplified: `find_(new_)features_of_contours` 
							
						 
						
							2025-10-02 20:51:03 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									kba 
								
							 
						 
						
							
							
							
							
								
							
							
								8a9b4f8f55 
								
							 
						 
						
							
							
								
								remove commented-out requirement for tf == 2.12.1, rely on same version as in eynollah proper  
							
							
							
						 
						
							2025-10-02 12:16:26 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									kba 
								
							 
						 
						
							
							
							
							
								
							
							
								f60e0543ab 
								
							 
						 
						
							
							
								
								training: update docs  
							
							
							
						 
						
							2025-10-01 19:16:58 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									kba 
								
							 
						 
						
							
							
							
							
								
							
							
								1c043c586a 
								
							 
						 
						
							
							
								
								eynollah-training: all training CLI into single click group  
							
							
							
						 
						
							2025-10-01 19:16:45 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									kba 
								
							 
						 
						
							
							
							
							
								
							
							
								690d47444c 
								
							 
						 
						
							
							
								
								make relative wildcard imports explicit  
							
							
							
						 
						
							2025-10-01 18:43:20 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									kba 
								
							 
						 
						
							
							
							
							
								
							
							
								2baf42e878 
								
							 
						 
						
							
							
								
								organize imports, use relative imports  
							
							
							
						 
						
							2025-10-01 18:15:54 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									kba 
								
							 
						 
						
							
							
							
							
								
							
							
								4f5cdf3140 
								
							 
						 
						
							
							
								
								move training scripts to src/eynollah/training  
							
							
							
						 
						
							2025-10-01 18:12:45 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									kba 
								
							 
						 
						
							
							
							
							
								
							
							
								f0ef2b5db2 
								
							 
						 
						
							
							
								
								remove unused imports  
							
							
							
						 
						
							2025-10-01 18:10:13 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									kba 
								
							 
						 
						
							
							
							
							
								
							
							
								95bb5908bb 
								
							 
						 
						
							
							
								
								Merge branch 'integrate-training-from-sbb_pixelwise_segmentation' of https://github.com/qurator-spk/eynollah into integrate-training-from-sbb_pixelwise_segmentation 
							
							
							
						 
						
							2025-10-01 18:02:09 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									kba 
								
							 
						 
						
							
							
							
							
								
							
							
								48266b1ee0 
								
							 
						 
						
							
							
								
								make training dependencies optional-dependencies of eynollah  
							
							
							
							
							i.e. `pip install "eynollah[training]"` will install the requirements for training 
							
						 
						
							2025-10-01 18:01:25 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									kba 
								
							 
						 
						
							
							
							
							
								
							
							
								733af1e9a7 
								
							 
						 
						
							
							
								
								📝  update train/README.md, align with docs/train.md  
							
							
							
						 
						
							2025-10-01 17:43:32 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									vahidrezanezhad 
								
							 
						 
						
							
							
							
							
								
							
							
								5725e4fd1f 
								
							 
						 
						
							
							
								
								- Continue processing when num_col is None but textregions exist.
- Convert marginal-only to main body if no main body is present.
- Reset deskew angle to 0 when text region density (textregion area to page area) < 0.3 and angle > 45°. 
							
							
							
						 
						
							2025-10-01 15:58:03 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									cneud 
								
							 
						 
						
							
							
							
							
								
							
							
								4514d417a7 
								
							 
						 
						
							
							
								
								force GH markdown code block in list  
							
							
							
						 
						
							2025-10-01 01:16:25 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									cneud 
								
							 
						 
						
							
							
							
							
								
							
							
								e027bc038e 
								
							 
						 
						
							
							
								
								Update README.md  
							
							
							
						 
						
							2025-10-01 01:05:15 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									cneud 
								
							 
						 
						
							
							
							
							
								
							
							
								91d2a74ac9 
								
							 
						 
						
							
							
								
								remove redundant parentheses  
							
							
							
						 
						
							2025-10-01 00:38:01 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									cneud 
								
							 
						 
						
							
							
							
							
								
							
							
								f2f93e0251 
								
							 
						 
						
							
							
								
								list literal is faster than using list constructor to create a new list  
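
One way to check this claim locally is a quick `timeit` comparison. The helper name `time_list_creation` is hypothetical, and absolute numbers will vary by interpreter and machine.

```python
import timeit


def time_list_creation(number=200_000):
    """Return (literal_seconds, constructor_seconds) for creating empty lists.

    `[]` compiles to a single list-building instruction, while `list()`
    needs a global name lookup plus a call, which is where the extra
    time typically goes.
    """
    literal = timeit.timeit("[]", number=number)
    constructor = timeit.timeit("list()", number=number)
    return literal, constructor
```

Running `time_list_creation()` a few times and comparing the two values should show the literal ahead on CPython, though the difference per call is tiny.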
							
							
							
						 
						
							2025-10-01 00:26:27 +02:00