Commit graph

  • 584cde7eb8 updating CHANGELOG for v0.5.0 (updating_CHANGELOG_v0.5.0) vahidrezanezhad 2025-10-06 14:53:47 +02:00
  • 353dc8c424 Merge a48e52c00e into 5725e4fd1f Konstantin Baierer 2025-10-04 15:19:18 +05:30
  • 331fd9f59a Merge 42a3cc2335 into 5725e4fd1f Robert Sachunsky 2025-10-02 20:30:09 +02:00
  • db3f8c6db0 Merge f60e0543ab into 8a9b4f8f55 Konstantin Baierer 2025-10-02 12:17:04 +02:00
  • f9274990bf Merge 8a9b4f8f55 into 5725e4fd1f Konstantin Baierer 2025-10-02 12:16:32 +02:00
  • 8a9b4f8f55 remove commented-out requirement for tf == 2.12.1, rely on same version as in eynollah proper (integrate-training-from-sbb_pixelwise_segmentation) kba 2025-10-02 12:16:26 +02:00
  • 54b8931f95 Merge 96eb1c11e6 into 5725e4fd1f Konstantin Baierer 2025-10-01 20:30:37 +02:00
  • 96eb1c11e6 Merge remote-tracking branch 'bertsky/loky-with-shm-for-175-rebuilt' into prepare-v0.6.0 (prepare-v0.6.0) kba 2025-10-01 20:27:56 +02:00
  • f60e0543ab training: update docs (training-installation) kba 2025-10-01 19:16:58 +02:00
  • 1c043c586a eynollah-training: all training CLI into single click group kba 2025-10-01 18:52:11 +02:00
  • 690d47444c make relative wildcard imports explicit kba 2025-10-01 18:36:28 +02:00
  • 2baf42e878 organize imports, use relative imports kba 2025-10-01 18:15:54 +02:00
  • 4f5cdf3140 move training scripts to src/eynollah/training kba 2025-10-01 18:12:45 +02:00
  • f0ef2b5db2 remove unused imports kba 2025-10-01 18:10:13 +02:00
  • 95bb5908bb Merge branch 'integrate-training-from-sbb_pixelwise_segmentation' of https://github.com/qurator-spk/eynollah into integrate-training-from-sbb_pixelwise_segmentation kba 2025-10-01 18:02:09 +02:00
  • 48266b1ee0 make training dependencies optional-dependencies of eynollah kba 2025-10-01 18:01:21 +02:00
  • 733af1e9a7 📝 update train/README.md, align with docs/train.md kba 2025-10-01 17:43:32 +02:00
  • 5725e4fd1f Continue processing when num_col is None but textregions exist; convert marginal-only to main body if no main body is present; reset deskew angle to 0 when text region density (textregion area to page area) < 0.3 and angle > 45° (main) vahidrezanezhad 2025-10-01 15:58:03 +02:00
  • 4514d417a7 force GH markdown code block in list cneud 2025-10-01 01:16:25 +02:00
  • e027bc038e Update README.md cneud 2025-10-01 01:05:15 +02:00
  • 91d2a74ac9 remove redundant parentheses cneud 2025-10-01 00:38:01 +02:00
  • f2f93e0251 list literal is faster than using list constructor to create a new list cneud 2025-10-01 00:26:27 +02:00
  • 70af00182b mutable defaults are the source of all evil cneud 2025-10-01 00:20:18 +02:00
  • 1d0616eb69 comparisons to None should not use the equality operators cneud 2025-10-01 00:15:11 +02:00
  • 9ce127eb51 remove unnecessary backslash cneud 2025-10-01 00:04:53 +02:00
  • 558867eb24 fix typo cneud 2025-10-01 00:04:07 +02:00
  • 59669d10e1 Merge 3aa7ad04fa into a6f0af07d1 Robert Sachunsky 2025-09-30 23:22:33 +02:00
  • 3aa7ad04fa 📝 update changelog Robert Sachunsky 2025-09-30 23:14:52 +02:00
  • f0de1adabf rm loky dependency Robert Sachunsky 2025-09-30 23:12:18 +02:00
  • 7daec392b9 Dockerfile: fix up CUDA installation for mixed TF/Torch Robert Sachunsky 2025-09-30 22:10:45 +02:00
  • ad129ed46c CI: remove OS from model cache keys Robert Sachunsky 2025-09-30 22:05:53 +02:00
  • c86e59f481 CI: update model key, split up cache restore/save Robert Sachunsky 2025-09-30 22:03:46 +02:00
  • a3d8197930 makefile: update model URL Robert Sachunsky 2025-09-30 21:50:21 +02:00
  • 61b20cc83d tests: switch from subtests to parametrize, use --isolate everywhere to free CUDA memory in between Robert Sachunsky 2025-09-30 19:20:35 +02:00
  • 375e0263d4 CNN-RNN OCR model: switch to 20250930 version (compatible with TF 2.12 on CPU as well) Robert Sachunsky 2025-09-30 19:16:50 +02:00
  • b21051db21 ProcessPoolExecutor: shutdown during del() instead of atexit() Robert Sachunsky 2025-09-30 19:16:00 +02:00
  • 08c8c26028 indent extremely long lines Robert Sachunsky 2025-09-30 03:52:19 +02:00
  • f857ee7b51 simplify Robert Sachunsky 2025-09-19 02:12:18 +02:00
  • c0137c29ad try to fix the failed outsourcing of utils_ocr Robert Sachunsky 2025-09-30 02:23:43 +02:00
  • 13f85b0d5c Merge branch 'main' into loky-with-shm-for-175-rebuilt Robert Sachunsky 2025-09-30 02:07:20 +02:00
  • 070dafca75 remove duplicate LICENSE cneud 2025-09-29 22:17:27 +02:00
  • 53c1ca11fc Update README.md cneud 2025-09-29 22:15:17 +02:00
  • 0d695a2e2d Install nvidia-cudnn11==8.6.0.163 as a workaround for the NHWC/NCHW issue, ht @bertsky (workaround-nhcw-nchw-issue) kba 2025-09-29 19:18:16 +02:00
  • 758602403e replace loky with concurrent.futures.ProcessPoolExecutor (faster) Robert Sachunsky 2025-09-21 21:35:22 +02:00
  • 0366707136 get_smallest_skew: do not pass logger Robert Sachunsky 2025-09-20 00:57:00 +02:00
  • b94c96fcbb find_num_col: exit early if empty (avoiding exceptions) Robert Sachunsky 2025-09-20 00:56:33 +02:00
  • 04c3d7dd1b get_smallest_skew: avoid shm if no ProcessPoolExecutor is passed Robert Sachunsky 2025-09-18 20:07:54 +02:00
  • 0662ece536 do_work_of_slopes*: use shm also in non-light mode(s) Robert Sachunsky 2025-09-04 15:18:55 +02:00
  • 31f240c3b8 do_image_rotation, do_work_of_slopes_new_curved: pass arrays via shared memory Robert Sachunsky 2025-09-02 15:04:04 +02:00
  • 8be2c79771 Revert "deskewing with faster multiprocessing" Robert Sachunsky 2025-09-03 09:01:18 +02:00
  • abf5c0f845 get_smallest_skew: when shifting search range of rotation angle, compare resulting (maximum) variances instead of blindly assuming the new range is better Robert Sachunsky 2025-09-02 15:01:52 +02:00
  • dc0caad512 writer: use @type='heading' instead of 'header' Robert Sachunsky 2025-08-26 21:07:50 +02:00
  • f458e3ece0 writer: SeparatorRegion needs SeparatorRegionType (not ImageRegionType) Robert Sachunsky 2025-08-26 21:07:18 +02:00
  • 4337d62985 contours: rename 'pixel' → 'label' for clarity Robert Sachunsky 2025-08-26 21:06:36 +02:00
  • 5b16c2fc00 avoid pulling unused 'image_page_rotated' through functions Robert Sachunsky 2025-08-26 21:05:40 +02:00
  • 5bff2d156a use box2rect instead of crop_image_inside_box when no image needed Robert Sachunsky 2025-08-26 21:02:43 +02:00
  • 9b5182c1c0 utils: introduce box2rect and box2slice Robert Sachunsky 2025-08-26 21:00:33 +02:00
  • bca2ae3d78 get_marginals: exit early if no peaks found to avoid spurious overlap mask Robert Sachunsky 2025-08-29 12:37:44 +02:00
  • 235539a350 filter_contours_without_textline_inside: avoid removing from identical lists twice Robert Sachunsky 2025-08-29 12:19:37 +02:00
  • 11e143afee polygon2contour: avoid overflow Robert Sachunsky 2025-08-29 12:16:56 +02:00
  • 7a9e8256ee increase dilatation: textregions/lines (5→6), seplines (0→1) Robert Sachunsky 2025-08-21 13:00:31 +02:00
  • f3faa29528 refactor shapely conversions into contour2polygon / polygon2contour, also handle heterogeneous geometries Robert Sachunsky 2025-08-21 12:59:03 +02:00
  • 0650274ffa move dilate_*_contours to .utils.contour, rename dilate_textregions_contours_textline_version → dilate_textline_contours Robert Sachunsky 2025-08-21 01:42:46 +02:00
  • a433c73628 filter_contours_area_of_image*: also ensure validity here Robert Sachunsky 2025-08-21 01:33:16 +02:00
  • 17bcf1af71 rename *lines_xml → *seplines for clarity Robert Sachunsky 2025-08-21 01:32:32 +02:00
  • e730725da3 check_any_text_region_in_model_one_is_main_or_header_light: return original instead of resampled contours Robert Sachunsky 2025-08-21 01:05:15 +02:00
  • 7b51fd6624 avoid creating invalid polygons via rounding Robert Sachunsky 2025-08-21 01:03:46 +02:00
  • 41cc38c51a get_textregion_contours_in_org_image_light: no back rotation, drop slope_first (always 0) Robert Sachunsky 2025-08-20 14:28:14 +02:00
  • afba70c920 separate_lines/do_work_of_slopes: skip if crop is empty Robert Sachunsky 2025-08-19 22:56:36 +02:00
  • 66b2bce8b9 return_boxes_of_images_by_order_of_reading_new: log any exceptions Robert Sachunsky 2025-09-19 12:19:58 +02:00
  • b48c41e68f return_boxes_of_images_by_order_of_reading_new: simplify, avoid changing dtype during np.append Robert Sachunsky 2025-08-19 20:09:09 +02:00
  • 09ece86f0d dilate_textregions_contours: simplify (via shapely's Polygon.buffer()), ensure validity Robert Sachunsky 2025-08-19 11:58:45 +02:00
  • 9d8b858dfc remove docs/eynollah-layout, superseded by docs/model.md and docs/usage.md kba 2025-09-29 16:01:29 +02:00
  • 2bcd20ebc7 reference the now-merged training tools in README.md kba 2025-09-29 15:21:42 +02:00
  • ce02a3553b 🔥 remove obsolete versions of the training document kba 2025-09-29 15:18:21 +02:00
  • 6d379782ab 📝 align former upstream train.md with wiki train.md syntactically kba 2025-09-29 15:11:02 +02:00
  • 52a7c93319 add documentation on training eynollah from sbb_pixelwise_segmentation wiki kba 2025-09-29 15:05:05 +02:00
  • ea05461dfe add documentation on eynollah layout from eynollah wiki kba 2025-09-29 15:04:46 +02:00
  • 56c4b7af88 📝 align pre-merge docs/train.md with former upstream train.md syntactically kba 2025-09-29 14:59:41 +02:00
  • 3b9548d0bd Merge sbb_pixelwise_segmentation training code into eynollah kba 2025-09-29 14:44:31 +02:00
  • a48e52c00e 📝 extend changelog for v0.5.0 (changelog-v0.5.0) Robert Sachunsky 2025-09-29 13:49:18 +02:00
  • a6f0af07d1 Merge pull request #185 from bertsky/patch-4 Konstantin Baierer 2025-09-29 10:44:27 +02:00
  • 92c1e824dc CD: master is now main Robert Sachunsky 2025-09-26 23:05:47 +02:00
  • 6ea6a62801 📝 v0.5.0 (v0.5.0) kba 2025-09-26 16:23:46 +02:00
  • 882e242946 Merge pull request #178 from qurator-spk/prepare-release-v0.5.0 Konstantin Baierer 2025-09-26 16:21:09 +02:00
  • 37e64b4e45 📝 changelog (prepare-release-v0.5.0) kba 2025-09-26 16:19:04 +02:00
  • 3123add815 📝 update README kba 2025-09-26 15:07:32 +02:00
  • 830cc2c30a comment out the offending test outright kba 2025-09-26 14:37:04 +02:00
  • eb8d4573a8 tests: also disable ...ocr_directory test kba 2025-09-26 13:57:08 +02:00
  • 42fb452a7e disable the -doit OCR test kba 2025-09-26 12:54:29 +02:00
  • 480daa4c7c test_run: make ocr -doit work (add truetype file) Robert Sachunsky 2025-09-25 22:25:05 +02:00
  • 4c6405713a ci: ocr models kba 2025-09-25 22:19:36 +02:00
  • b4d460ca79 makefile forgot the OCR models (adapt-ocrd) kba 2025-09-25 22:16:38 +02:00
  • f3f5426597 Merge branch 'adapt-ocrd' of https://github.com/qurator-spk/eynollah into adapt-ocrd kba 2025-09-25 21:47:27 +02:00
  • 0bb1fb1a05 tests: adapt to layout/ocr model split kba 2025-09-25 21:47:15 +02:00
  • 2ec773128b Merge branch 'adapt-ocrd' of https://github.com/qurator-spk/eynollah into adapt-ocrd kba 2025-09-25 21:40:48 +02:00
  • f37d80c188 Merge branch 'adapt-ocrd' of https://github.com/qurator-spk/eynollah into adapt-ocrd kba 2025-09-25 21:39:55 +02:00
  • 57ee1cdc72 Merge remote-tracking branch 'bertsky/mbro_dead_code-plus-fixes-plus-tests' into adapt-ocrd kba 2025-09-25 21:39:36 +02:00
  • 5c0ab509c4 CI: Update model name kba 2025-09-25 21:17:32 +02:00
  • 9303ded11f ocrd-tool.json: use models_layout instead of eynollah_layouts for consistency kba 2025-09-25 21:12:52 +02:00