Commit graph

  • d2654af6b8 Merge cbb3be0e01 into c9f6aa35b2 Robert Sachunsky 2026-04-30 14:20:54 +00:00
  • cbb3be0e01 add diagnostic plotting for prediction masking (commented) Robert Sachunsky 2026-04-30 00:49:18 +02:00
  • 33c055389d bold run_single refactoring (predict segmentation on cropped img)… Robert Sachunsky 2026-04-29 01:52:29 +02:00
  • 7e7cc6a801 do_order_of_regions(): use region mask instead of textline mask… Robert Sachunsky 2026-04-29 14:05:06 +02:00
  • 63df9be4db find_number_of_columns_in_document(): pass in (reuse) masks Robert Sachunsky 2026-04-29 14:03:26 +02:00
  • da9e00cfe5 consistently handle textline mask with respect to drop-capital mask… Robert Sachunsky 2026-04-27 13:04:09 +02:00
  • 2641171fb1 return_boxes_...order_of_reading...: avoid negative slices… Robert Sachunsky 2026-04-27 00:42:31 +02:00
  • 6a92f0d49c make get_deskewed_masks() unconditional, call only when needed Robert Sachunsky 2026-04-27 00:38:22 +02:00
  • 52eb4c9a0a move label definition and deskewing cancellation up Robert Sachunsky 2026-04-27 00:33:07 +02:00
  • fa882e1dbe move run_boxes_order() call to RO section of run_single() Robert Sachunsky 2026-04-27 00:22:15 +02:00
  • d88bd485ff get_slopes*(): does not need passing boxes separately Robert Sachunsky 2026-04-27 00:14:28 +02:00
  • 869646cbf5 get_full_layout() does not need the textline mask Robert Sachunsky 2026-04-27 00:07:27 +02:00
  • b5bc161a4c extract_page(): get external contours instead of indiscriminate tree Robert Sachunsky 2026-04-27 00:02:48 +02:00
  • 287bebde0d get_marginals(): fix height factor for mask resizing Robert Sachunsky 2026-04-30 15:44:46 +02:00
  • a031d590b8 get_marginals(): do allow both left and right point (f/u 4bdea39)… Robert Sachunsky 2026-04-30 15:32:15 +02:00
  • 9571ce3474 get_marginals(): reduce indentation Robert Sachunsky 2026-04-27 00:02:03 +02:00
  • c18deb0722 drop relabelling all marginalia to main if no main (now unnecessary) Robert Sachunsky 2026-04-27 00:19:33 +02:00
  • 1f6db34adf run/get_marginals(): simplify and speed up… Robert Sachunsky 2026-04-26 23:51:05 +02:00
  • 45a43f7e5e get_marginals(): fixup point_right fallback Robert Sachunsky 2026-04-26 21:52:11 +02:00
  • 0b8d8a7330 docker: core to 3.12.3 [update-cd] kba 2026-04-29 17:20:36 +02:00
  • ad5f22726e 🔥 require transformers >= 5 kba 2026-04-29 17:06:13 +02:00
  • f58189d5f4 ci: tag eynollah docker image with git tag version if possible, else latest kba 2026-04-29 16:28:09 +02:00
  • 4e9c0010a3 . ocrd-wrappers kba 2026-04-28 15:48:03 +02:00
  • 243cde804e . kba 2026-04-28 15:45:47 +02:00
  • 7f8bfc9945 . kba 2026-04-28 15:45:03 +02:00
  • d2aae35446 . kba 2026-04-28 15:39:53 +02:00
  • d705f855f1 . kba 2026-04-28 15:36:50 +02:00
  • abdcb1a1f9 . kba 2026-04-28 15:33:57 +02:00
  • 69280187c5 . kba 2026-04-28 15:29:48 +02:00
  • 1ba82ede88 . kba 2026-04-28 15:25:36 +02:00
  • be1296150c . kba 2026-04-28 15:07:33 +02:00
  • 4899a8fa17 . kba 2026-04-28 14:59:01 +02:00
  • 29ef9f09dc . kba 2026-04-28 14:53:13 +02:00
  • 511222704e . kba 2026-04-28 14:51:23 +02:00
  • 5c6e075975 Merge branch 'ocrd-wrappers' of https://github.com/qurator-spk/eynollah into ocrd-wrappers kba 2026-04-28 14:31:24 +02:00
  • 1ae862cf52 . kba 2026-04-28 14:31:15 +02:00
  • a9e12a63da wp kba 2026-04-28 12:18:29 +02:00
  • 957dc66e7c organize ocrd-eynollah-segment like ocrd-sbb-binarize kba 2026-04-27 18:50:54 +02:00
  • 68ceeec764 get_marginals(): improve contour assignment… Robert Sachunsky 2026-04-25 03:06:34 +02:00
  • 6d55d0b87b get_marginals(): improve peak point threshold criterion… Robert Sachunsky 2026-04-25 02:23:16 +02:00
  • 4bdea39c98 get_marginals(): improve left/right point selection… Robert Sachunsky 2026-04-25 01:59:48 +02:00
  • 70bf461c30 get_marginals(): simplify, improve… Robert Sachunsky 2026-04-25 01:52:21 +02:00
  • bb092364af get_slopes_and_deskew_new_light2: estimate slopes here, too… Robert Sachunsky 2026-04-24 15:22:42 +02:00
  • c478c03db4 avoid deskewed contour matching w/ -romb Robert Sachunsky 2026-04-24 13:41:22 +02:00
  • 998ee2ecee get_textlines_of_a_textregion_sorted: simplify Robert Sachunsky 2026-04-23 23:45:27 +02:00
  • be61875d6e get_textlines_of_a_textregion_sorted: w-h instead of w/h test Robert Sachunsky 2026-04-23 22:40:01 +02:00
  • 9723dfeb73 writer: also annotate col-classifier result… Robert Sachunsky 2026-04-23 23:46:23 +02:00
  • e3720d6623 writer: also annotate page-level deskewing result Robert Sachunsky 2026-04-23 21:07:31 +02:00
  • 2da718f76f writer, do_work_of_slopes*: drop passing bboxes around Robert Sachunsky 2026-04-23 21:05:20 +02:00
  • b792324c5b do_work_of_slopes_new_curved (if angle >45°): simplify, improve… Robert Sachunsky 2026-04-23 20:49:25 +02:00
  • dbdb6d0d53 rotate: rm unused failed variants, add new rotate_image_enlarge Robert Sachunsky 2026-04-23 20:46:05 +02:00
  • d257869d83 do_work_of_slopes_new_curved (if angle <45°): simplify, improve… Robert Sachunsky 2026-04-23 20:38:45 +02:00
  • 0dce1f24d2 do_work_of_slopes_new_curved: improve deskewing… Robert Sachunsky 2026-04-23 20:08:40 +02:00
  • 97d9b0ea50 small_textlines_to_parent_adherence2: simplify, improve… Robert Sachunsky 2026-04-23 20:21:44 +02:00
  • 0735cb9d2b filter_contours_without_textline_inside: also filter slopes Robert Sachunsky 2026-04-23 20:05:04 +02:00
  • fa8340dbb4 -cl: also filter textregions without textlines here Robert Sachunsky 2026-04-21 21:21:32 +02:00
  • 4a6d3968f9 major run_single refactoring… Robert Sachunsky 2026-04-21 21:12:29 +02:00
  • dfb40f4a49 hsep fusion: avoid zero division if zero overlap Robert Sachunsky 2026-04-21 18:51:57 +02:00
  • b63e073121 skip deskewing if no textlines Robert Sachunsky 2026-04-21 18:51:20 +02:00
  • 7b5aa2a1f6 more run_single refactoring… Robert Sachunsky 2026-04-21 17:36:05 +02:00
  • a2f43b8d69 simplify, add confidence for headings as well Robert Sachunsky 2026-04-21 01:06:41 +02:00
  • 264b00f8ab predictor: cache models' input shape instead of output shape Robert Sachunsky 2026-04-20 23:37:54 +02:00
  • 829256df91 do_prediction*: remove autosized variants, simplify Robert Sachunsky 2026-04-20 17:17:43 +02:00
  • de65a55a04 mbro: simplify, add drop-caps as well, reduce batch size… Robert Sachunsky 2026-04-20 17:10:24 +02:00
  • 0dfc9d911f run_boxes_no_full_layout: also map to fl labels here… Robert Sachunsky 2026-04-20 17:09:00 +02:00
  • 0015f2675b with -slro, also extract and apply page (Border) mask Robert Sachunsky 2026-04-20 16:58:16 +02:00
  • 569b96d1a9 find_number_of_columns_in_document: pass correct label_seps… Robert Sachunsky 2026-04-20 16:55:34 +02:00
  • f28a9c9e0b add confidence for all region types, prepare for textlines… Robert Sachunsky 2026-04-18 04:53:03 +02:00
  • 1164b97917 extract_text_regions_new: fix heading thresholding… Robert Sachunsky 2026-04-18 04:20:25 +02:00
  • 20dc5c3188 also cover drop-capital in (heuristic) reading order Robert Sachunsky 2026-04-17 03:41:04 +02:00
  • 92e94753c7 decoding of dropcaps in -fl: ensure consistency w/ early layout… Robert Sachunsky 2026-04-17 03:34:38 +02:00
  • 29b42fdfaa decoding of drop-capitals in full layout: also allow replacing img… Robert Sachunsky 2026-04-16 18:04:01 +02:00
  • 6e0aed35f4 run_boxes_*: simplify, document class label mappings, start using identifier constants instead of literals for labels Robert Sachunsky 2026-04-16 05:22:52 +02:00
  • f29e876a7c return_boxes_of_images_by_order_of_reading_new: sep label differs w/o -fl… Robert Sachunsky 2026-04-16 05:16:23 +02:00
  • f5f2435a38 run_marginals: drop unnecessarily passing textline_mask, mask_seps, mask_images Robert Sachunsky 2026-04-16 05:13:06 +02:00
  • 9309586712 split_textregion_main_vs_header → split_textregion_main_vs_head… Robert Sachunsky 2026-04-16 05:07:22 +02:00
  • 0f82b568ba do_prediction_new_concept: aggregate confidence for all classes… Robert Sachunsky 2026-04-16 05:02:20 +02:00
  • 5a27e46b22 keep seps over artificial boundaries to improve col separation… Robert Sachunsky 2026-04-16 04:56:38 +02:00
  • 9d6ff65e1d get_tables_from_model: utilise artificial bound thresholding… Robert Sachunsky 2026-04-16 04:49:07 +02:00
  • 12b1271487 layout cli: add option --halt-fail Robert Sachunsky 2026-04-13 01:19:47 +02:00
  • 56e6deb02c predictor: jit-compile and precompile (non-autosized) models Robert Sachunsky 2026-04-13 01:17:04 +02:00
  • 01c54eb2ef reduce inference batch sizes to accommodate 8 GB VRAM Robert Sachunsky 2026-04-13 01:15:25 +02:00
  • f44c39667e predictor: disable rebatching (until we have flexible batch sizes) Robert Sachunsky 2026-04-13 01:14:49 +02:00
  • 219954d15b predictor: use predict_on_batch instead of predict Robert Sachunsky 2026-04-13 01:14:18 +02:00
  • 0d21b62aee disable autosized prediction entirely (also for _patched)… Robert Sachunsky 2026-04-10 18:23:10 +02:00
  • ccef63f08b get_regions: always use resized/enhanced image… Robert Sachunsky 2026-04-10 18:17:51 +02:00
  • 04da66ed73 training: plot only ~ 1000 training and ~ 100 validation images Robert Sachunsky 2026-03-30 13:34:05 +02:00
  • a8556f5210 run: sort parallel log messages by file name instead of prefixing… Robert Sachunsky 2026-03-30 13:18:40 +02:00
  • 62140e4159 Merge 9858221724 into c9f6aa35b2 Konstantin Baierer 2026-03-27 08:37:01 +00:00
  • 9858221724 comment out printing file names while training cnn-rnn ocr model [integrating_trocr_and_torch_ensembling_and_updating_characters_list] vahidrezanezhad 2026-03-27 09:36:55 +01:00
  • 8333158ecc BUG fixing for cnn-rnn ocr model training if augmentation is false vahidrezanezhad 2026-03-27 09:15:19 +01:00
  • 1756443605 fixup device sel Robert Sachunsky 2026-03-16 15:35:07 +01:00
  • bd495279e2 Merge 42a3cc2335 into c9f6aa35b2 Robert Sachunsky 2026-03-16 11:32:07 +00:00
  • 6bbdcc39ef CLI/Eynollah.setup_models/ModelZoo.load_models: add device option/kwarg Robert Sachunsky 2026-03-15 04:54:04 +01:00
  • 67e9f84b54 do_prediction* for "col_classifier": pass array as float16 instead of float64 Robert Sachunsky 2026-03-15 03:20:39 +01:00
  • f54deff452 model_zoo/predictor: use one subprocess per model… Robert Sachunsky 2026-03-15 02:53:37 +01:00
  • c514bbc661 make switching between autosized and looped tiling easier Robert Sachunsky 2026-03-14 02:16:26 +01:00
  • 2f3b622cf5 predictor: rebatch tasks to increase CUDA throughput… Robert Sachunsky 2026-03-14 00:52:34 +01:00
  • b550725cc5 wrap_layout_model_patched: simplify shape calculation Robert Sachunsky 2026-03-14 00:51:22 +01:00
  • d6404dbbc2 do_prediction*: pass arrays as float16 instead of float64 to TF Robert Sachunsky 2026-03-14 00:49:26 +01:00