Commit graph

  • 511222704e . kba 2026-04-28 14:51:23 +02:00
  • 5c6e075975 Merge branch 'ocrd-wrappers' of https://github.com/qurator-spk/eynollah into ocrd-wrappers kba 2026-04-28 14:31:24 +02:00
  • 1ae862cf52 . kba 2026-04-28 14:31:15 +02:00
  • a9e12a63da wp kba 2026-04-28 12:18:29 +02:00
  • 957dc66e7c organize ocrd-eynollah-segment like ocrd-sbb-binarize kba 2026-04-27 18:50:54 +02:00
  • 68ceeec764 get_marginals(): improve contour assignment… Robert Sachunsky 2026-04-25 03:06:34 +02:00
  • 6d55d0b87b get_marginals(): improve peak point threshold criterion… Robert Sachunsky 2026-04-25 02:23:16 +02:00
  • 4bdea39c98 get_marginals(): improve left/right point selection… Robert Sachunsky 2026-04-25 01:59:48 +02:00
  • 70bf461c30 get_marginals(): simplify, improve… Robert Sachunsky 2026-04-25 01:52:21 +02:00
  • bb092364af get_slopes_and_deskew_new_light2: estimate slopes here, too… Robert Sachunsky 2026-04-24 15:22:42 +02:00
  • c478c03db4 avoid deskewed contour matching w/ -romb Robert Sachunsky 2026-04-24 13:41:22 +02:00
  • 998ee2ecee get_textlines_of_a_textregion_sorted: simplify Robert Sachunsky 2026-04-23 23:45:27 +02:00
  • be61875d6e get_textlines_of_a_textregion_sorted: w-h instead of w/h test Robert Sachunsky 2026-04-23 22:40:01 +02:00
  • 9723dfeb73 writer: also annotate col-classifier result… Robert Sachunsky 2026-04-23 23:46:23 +02:00
  • e3720d6623 writer: also annotate page-level deskewing result Robert Sachunsky 2026-04-23 21:07:31 +02:00
  • 2da718f76f writer, do_work_of_slopes*: drop passing bboxes around Robert Sachunsky 2026-04-23 21:05:20 +02:00
  • b792324c5b do_work_of_slopes_new_curved (if angle >45°): simplify, improve… Robert Sachunsky 2026-04-23 20:49:25 +02:00
  • dbdb6d0d53 rotate: rm unused failed variants, add new rotate_image_enlarge Robert Sachunsky 2026-04-23 20:46:05 +02:00
  • d257869d83 do_work_of_slopes_new_curved (if angle <45°): simplify, improve… Robert Sachunsky 2026-04-23 20:38:45 +02:00
  • 0dce1f24d2 do_work_of_slopes_new_curved: improve deskewing… Robert Sachunsky 2026-04-23 20:08:40 +02:00
  • 97d9b0ea50 small_textlines_to_parent_adherence2: simplify, improve… Robert Sachunsky 2026-04-23 20:21:44 +02:00
  • 0735cb9d2b filter_contours_without_textline_inside: also filter slopes Robert Sachunsky 2026-04-23 20:05:04 +02:00
  • fa8340dbb4 -cl: also filter textregions without textlines here Robert Sachunsky 2026-04-21 21:21:32 +02:00
  • 4a6d3968f9 major run_single refactoring… Robert Sachunsky 2026-04-21 21:12:29 +02:00
  • dfb40f4a49 hsep fusion: avoid zero division if zero overlap Robert Sachunsky 2026-04-21 18:51:57 +02:00
  • b63e073121 skip deskewing if no textlines Robert Sachunsky 2026-04-21 18:51:20 +02:00
  • 7b5aa2a1f6 more run_single refactoring… Robert Sachunsky 2026-04-21 17:36:05 +02:00
  • a2f43b8d69 simplify, add confidence for headings as well Robert Sachunsky 2026-04-21 01:06:41 +02:00
  • 264b00f8ab predictor: cache models' input shape instead of output shape Robert Sachunsky 2026-04-20 23:37:54 +02:00
  • 829256df91 do_prediction*: remove autosized variants, simplify Robert Sachunsky 2026-04-20 17:17:43 +02:00
  • de65a55a04 mbro: simplify, add drop-caps as well, reduce batch size… Robert Sachunsky 2026-04-20 17:10:24 +02:00
  • 0dfc9d911f run_boxes_no_full_layout: also map to fl labels here… Robert Sachunsky 2026-04-20 17:09:00 +02:00
  • 0015f2675b with -slro, also extract and apply page (Border) mask Robert Sachunsky 2026-04-20 16:58:16 +02:00
  • 569b96d1a9 find_number_of_columns_in_document: pass correct label_seps… Robert Sachunsky 2026-04-20 16:55:34 +02:00
  • f28a9c9e0b add confidence for all region types, prepare for textlines… Robert Sachunsky 2026-04-18 04:53:03 +02:00
  • 1164b97917 extract_text_regions_new: fix heading thresholding… Robert Sachunsky 2026-04-18 04:20:25 +02:00
  • 20dc5c3188 also cover drop-capital in (heuristic) reading order Robert Sachunsky 2026-04-17 03:41:04 +02:00
  • 92e94753c7 decoding of dropcaps in -fl: ensure consistency w/ early layout… Robert Sachunsky 2026-04-17 03:34:38 +02:00
  • 29b42fdfaa decoding of drop-capitals in full layout: also allow replacing img… Robert Sachunsky 2026-04-16 18:04:01 +02:00
  • 6e0aed35f4 run_boxes_*: simplify, document class label mappings, start using identifier constants instead of literals for labels Robert Sachunsky 2026-04-16 05:22:52 +02:00
  • f29e876a7c return_boxes_of_images_by_order_of_reading_new: sep label differs w/o -fl… Robert Sachunsky 2026-04-16 05:16:23 +02:00
  • f5f2435a38 run_marginals: drop unnecessarily passing textline_mask, mask_seps, mask_images Robert Sachunsky 2026-04-16 05:13:06 +02:00
  • 9309586712 split_textregion_main_vs_header → split_textregion_main_vs_head… Robert Sachunsky 2026-04-16 05:07:22 +02:00
  • 0f82b568ba do_prediction_new_concept: aggregate confidence for all classes… Robert Sachunsky 2026-04-16 05:02:20 +02:00
  • 5a27e46b22 keep seps over artificial boundaries to improve col separation… Robert Sachunsky 2026-04-16 04:56:38 +02:00
  • 9d6ff65e1d get_tables_from_model: utilise artificial bound thresholding… Robert Sachunsky 2026-04-16 04:49:07 +02:00
  • 12b1271487 layout cli: add option --halt-fail Robert Sachunsky 2026-04-13 01:19:47 +02:00
  • 56e6deb02c predictor: jit-compile and precompile (non-autosized) models Robert Sachunsky 2026-04-13 01:17:04 +02:00
  • 01c54eb2ef reduce inference batch sizes to accommodate 8 GB VRAM Robert Sachunsky 2026-04-13 01:15:25 +02:00
  • f44c39667e predictor: disable rebatching (until we have flexible batch sizes) Robert Sachunsky 2026-04-13 01:14:49 +02:00
  • 219954d15b predictor: use predict_on_batch instead of predict Robert Sachunsky 2026-04-13 01:14:18 +02:00
  • 0d21b62aee disable autosized prediction entirely (also for _patched)… Robert Sachunsky 2026-04-10 18:23:10 +02:00
  • ccef63f08b get_regions: always use resized/enhanced image… Robert Sachunsky 2026-04-10 18:17:51 +02:00
  • 04da66ed73 training: plot only ~ 1000 training and ~ 100 validation images Robert Sachunsky 2026-03-30 13:34:05 +02:00
  • a8556f5210 run: sort parallel log messages by file name instead of prefixing… Robert Sachunsky 2026-03-30 13:18:40 +02:00
  • 62140e4159 Merge 9858221724 into c9f6aa35b2 Konstantin Baierer 2026-03-27 08:37:01 +00:00
  • 9858221724 comment out printing file names while training cnn-rnn ocr model integrating_trocr_and_torch_ensembling_and_updating_characters_list vahidrezanezhad 2026-03-27 09:36:55 +01:00
  • 8333158ecc BUG fixing for cnn-rnn ocr model training if augmentation is false vahidrezanezhad 2026-03-27 09:15:19 +01:00
  • 1756443605 fixup device sel Robert Sachunsky 2026-03-16 15:35:07 +01:00
  • bd495279e2 Merge 42a3cc2335 into c9f6aa35b2 Robert Sachunsky 2026-03-16 11:32:07 +00:00
  • 6bbdcc39ef CLI/Eynollah.setup_models/ModelZoo.load_models: add device option/kwarg Robert Sachunsky 2026-03-15 04:54:04 +01:00
  • 67e9f84b54 do_prediction* for "col_classifier": pass array as float16 instead of float64 Robert Sachunsky 2026-03-15 03:20:39 +01:00
  • f54deff452 model_zoo/predictor: use one subprocess per model… Robert Sachunsky 2026-03-15 02:53:37 +01:00
  • c514bbc661 make switching between autosized and looped tiling easier Robert Sachunsky 2026-03-14 02:16:26 +01:00
  • 2f3b622cf5 predictor: rebatch tasks to increase CUDA throughput… Robert Sachunsky 2026-03-14 00:52:34 +01:00
  • b550725cc5 wrap_layout_model_patched: simplify shape calculation Robert Sachunsky 2026-03-14 00:51:22 +01:00
  • d6404dbbc2 do_prediction*: pass arrays as float16 instead of float64 to TF Robert Sachunsky 2026-03-14 00:49:26 +01:00
  • 135064a48e model_zoo: region model not used at runtime anymore - don't load Robert Sachunsky 2026-03-14 00:48:52 +01:00
  • ec08004fb0 run: add QueueListener to pool / QueueHandler to workers… Robert Sachunsky 2026-03-14 00:43:58 +01:00
  • b7aa1d24cc CLI: drop redundant negative option forms, add --num-jobs Robert Sachunsky 2026-03-13 18:22:25 +01:00
  • 576e120ba6 autosized prediction is only faster for _patched, not for _resized… Robert Sachunsky 2026-03-13 18:15:30 +01:00
  • 7499e3e7b8 textline inference thresholding was disabled during the merging step vahidrezanezhad 2026-03-13 17:48:27 +01:00
  • 6d55f297a5 run: use ProcessPoolExecutor for parallel run_single across pages… Robert Sachunsky 2026-03-13 10:15:51 +01:00
  • 96cfddf92d split_textregion_main_vs_header: avoid zero division Robert Sachunsky 2026-03-13 02:41:06 +01:00
  • 4e9b062b84 separate_marginals_to_left_and_right...: simplify Robert Sachunsky 2026-03-13 02:40:33 +01:00
  • ae0f194241 drop ProcessPoolExecutor for intra-page parallel subprocessing… Robert Sachunsky 2026-03-13 02:38:40 +01:00
  • becf031c65 refactor to remove data-dependency from all Eynollah methods… Robert Sachunsky 2026-03-13 01:44:39 +01:00
  • 800c55b826 predictor: fix spawn vs fork / parent vs child contexts Robert Sachunsky 2026-03-13 02:42:16 +01:00
  • 64281768a9 run_graphics_and_columns_light: fix double 1-off error… Robert Sachunsky 2026-03-12 10:18:14 +01:00
  • 46c5f52491 CLI: don't append /models_eynollah here (already in default_specs) Robert Sachunsky 2026-03-11 02:39:32 +01:00
  • 10214dfdda predictor: make sure all shared arrays get freed eventually Robert Sachunsky 2026-03-11 02:38:11 +01:00
  • cf5caa1eca predictor: fix termination for pytests… Robert Sachunsky 2026-03-11 02:34:29 +01:00
  • bb468bf68f predictor: mp.Value must come from spawn context, too Robert Sachunsky 2026-03-11 02:27:47 +01:00
  • 9f127a0783 introduce predictor subprocess for exclusive GPU processing… Robert Sachunsky 2026-03-07 03:54:16 +01:00
  • 6f4ec53f7e wrap_layout_model_resized/patched: compile call instead of predict Robert Sachunsky 2026-03-07 03:52:14 +01:00
  • 338c4a0edf wrap layout models for prediction (image resize or tiling) all in TF Robert Sachunsky 2026-03-07 03:33:44 +01:00
  • f33fd57da8 model_zoo: resolve path names coming in from caller (CLI) Robert Sachunsky 2026-03-05 00:45:24 +01:00
  • 41dccb216c use (generalized) do_prediction() instead of predict_enhancement() Robert Sachunsky 2026-03-04 23:49:11 +01:00
  • 341480e9a0 do_prediction: if img was too small for model, also upscale results Robert Sachunsky 2026-03-04 23:41:45 +01:00
  • 8ebbe65c17 textline_contours: remove unnecessary resize_image, simplify Robert Sachunsky 2026-03-04 15:13:34 +01:00
  • 3370a3aa85 do_prediction*: avoid 3-channel results, simplify further… Robert Sachunsky 2026-03-03 01:20:16 +01:00
  • f1d8257496 page alto label generation activated for textline vahidrezanezhad 2026-03-03 21:12:20 +01:00
  • 4b80e45d91 character list only needs be copied for cnn-rnn ocr model vahidrezanezhad 2026-03-03 13:20:22 +01:00
  • c9f6aa35b2 fix license badge Clemens Neudecker 2026-03-03 09:43:54 +01:00
  • ff7dc31a68 do_prediction*: rename identifiers for artificial class thresholding Robert Sachunsky 2026-03-02 13:08:11 +01:00
  • b9cf68b51a training: fix b6d2440c Robert Sachunsky 2026-03-01 20:00:05 +01:00
  • ae3b6916ee assert within vit_resnet50_unet model is commented out since arising assert error vahidrezanezhad 2026-03-01 18:39:30 +01:00
  • 7f7bdab208 patches class for VIT encoder is corrected vahidrezanezhad 2026-03-01 18:26:29 +01:00
  • 686f1d34aa do_prediction*: simplify (esp. indexing/slicing) Robert Sachunsky 2026-03-01 04:37:20 +01:00
  • 3b56fa2a5b training: plot GT/prediction and metrics before training (commented) Robert Sachunsky 2026-02-28 20:08:10 +01:00