Commit graph

  • 38745111df Merge c4434c7f7d into 586077fbcd Konstantin Baierer 2026-02-19 12:59:25 +00:00
  • c4434c7f7d same task name for transformer-ocr training and inference (integrating_trocr_and_torch_ensembling_and_updating_characters_list) vahidrezanezhad 2026-02-19 13:59:16 +01:00
  • a84ae67e7a fix a typo vahidrezanezhad 2026-02-19 00:04:42 +01:00
  • 77adcbea8a copy characters list needed for cnn-rnn ocr model output while training and ensembling vahidrezanezhad 2026-02-18 16:47:21 +01:00
  • 4f66734e4d eynollah config files has renamed from config.json to config_eynollah.json - training trocr model still misses to write config file into checkpoint directories vahidrezanezhad 2026-02-18 16:04:44 +01:00
  • b426f7f152 trocr inference is integrated - works on CPU cause seg fault on GPU vahidrezanezhad 2026-02-18 15:04:54 +01:00
  • 8162f64297 Merge a9496bbc70 into 586077fbcd Robert Sachunsky 2026-02-17 17:39:56 +00:00
  • a9496bbc70 enhancer/mbreorder: use std Keras data loader for classification Robert Sachunsky 2026-02-17 18:39:30 +01:00
  • 003c88f18a fix double import in 82266f82 Robert Sachunsky 2026-02-17 18:23:32 +01:00
  • f61effe8ce fix typo in c8240905 Robert Sachunsky 2026-02-17 18:20:58 +01:00
  • 5f71333649 fix missing import in 49261fa9 Robert Sachunsky 2026-02-17 18:11:49 +01:00
  • 67fca82f38 fix missing import in 27f43c17 Robert Sachunsky 2026-02-17 18:06:08 +01:00
  • 6a4163ae56 fix typo in 27f43c17 Robert Sachunsky 2026-02-17 17:48:43 +01:00
  • c1b5cc92af fix typo in 7562317d Robert Sachunsky 2026-02-17 17:43:57 +01:00
  • 7bef8fa95a training.train: add verbose=1 consistently Robert Sachunsky 2026-02-17 17:24:07 +01:00
  • 9b66867c21 training.models: re-use transformer builder code Robert Sachunsky 2026-02-17 17:35:20 +01:00
  • daa084c367 training.models: re-use UNet decoder builder code Robert Sachunsky 2026-02-17 17:11:15 +01:00
  • fcd10c3956 training.models: re-use RESNET50 builder (+weight init) code Robert Sachunsky 2026-02-17 14:52:04 +01:00
  • 4414f7b89b training.models.vit_resnet50_unet: re-use IMAGE_ORDERING Robert Sachunsky 2026-02-17 14:18:32 +01:00
  • 7888fa5968 training: remove data_gen in favor of tf.data pipelines Robert Sachunsky 2026-02-08 04:42:44 +01:00
  • 83c2408192 training.utils.data_gen: avoid repeated array allocation Robert Sachunsky 2026-02-08 01:25:53 +01:00
  • 514a897dd5 training.train: assert n_epochs vs. index_start Robert Sachunsky 2026-02-08 01:11:57 +01:00
  • 37338049af training: use relative imports Robert Sachunsky 2026-02-08 01:11:44 +01:00
  • 7b7ef041ec training.models: use asymmetric zero padding instead of lambda layer Robert Sachunsky 2026-02-08 01:10:56 +01:00
  • ee4bffd81d training.train: simplify transformer cfg checks Robert Sachunsky 2026-02-08 01:10:13 +01:00
  • 53252a59c6 training.models: fix glitch introduced in 3a73ccca Robert Sachunsky 2026-02-08 01:09:40 +01:00
  • ea285124ce fix Patches/PatchEncoder (make configurable again) Robert Sachunsky 2026-02-08 01:06:57 +01:00
  • 733462381c bug fix: layout visualization vahidrezanezhad 2026-02-16 11:50:39 +01:00
  • 9aa19aa6fa cnn-rnn ocr inference: get input shape of model vahidrezanezhad 2026-02-16 11:45:56 +01:00
  • 68dd5eab62 fixing: imporing StringLookup vahidrezanezhad 2026-02-12 15:37:11 +01:00
  • 2ee8d8e050 Update inference.py vahidrezanezhad 2026-02-12 15:34:58 +01:00
  • ed034aa8ce Update inference.py vahidrezanezhad 2026-02-12 15:28:15 +01:00
  • 47fa22112c import tensorflow is uncommented for ocr training vahidrezanezhad 2026-02-11 19:52:56 +01:00
  • ab43477451 extracting ocr textline images and text: vertical lines threshold has changed to 1.4 vahidrezanezhad 2026-02-10 14:32:23 +01:00
  • a57914a68a fixed: textline and text correct extraction for page xml if vertical textlines are excluded + textline and text extraction for page alto files vahidrezanezhad 2026-02-09 18:53:08 +01:00
  • 2492c257c6 ocrd-tool.json: re-instante light_version and textline_light dummies for backwards compatibility Robert Sachunsky 2026-02-07 16:52:54 +01:00
  • bd282a594d training follow-up: Robert Sachunsky 2026-02-07 16:34:55 +01:00
  • 27f43c175f Merge branch 'main' into ro-fixes and resolve conflicts… Robert Sachunsky 2026-02-07 14:05:56 +01:00
  • 6944d31617 modify manual RO preference… Robert Sachunsky 2026-02-05 17:58:32 +01:00
  • d047327a1f Merge pull request #5 from bertsky/ro-fixes-update-deps Robert Sachunsky 2026-02-05 17:36:50 +01:00
  • 0d3a8eacba improve/update docs/train.md Robert Sachunsky 2026-02-05 14:54:08 +01:00
  • b1633dfc7c training.generate_gt: for RO, skip files if regionRefs are missing Robert Sachunsky 2026-02-05 14:53:26 +01:00
  • 5d0c26b629 training.train: use std Keras data loader for classification Robert Sachunsky 2026-02-05 12:02:58 +01:00
  • f03124f747 training.train: simplify+fix classification data loaders… Robert Sachunsky 2026-02-05 11:58:50 +01:00
  • 82d649061a training.train: fix F1 metric score setup Robert Sachunsky 2026-02-05 11:57:38 +01:00
  • 5c7801a1d6 training.train: simplify config args for model builder Robert Sachunsky 2026-02-05 11:56:11 +01:00
  • 4a65ee0c67 training.train: more config dependencies… Robert Sachunsky 2026-02-05 11:53:19 +01:00
  • fbf252db13 torch model ensembling is integrated vahidrezanezhad 2026-02-04 21:16:08 +01:00
  • 7562317da5 training: fix+simplify load_model logic for continue_training Robert Sachunsky 2026-02-04 17:35:38 +01:00
  • 1581094141 training: extend index_start to tasks classification and RO Robert Sachunsky 2026-02-04 17:35:12 +01:00
  • e85003db4a training: re-instate index_start, reflect cfg dependency Robert Sachunsky 2026-02-04 17:32:24 +01:00
  • 498ff8f7a5 Updating the --help descriptions vahidrezanezhad 2026-02-03 20:27:06 +01:00
  • fff4253352 generate or update list of characters in the case of cnn-rnn ocr training vahidrezanezhad 2026-02-03 20:20:20 +01:00
  • 60f0fb541d integrating transformer ocr vahidrezanezhad 2026-02-03 19:45:50 +01:00
  • dda8454236 Merge 42a3cc2335 into 586077fbcd Robert Sachunsky 2026-02-03 14:36:39 +01:00
  • 586077fbcd 📦 v0.7.0 (main, v0.7.0) kba 2026-01-30 16:40:55 +01:00
  • 4ade0f788f 📝 changelog kba 2026-01-29 17:12:35 +01:00
  • f13560726e Merge remote-tracking branch 'origin/adding-cnn-rnn-training-script' into 2026-01-29-training kba 2026-01-29 17:32:08 +01:00
  • 25153ad307 training: add IoU metric Robert Sachunsky 2026-01-29 12:19:09 +01:00
  • d1e8a02fd4 training: fix epoch size calculation Robert Sachunsky 2026-01-29 03:01:14 +01:00
  • 29a0f19cee training: simplify image preprocessing… Robert Sachunsky 2026-01-28 13:53:11 +01:00
  • 87190f8997 Merge branch 'adding-cnn-rnn-training-script-rfct' into 2026-01-29-training kba 2026-01-29 10:27:36 +01:00
  • a76de1e182 Merge branch 'adding-cnn-rnn-training-script' into 2026-01-29-training kba 2026-01-29 10:26:34 +01:00
  • ef3cf02877 Merge branch 'ruff-training' into 2026-01-29-training kba 2026-01-29 10:26:14 +01:00
  • e69b35b49c training.train.config_params: re-organise to reflect dependencies Robert Sachunsky 2026-01-28 13:49:23 +01:00
  • 0372fd7a1e training.gt_gen_utils: fix+simplify cropping… Robert Sachunsky 2026-01-28 13:42:59 +01:00
  • acda9c84ee training.gt_gen_utils: improve XML→img path mapping… Robert Sachunsky 2026-01-28 13:28:03 +01:00
  • eb92760f73 training: download pretrained RESNET weights if missing Robert Sachunsky 2026-01-22 19:49:39 +01:00
  • 6a81db934e improve docs/train.md Robert Sachunsky 2026-01-22 11:25:50 +01:00
  • 87d7ffbdd8 training: use proper Keras callbacks and top-level loop Robert Sachunsky 2026-01-22 11:25:00 +01:00
  • f9695cd7be Merge branch 'adding-cnn-rnn-training-script' of https://github.com/qurator-spk/eynollah into adding-cnn-rnn-training-script (adding-cnn-rnn-training-script) vahidrezanezhad 2026-01-28 11:52:36 +01:00
  • 3500167870 weights ensembling for tensorflow models is integrated vahidrezanezhad 2026-01-28 11:52:12 +01:00
  • 33f6a231bc fix: prevent crash when printspace is missing in xmls used for label generation vahidrezanezhad 2026-01-26 17:30:26 +01:00
  • 6ae244bf9b Fix filename stem extraction using binarization. Restore the CNN-RNN model to its previous version, as setting channels_last alone was insufficient for running on both CPU and GPU. Prevent errors caused by null values in image shape elements. vahidrezanezhad 2026-01-26 15:03:11 +01:00
  • 30f39e7383 mapregion is added to labels vahidrezanezhad 2026-01-26 13:56:34 +01:00
  • c8240905a8 Fix label generation by selecting largest contour when erosion splits shapes vahidrezanezhad 2026-01-26 13:36:24 +01:00
  • 3c3effcfda drop TF1 vernacular, relax TF/Keras and Torch requirements… Robert Sachunsky 2026-01-20 04:18:55 +01:00
  • e2754da4f5 adapt to Numpy 1.25 changes… Robert Sachunsky 2026-01-20 04:04:07 +01:00
  • 9ccc495b4a wip adding-cnn-rnn-training-script-rfct kba 2025-12-19 14:57:10 +01:00
  • 49261fa99b CNN–RNN–OCR inference and adaptation of the CNN–RNN–OCR model to support inference on both CPU and GPU vahidrezanezhad 2025-12-17 15:12:39 +01:00
  • 6ee79c7320 evaluation with a given GT is only possible for segmentation tasks vahidrezanezhad 2025-12-17 13:28:02 +01:00
  • 4651000191 debuging input shape + enable finetuning a model vahidrezanezhad 2025-12-15 11:36:09 +01:00
  • 6241530293 Merge dbe06867a6 into 0f410c2e7c Konstantin Baierer 2025-12-10 13:24:44 +00:00
  • dbe06867a6 wip: remove textline_light=True from call to EynollahXmlWriter (reduce-complexity-rebased) kba 2025-12-10 14:24:32 +01:00
  • 58000069cf Restore correct execution of export_textline_images_and_text vahidrezanezhad 2025-12-03 15:40:52 +01:00
  • 5716262629 Fix eynollah ocr --help so it works again vahidrezanezhad 2025-12-03 14:11:47 +01:00
  • 86d437b77b Restored correct functionality of the extract_only_images mode and cleaned up the argument handling vahidrezanezhad 2025-12-03 12:01:42 +01:00
  • 4175b52768 log to STDERR not STDOUT kba 2025-12-02 15:00:33 +01:00
  • 04d21b9d92 🔥 refactor eynollah ocr kba 2025-11-28 14:54:43 +01:00
  • 244847e3d4 move line-gt extraction out of ocr to eynollah-training kba 2025-11-28 12:09:50 +01:00
  • 058478baf3 CI: do not upgrade (now-unpineed) torch kba 2025-11-28 15:03:06 +01:00
  • fcd87fc3cf 💀 remove dead code from eynollah.py kba 2025-12-10 13:14:32 +01:00
  • 1eef5514d7 eynollah.py: fix kwargs to writer kba 2025-11-28 10:50:50 +01:00
  • b7d3a6724b enforce kwargs for writer.build_... kba 2025-11-27 12:43:45 +01:00
  • 97959869ba remove more branches after textline_light default true kba 2025-11-27 11:30:00 +01:00
  • 5d497b0f72 factor out extract_only_images as eynollah extract-images kba 2025-11-26 21:35:45 +01:00
  • b10773aae6 🔥 replace light_version/textline_light with True kba 2025-12-10 12:56:01 +01:00
  • 4fc3ff33cb The cnn-rnn ocr model can be trained now vahidrezanezhad 2025-12-09 17:22:12 +01:00
  • 84a72a128b cnn-rnn model can be called - model input height and width are dynamic now - data generator is also callable vahidrezanezhad 2025-12-09 15:30:19 +01:00
  • 59e5a73654 adding cnn-rnn training script vahidrezanezhad 2025-12-08 19:30:57 +01:00