# eynollah/tests/cli_tests/test_binarization.py
import pytest
from PIL import Image
@pytest.mark.parametrize(
    "options",
    [
        [],  # defaults
        ["--no-patches"],
    ], ids=str)
def test_run_eynollah_binarization_filename(
        tmp_path,
        run_eynollah_ok_and_check_logs,
        resources_dir,
        options,
):
    """Run the `binarization` CLI on a single TIFF file.

    Checks that the command logs 'Loaded model', writes the output PNG,
    and that binarization preserves the image dimensions.
    """
    infile = resources_dir / '2files/kant_aufklaerung_1784_0020.tif'
    outfile = tmp_path / 'kant_aufklaerung_1784_0020.png'
    run_eynollah_ok_and_check_logs(
        'binarization',
        [
            '-i', str(infile),
            '-o', str(outfile),
        ] + options,
        [
            'Loaded model'
        ]
    )
    assert outfile.exists()
    # Binarization must not change the pixel dimensions of the image.
    with Image.open(infile) as original_img:
        original_size = original_img.size
    with Image.open(outfile) as binarized_img:
        binarized_size = binarized_img.size
    assert original_size == binarized_size
def test_run_eynollah_binarization_directory(
        tmp_path,
        run_eynollah_ok_and_check_logs,
        resources_dir,
        image_resources,
):
    """Run the `binarization` CLI in directory mode (`-di`) on two images.

    Checks that a per-file progress line is logged for each input and
    that exactly one output file per input image is written.
    """
    outdir = tmp_path
    run_eynollah_ok_and_check_logs(
        'binarization',
        [
            '-di', str(resources_dir / '2files'),
            '-o', str(outdir),
        ],
        [
            f'Binarizing [ 1/2] {image_resources[0].name}',
            f'Binarizing [ 2/2] {image_resources[1].name}',
        ]
    )
    # One binarized output per input image.
    assert len(list(outdir.iterdir())) == 2