- `cache_images()`: only return an image dict (plus extra keys
for file name stem and dpi) - don't set any attributes
- `imread()`: just take from passed image dict, also add `binary` key
- `resize_and_enhance_image_with_column_classifier()`:
* `imread()` from image dict
* set `img_bin` key for binarization result if `input_binary`
* instead of `image_page_org_size` / `page_coord` attributes,
set `img_page` / `coord_page` in image dict
* instead of retval, set `img_res` in image dict
    * also set `scale_x` / `scale_y` (horizontal / vertical scale factor)
      in image dict
* simplify
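The attribute-to-dict refactor above can be sketched as follows; a minimal illustration (not the real Eynollah code) of an image dict carrying the keys named here (`img`, `name`, `dpi`, and later `img_res`, `scale_x`, `scale_y`), with a toy nearest-neighbour resize standing in for the actual enhancement:

```python
import numpy as np

def cache_images(name_stub: str, img: np.ndarray, dpi: int) -> dict:
    # return an image dict instead of setting instance attributes (sketch)
    return {"img": img, "name": name_stub, "dpi": dpi}

def resize_with_scales(image: dict, new_h: int, new_w: int) -> None:
    # store the resized result and its scale factors in the dict (sketch);
    # nearest-neighbour index sampling stands in for the real resize
    h, w = image["img"].shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    image["img_res"] = image["img"][rows][:, cols]
    image["scale_x"] = new_w / w
    image["scale_y"] = new_h / h

image = cache_images("page_0001", np.zeros((100, 80), dtype=np.uint8), dpi=300)
resize_with_scales(image, 50, 40)
```

Downstream steps then read `img_res` / `scale_x` / `scale_y` from the dict instead of from object attributes.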
- `resize_image_with_column_classifier()`:
* `imread()` from image dict
    * (as in `resize_and_enhance_image_with_column_classifier`)
      call `calculate_width_height_by_columns_1_2` here
      if `num_col` is 1 or 2
* instead of retval, set `img_res` in image dict
    * also set `scale_x` / `scale_y` (horizontal / vertical scale factor)
      in image dict
* simplify
- `calculate_width_height_by_columns*()`: simplify, get confidence of
num_col instead of entire array
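Reducing the classifier output to a single confidence can be sketched like this; a hypothetical helper (names assumed, not from the source) that turns one probability row into `(num_col, confidence)` instead of passing the whole array around:

```python
import numpy as np

def num_col_with_confidence(prediction: np.ndarray) -> tuple:
    # reduce a classifier probability row to (num_col, confidence)
    # instead of handing the entire array to the caller (sketch)
    label = int(np.argmax(prediction))
    return label + 1, float(prediction[label])

probs = np.array([0.05, 0.80, 0.10, 0.03, 0.02])
num_col, conf = num_col_with_confidence(probs)
```

Callers such as `calculate_width_height_by_columns*()` then only need the scalar confidence.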
- `extract_page()`: read `img_res` from image dict; simplify
- `early_page_for_num_of_column_classification()`:
`imread()` from image dict; simplify
- `textline_contours()`: no need for `num_col_classifier` here
- `run_textline()`: no need for `num_col_classifier` here
- `get_regions_light_v()` → `get_regions()`:
* read `img_res` from image dict
* get shapes via `img` from image dict instead of `image_org` attr
* use `img_page` / `coord_page` from image dict instead of attrs
* avoid unnecessary 3-channel arrays
* simplify
- `get_tables_from_model()`: no need for `num_col_classifier` here
- `run_graphics_and_columns_light()` → `run_graphics_and_columns()`:
* pass through image dict instead of `img_bin` (which really was `img_res`)
* simplify
- `run_graphics_and_columns_without_layout()`:
* pass through image dict instead of `img_bin` (which really was `img_res`)
* simplify
- `run_enhancement()`: pass through image dict
- `get_image_and_scales*()`: drop
- `run_boxes_full_layout()`:
* pass `image_page` instead of `img_bin` (which really was `image_page`)
* simplify
- `run()`:
* instantiate plotter outside of loop, and independent of img files
* move writer instantiation and overwrite checks into `run_single()`
  * add try/except for `run_single()` w/ logging
- `reset_file_name_dir`: drop
- `run_single()`:
* add some args/kwargs from `run()`
* call `cache_images()` (reading image dict) here
* instantiate writer here instead of (reused) attr in `run()`
* set `scale_x` / `scale_y` in writer from image dict once known
(i.e. after `run_enhancement()`)
* don't return anything, but write PAGE result here
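The restructured `run()` / `run_single()` split can be sketched as follows; a minimal stand-in (the real per-page work is elided) showing the per-file try/except with logging, so one failing page no longer aborts the whole run:

```python
import logging

logger = logging.getLogger("eynollah.sketch")

def run_single(img_path: str, overwrite: bool = False) -> None:
    # hypothetical stand-in: cache_images(), enhancement, layout analysis
    # and the PAGE writer would all live here; nothing is returned
    pass

def run(image_paths: list) -> list:
    # per-file failures are logged and collected; remaining files
    # are still processed (sketch of the restructured loop)
    failed = []
    for path in image_paths:
        try:
            run_single(path)
        except Exception:
            logger.exception("error processing %s", path)
            failed.append(path)
    return failed

failed = run(["a.png", "b.png"])
```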
- `check_any_text_region_in_model_one_is_main_or_header_light()` →
`split_textregion_main_vs_header()`
- plotter:
* pass `name` (file stem) from image dict to all methods
* for `write_images_into_directory()`: also `scale_x` and `scale_y`
from image dict
- writer:
* init with width/height
- ocrd processor:
* adapt (just `run_single()` call)
* drop `max_workers=1` restriction (can now run fully parallel)
- `get_textregion_contours_in_org_image_light()` →
`get_textregion_confidences()`:
* take shape from confmat directly instead of extra array
* simplify
When the `num_col_classifier` prediction gets bypassed
by the heuristic result from `find_num_col()` (because the prediction
had too little confidence, or `calculate_width_height_by_columns()`
would have become too large), do not increment `num_col` further
(the heuristic value is already 1 more than the number of column separators).
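The corrected decision can be sketched as follows; a hypothetical condensation (names and threshold assumed) of the point being fixed, namely that the heuristic value derived from column separators is already `num_colseps + 1` and must not be incremented again:

```python
def choose_num_col(predicted: int, confidence: float,
                   num_colseps: int, threshold: float = 0.5) -> int:
    # sketch: the find_num_col() analogue already yields colseps + 1,
    # so no extra "+ 1" when it overrides a low-confidence prediction
    heuristic = num_colseps + 1
    if confidence < threshold:
        return heuristic  # bypass the prediction, no further increment
    return predicted
```

E.g. with one detected separator and a low-confidence prediction of 3 columns, the result is 2, not 3.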
- rename `terminate` → `stopped`
- call `terminate()` from superclass during shutdown
- del `self.model_zoo` in the parent process after spawn,
and in the child during shutdown
- new class `Predictor(multiprocessing.Process)` as stand-in
for EynollahModelZoo:
* calling `load_models()` starts the subprocess (and has
`.model_zoo.load_models()` run internally)
* calling `get()` yields a stand-in that supports `.predict()`,
which actually communicates with the singleton subprocess
via task and result queues, sharing Numpy arrays via SHM
* calling `predict()` with an empty dict (instead of an image)
merely retrieves the respective model's output shapes (cached)
* shared memory objects for arrays are cleared as soon as possible
* log messages are piped through QueueHandler / QueueListener
* exceptions are passed through the queues, and raised afterwards
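The subprocess/queue/SHM mechanism can be illustrated with a heavily reduced sketch (assuming a POSIX `fork` start method; the real `Predictor` additionally handles model loading, shape caching, and log forwarding): tasks carry only the shared-memory name, shape, and dtype; a toy `arr.sum()` stands in for `model.predict()`; exceptions travel back through the result queue:

```python
import multiprocessing as mp
import numpy as np
from multiprocessing import shared_memory

def _worker(tasks, results):
    # child process: pull (shm_name, shape, dtype) tasks, run a stand-in
    # "model" on the shared array, push results (or exceptions) back
    while True:
        task = tasks.get()
        if task is None:  # shutdown sentinel
            break
        try:
            name, shape, dtype = task
            shm = shared_memory.SharedMemory(name=name)
            arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
            results.put(("ok", float(arr.sum())))  # stand-in for predict()
            shm.close()
        except Exception as exc:
            results.put(("err", exc))  # exceptions pass through the queue

class Predictor:
    # minimal sketch of the singleton-subprocess idea (not the real class)
    def __init__(self):
        ctx = mp.get_context("fork")  # assumed POSIX
        self.tasks = ctx.Queue()
        self.results = ctx.Queue()
        self.proc = ctx.Process(target=_worker,
                                args=(self.tasks, self.results), daemon=True)
        self.proc.start()

    def predict(self, arr: np.ndarray):
        shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
        np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)[:] = arr
        self.tasks.put((shm.name, arr.shape, arr.dtype.str))
        status, payload = self.results.get()
        shm.close()
        shm.unlink()  # free shared memory as soon as possible
        if status == "err":
            raise payload
        return payload

    def shutdown(self):
        self.tasks.put(None)
        self.proc.join()

predictor = Predictor()
total = predictor.predict(np.ones((4, 4)))
predictor.shutdown()
```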
- move all TF initialization to the predictor
(to avoid back and forth between CPU and GPU memory when looping
over image patches)
- `patch_encoder`: define `Model` subclasses which take an existing
(layout segmentation) model in the constructor, and define a new
`call()` using the existing model in a GPU-only `tf.function`:
* `wrap_layout_model_resized`: just `tf.image.resize()` from
input image to model size, then predict, then resize back
* `wrap_layout_model_patched`: ditto if smaller than model size;
otherwise use `tf.image.extract_patches` for patching in a
sliding-window approach, then predict patches one by one, then
`tf.scatter_nd` to reconstruct to image size
- when compiling `tf.function` graph, make sure to use input signature
with variable image size, but avoid retracing each new size sample
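The tiling logic that `tf.image.extract_patches` / `tf.scatter_nd` implement inside the `tf.function` graph can be illustrated in plain NumPy (deliberately simplified: no overlap margins, and a `lambda` stands in for the model; the real graph would also use a `tf.TensorSpec` with `None` spatial dimensions to avoid retracing per image size):

```python
import numpy as np

def predict_tiled(img: np.ndarray, model, size: int) -> np.ndarray:
    # NumPy illustration (not the TF graph) of the sliding-window idea:
    # cut the image into size-by-size tiles, "predict" each tile,
    # then scatter the results back to the full image canvas
    h, w = img.shape[:2]
    out = np.zeros_like(img, dtype=float)
    for y in range(0, h, size):
        for x in range(0, w, size):
            tile = img[y:y + size, x:x + size]
            out[y:y + size, x:x + size] = model(tile)
    return out

result = predict_tiled(np.ones((8, 6)), lambda t: t * 2.0, 4)
```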
- in `EynollahModelZoo.load_model` for relevant model types,
also wrap the loaded model
* by `wrap_layout_model_resized` under model name + `_resized`
* by `wrap_layout_model_patched` under model name + `_patched`
- introduce `do_prediction_new_concept_autosize`,
replacing `do_prediction/_new_concept`,
but using passed model's `predict` directly without
resizing or tiling to model size
- instead of `do_prediction/_new_concept(True, ...)`,
now call `do_prediction_new_concept_autosize`,
but with `_patched` appended to model name
- instead of `do_prediction/_new_concept(False, ...)`,
now call `do_prediction_new_concept_autosize`,
but with `_resized` appended to model name
- `do_prediction/_new_concept`: avoid unnecessary `np.repeat`
on results, aggregate intermediate artificial class mask and
confidence data in extra arrays
- callers: avoid unnecessary thresholding the result arrays
- callers: adapt (no need to slice into channels)
- simplify by refactoring thresholding and skeletonization into
function `seg_mask_label`
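The refactored helper might look roughly like this (a sketch with assumed signature; skeletonization, done in the real code with e.g. `skimage.morphology.skeletonize`, is omitted here): one channel of a probability map is turned into a thresholded binary mask for a given label:

```python
import numpy as np

def seg_mask_label(pred: np.ndarray, label: int,
                   threshold: float = 0.5) -> np.ndarray:
    # sketch: pixel belongs to `label` iff it is the argmax class
    # AND its probability clears the threshold
    return ((np.argmax(pred, axis=-1) == label)
            & (pred[..., label] >= threshold)).astype(np.uint8)

pred = np.dstack([np.array([[0.9, 0.2], [0.4, 0.1]]),
                  np.array([[0.1, 0.8], [0.6, 0.9]])])
mask = seg_mask_label(pred, 1)
```

Callers then no longer need to threshold or slice channels themselves.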
- `extract_text_regions*`: drop unused second result array
- `textline_contours`: avoid calculating unused unpatched prediction
- instead of just comparing the number of connected components,
  calculate the GT/pred label incidence matrix and retrieve the
  share of singular entries (i.e. those nearly diagonal under
  reordering) over the total counts as the similarity score
- also, suppress the artificial class in that computation
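The idea of the score can be sketched as follows (an illustration with assumed details, not the exact metric): count GT/pred co-occurrences in an incidence matrix and take the share of one-to-one matches, i.e. rows and columns with a single nonzero entry, over the label count:

```python
import numpy as np

def label_similarity(gt: np.ndarray, pred: np.ndarray) -> float:
    # sketch: incidence matrix of GT labels vs. predicted labels;
    # a label pair is a "singular" match if its row and its column
    # each contain exactly one nonzero entry
    n_gt, n_pred = gt.max() + 1, pred.max() + 1
    inc = np.zeros((n_gt, n_pred), dtype=int)
    np.add.at(inc, (gt.ravel(), pred.ravel()), 1)
    singular = sum(
        1 for i in range(n_gt)
        if np.count_nonzero(inc[i]) == 1
        and np.count_nonzero(inc[:, inc[i].argmax()]) == 1)
    return singular / max(n_gt, n_pred)

perfect = label_similarity(np.array([0, 0, 1, 1]), np.array([0, 0, 1, 1]))
split = label_similarity(np.array([0, 0, 1, 1]), np.array([0, 1, 1, 1]))
```

Identical labelings score 1.0; a split/merged labeling scores lower even if the component counts happen to agree.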
(Functions cannot be both generators and procedures,
so make this a pure generator and save the image files
on the caller's side; also avoids passing output
directories)
Moreover, simplify by moving the `os.listdir` into the function
body (saving lots of extra variable bindings).
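The pure-generator shape described above might look like this (a sketch with assumed names): the directory listing lives inside the function body, and the caller decides where, and whether, to save anything:

```python
import os

def generate_images(input_dir: str):
    # pure generator (sketch): list the directory inside the function
    # body and yield (file name, content); saving is the caller's job
    for name in sorted(os.listdir(input_dir)):
        with open(os.path.join(input_dir, name), "rb") as f:
            yield name, f.read()

# toy usage: the caller consumes the generator and could write files here
import tempfile
tmp = tempfile.mkdtemp()
for n in ("a.bin", "b.bin"):
    with open(os.path.join(tmp, n), "wb") as f:
        f.write(n.encode())
pairs = list(generate_images(tmp))
```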