- rename `get_regions()` → `get_early_layout()`
- split up `run_boxes_no/full_layout()` into shared
  * `get_full_layout()` (for label mapping,
    table decoding and optional full model prediction)
* `get_deskewed_masks()` (for de-rotation)
* extraction of various region types (polygons and confidences)
* `run_boxes_order()` (for column detection and box ordering)
- rename `contours_tables` → `polygons_of_tables`
This further reduces redundant code, avoids splitting up the same
functionality across different places depending on mode etc.
- `run_single`: re-use `return_contours_of_interested_region`
for extraction and filtering of text region contours
- `run_single`: isolate new function `match_deskewed_contours`
- `run_single`: apply dilation afterwards
- rename `contours_only_text_parent_d_ordered` → `polygons_of_textregions_d`
- rename `contours_only_text_parent` → `polygons_of_textregions`
- rename `contours_only_text_parent_h` → `polygons_of_textregions_h`
- `do_work_of_slopes_new_curved` and `get_slopes_and_deskew_new_curved`:
no need for `mask_texts_only` array arg
- `filter_contours_inside_a_bigger_one`: no need for `image` as array arg,
simplify
- `split_textregion_main_vs_head`: simplify, re-order arguments
and return tuple logically
- if no main text regions are found, just convert marginals to main text
and continue normally instead of stopping early w/ empty marginals (i.e.
no textlines)
- do_order_of_regions_with_model:
* add `polygons_of_drop_capitals`, order these indices as well
(model was not trained for this, but it works)
* explicit label identifiers instead of number literals
* map marginals and images correctly
* simplify (a lot)
  * reduce inference batch size to accommodate 8 GB VRAM GPUs
- return_indexes_of_contours_located_inside_another_list_of_contours:
simplify
- pass on probabilities from predicted class everywhere
- rename `confidence_matrix` → `confidence_regions` / `regions_confidence`
- rename `get_textregion_confidences()` → `get_region_confidences()`
- add same for tables, textlines and regionsfl (full layout model)
- aggregate per-region confidence lists for image, table, drop-capital,
left marginal and right marginal regions
- add these confidences in the writer
- simplify / re-indent some code
- try to replace more number literals with class label identifiers
- re-introduce `heading` threshold boosting that got broken
  during refactoring (light version and `do_prediction`)
- also return confidence for full layout prediction
1. use connected component analysis to get unique segments
in early prediction result
2. for each drop-capital segment in full prediction result,
find matching early segment
3. when they have high overlap, assign drop-capital label
to the entire early segment
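The matching in steps 1–3 could be sketched as follows (function name, label value and overlap threshold are illustrative, not the actual implementation; the labeled array is assumed to come from a connected-component pass, e.g. `cv2.connectedComponents`):

```python
import numpy as np

DROP_CAPITAL = 4  # hypothetical class label id

def fill_drop_capitals(early_seg, early_labels, drop_mask, min_overlap=0.5):
    """For each connected segment of the early prediction (given as a
    labeled int array), measure how much of it is covered by drop-capital
    pixels from the full-layout prediction; on high overlap, assign the
    drop-capital label to the entire early segment."""
    result = early_seg.copy()
    drop = drop_mask.astype(bool)
    for seg_id in range(1, early_labels.max() + 1):
        segment = early_labels == seg_id
        area = segment.sum()
        if area == 0:
            continue
        overlap = (segment & drop).sum() / area
        if overlap >= min_overlap:
            result[segment] = DROP_CAPITAL
    return result
```

Relabeling the whole early segment (rather than just the overlapping pixels) is what recovers the full extent of the drop capital from the finer early segmentation.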
- rename `putt_bb_of_drop_capitals_of_model_in_patches_in_layout`
→ `fill_bb_of_drop_capitals`
- also allow image (besides text) label in early layout prediction
result when checking if entire bbox can be filled (as opposed to
just drop-capital | image | background mask)
- simplify
fix bug where, in non-full mode, the wrong class label was assumed
for separator regions (3 in non-full vs 6 in full layout mode):
- pass in separator mask instead of full segmentation map
- rename for clarity:
  - `regions_without_separators` → `text_mask` (already binary)
- `regions_with_separators` → `sep_mask` (now just binary)
(thresholding and decoding with artificial boundary class can
overwrite existing column separators, which in turn can contribute
to missing column boundaries; this prioritises seps over boundaries,
which does not impair separation of instances, as seps will separate
text/image/etc instances just as well as artificial boundaries)
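Prioritising separators could be as simple as re-imposing the binary separator mask after decoding (label values and the function name are hypothetical):

```python
import numpy as np

SEPARATOR = 6  # hypothetical class label ids
BOUNDARY = 7

def prioritise_separators(seg, sep_mask):
    """Re-impose detected separators on the decoded segmentation:
    wherever the binary separator mask is set, overwrite whatever class
    the decoder produced there (including the artificial boundary class).
    Separators divide neighbouring instances just as well as artificial
    boundaries do, so instance separation is unaffected."""
    out = seg.copy()
    out[sep_mask.astype(bool)] = SEPARATOR
    return out
```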
When 338c4a0e wrapped all prediction models for automatic
image size adaptation in CUDA,
- tiling (`_patched`) was indeed faster
- whole (`_resized`) was actually slower
But CUDA-based tiling also increases GPU memory requirements
a lot. And with the new parallel subprocess predictors, Numpy-
based tiling is not necessarily slower anymore.
(avoid a strange image-handling shortcut that re-used the early
cropped image from column classification instead of the normal
image in 1/2-column cases; fixes accuracy issues of the
region_1_2 model on these images)
(as follow-up to ec08004f:)
- create log queues and QueueListener separately for each job
- receive job logs sequentially
- drop log filter mechanism (prefixing log messages by file name)
- also count ratio of successful jobs
allow setting device specifier to load models into
either
- CPU or single GPU0, GPU1 etc
- per-model patterns, e.g. col*:CPU,page:GPU0,*:GPU1
pass through as kwargs until `ModelZoo.load_models()` sets up TF
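Resolving such a per-model pattern spec could look like this (function name and the `CPU` default are assumptions; pattern syntax is fnmatch-style, first match wins):

```python
from fnmatch import fnmatchcase

def resolve_device(model_name, spec):
    """Map a model name to a device via a comma-separated spec like
    'col*:CPU,page:GPU0,*:GPU1'. Each entry is PATTERN:DEVICE; the
    first pattern matching the model name determines the device."""
    for entry in spec.split(","):
        pattern, _, device = entry.partition(":")
        if fnmatchcase(model_name, pattern.strip()):
            return device.strip()
    return "CPU"  # assumed fallback when nothing matches
```

A plain device name without patterns (e.g. just `GPU0`) would need a trivial extra case, omitted here.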
- Eynollah: instead of one `Predictor` instance as stand-in for
entire `ModelZoo`, keep the latter but have each model in `_loaded`
dict become an independent predictor instance
- `ModelZoo.load_models()`: instantiate `Predictor`s for each
`model_category` and then call `Predictor.load_model()` on them
- `Predictor.load_model()`: set args/kwargs for `ModelZoo.load_model()`,
then spawn subprocess via `.start()`, which first enters `setup()`...
- `Predictor.setup()`: call `ModelZoo.load_model()` instead of (plural)
`.load_models()`; save to `self.model` instead of `self.model_zoo`
- `ModelZoo.load_model()`: move _all_ CUDA configuration and
TF/Keras-specific module initialization here (to be used only by
predictor subprocess)
- `Predictor`: drop stand-in `SingleModelPredictor` retrieved by `get()`;
directly provide `predict()` and `output_shape` via `self.call()`
- `Predictor`: drop `model` arg from all queues - now implicit; use
`self.name` for model name in messages
- `Predictor`: no need for requeuing other tasks (only same model now)
- `Predictor`: reduce rebatching batch sizes due to increased VRAM footprint
- `Eynollah.setup_models()`: set up loading `_patched` / `_resized`
here instead of during `ModelZoo.load_model()`
- `ModelZoo.load_models()`: for resized/patched models, call
`Predictor.load_model()` with kwarg instead of resp. model name suffix
- `ModelZoo.load_model()`: expect boolean kwargs `patched/resized`
for `wrap_layout_model_patched/resized` model wrappers, respectively
- depending on model type (i.e. size), configure target
batch sizes
- after receiving a prediction task for some model,
look up target batch size, then try to retrieve arrays
from follow-up tasks for the same model on the task queue;
stop when either no tasks are immediately available or
when the combined batch size (input batch size * number of tasks)
reaches the target
- push back tasks for other models to the queue
- rebatch: read all shared arrays, concatenate them along axis 0,
map respective job ids they came from
- predict on new (possibly larger) batch
- split result along axis 0 into number of jobs
- send each result along with its jobid to task queue
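The gather/predict/scatter cycle above could be sketched as follows (names are illustrative; the real predictor reads shared arrays from a task queue rather than a plain list):

```python
import numpy as np

def rebatch(tasks, target_batch):
    """Greedily merge queued (job_id, batch_array) tasks for the same
    model until the combined batch size reaches the target; returns the
    merged input plus the per-job ids and sizes needed to scatter the
    prediction back afterwards."""
    taken, sizes, total = [], [], 0
    while tasks and total < target_batch:
        job_id, batch = tasks.pop(0)
        taken.append((job_id, batch))
        sizes.append(len(batch))
        total += len(batch)
    merged = np.concatenate([b for _, b in taken], axis=0)
    return merged, [j for j, _ in taken], sizes

def scatter(result, job_ids, sizes):
    """Split the merged prediction along axis 0 back into one result
    per contributing job."""
    splits = np.split(result, np.cumsum(sizes)[:-1], axis=0)
    return dict(zip(job_ids, splits))
```

Tasks for other models would be pushed back to the queue instead of being taken.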
- set up a Queue and QueueListener along with ProcessPoolExecutor,
delegating messages from the queue to all handlers
- in forked subprocesses, instead of just inheriting handlers,
replace them with a single QueueHandler, and make sure
  log messages get prefixed by the respective job id (img_filename)
so concurrent messages will still be readable
- in the predictor, make sure to pass on the log level to the
spawned subprocess, too
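The worker side of this setup could be sketched like so (class and function names are hypothetical; shown in-process here, whereas the real queue is a multiprocessing queue shared with forked workers):

```python
import logging
from logging.handlers import QueueHandler

class JobPrefixAdapter(logging.LoggerAdapter):
    """Prefix every message with the job id (e.g. the image file name)
    so concurrent output from several workers stays readable."""
    def process(self, msg, kwargs):
        return "[%s] %s" % (self.extra["job_id"], msg), kwargs

def setup_worker_logging(log_queue, job_id, level=logging.INFO):
    """In a forked worker: replace the inherited handlers with a single
    QueueHandler, so records travel over the queue to the parent's
    QueueListener, and pass the log level on explicitly."""
    root = logging.getLogger()
    root.handlers = [QueueHandler(log_queue)]
    root.setLevel(level)
    return JobPrefixAdapter(root, {"job_id": job_id})
```

On the parent side, a `QueueListener` started next to the `ProcessPoolExecutor` dispatches the queued records to all real handlers.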
When 338c4a0e wrapped all prediction models for automatic
image size adaptation in CUDA,
- tiling (`_patched`) was indeed faster
- whole (`_resized`) was actually slower
So this reverts the latter part.
- reintroduce ProcessPoolExecutor
(previously for parallel deskewing within pages)
- wrap Eynollah instance into global, so (with forking)
serialization can be avoided – same pattern as in core ocrd.Processor
- move timing/logging into `run_single()`
- `cache_images()`: only return an image dict (plus extra keys
for file name stem and dpi) - don't set any attributes
- `imread()`: just take from passed image dict, also add `binary` key
- `resize_and_enhance_image_with_column_classifier()`:
* `imread()` from image dict
* set `img_bin` key for binarization result if `input_binary`
* instead of `image_page_org_size` / `page_coord` attributes,
set `img_page` / `coord_page` in image dict
* instead of retval, set `img_res` in image dict
* also set `scale_x` and `scale_y` in image dict, resp.
* simplify
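The shape of the image dict that replaces the former instance attributes might look like this (a sketch following the keys named above; function names and exact key set are illustrative):

```python
import numpy as np

def cache_images(img, stem, dpi):
    """Build the initial image dict: just the raw image plus file name
    stem and dpi; later pipeline stages fill in derived entries instead
    of setting attributes on the Eynollah instance."""
    return {"img": img, "stem": stem, "dpi": dpi}

def after_enhancement(images, img_res, img_page, coord_page, scale_x, scale_y):
    """What resize_and_enhance_image_with_column_classifier would add
    to the dict (the real code mutates the same dict in place)."""
    images.update(img_res=img_res, img_page=img_page,
                  coord_page=coord_page, scale_x=scale_x, scale_y=scale_y)
    return images
```

Passing this one dict through the pipeline is what lets `run_single()` jobs run independently side by side.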
- `resize_image_with_column_classifier()`:
* `imread()` from image dict
  * (as in `resize_and_enhance_image_with_column_classifier`:)
call `calculate_width_height_by_columns_1_2` if `num_col` is
1 or 2 here
* instead of retval, set `img_res` in image dict
* also set `scale_x` and `scale_y` in image dict, resp.
* simplify
- `calculate_width_height_by_columns*()`: simplify, get confidence of
num_col instead of entire array
- `extract_page()`: read `img_res` from image dict; simplify
- `early_page_for_num_of_column_classification()`:
`imread()` from image dict; simplify
- `textline_contours()`: no need for `num_col_classifier` here
- `run_textline()`: no need for `num_col_classifier` here
- `get_regions_light_v()` → `get_regions()`:
* read `img_res` from image dict
* get shapes via `img` from image dict instead of `image_org` attr
* use `img_page` / `coord_page` from image dict instead of attrs
* avoid unnecessary 3-channel arrays
* simplify
- `get_tables_from_model()`: no need for `num_col_classifier` here
- `run_graphics_and_columns_light()` → `run_graphics_and_columns()`:
* pass through image dict instead of `img_bin` (which really was `img_res`)
* simplify
- `run_graphics_and_columns_without_layout()`:
* pass through image dict instead of `img_bin` (which really was `img_res`)
* simplify
- `run_enhancement()`: pass through image dict
- `get_image_and_scales*()`: drop
- `run_boxes_full_layout()`:
* pass `image_page` instead of `img_bin` (which really was `image_page`)
* simplify
- `run()`:
* instantiate plotter outside of loop, and independent of img files
* move writer instantiation and overwrite checks into `run_single()`
* add try/catch for `run_single()` w/ logging
- `reset_file_name_dir`: drop
- `run_single()`:
* add some args/kwargs from `run()`
* call `cache_images()` (reading image dict) here
* instantiate writer here instead of (reused) attr in `run()`
* set `scale_x` / `scale_y` in writer from image dict once known
(i.e. after `run_enhancement()`)
* don't return anything, but write PAGE result here
- `check_any_text_region_in_model_one_is_main_or_header_light()` →
`split_textregion_main_vs_header()`
- plotter:
* pass `name` (file stem) from image dict to all methods
* for `write_images_into_directory()`: also `scale_x` and `scale_y`
from image dict
- writer:
* init with width/height
- ocrd processor:
* adapt (just `run_single()` call)
* drop `max_workers=1` restriction (can now run fully parallel)
- `get_textregion_contours_in_org_image_light()` →
`get_textregion_confidences()`:
* take shape from confmat directly instead of extra array
* simplify