diff --git a/CHANGELOG.md b/CHANGELOG.md index 0ad9a09..8c6c000 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -22,7 +22,7 @@ Added: Fixed: * allow empty imports for optional dependencies - * avoid Numpy warnings (empty slices etc) + * avoid Numpy warnings (empty slices etc.) * remove deprecated Numpy types * binarization CLI: make `dir_in` usable again diff --git a/README.md b/README.md index e576f4d..3ba5086 100644 --- a/README.md +++ b/README.md @@ -11,23 +11,24 @@ ![](https://user-images.githubusercontent.com/952378/102350683-8a74db80-3fa5-11eb-8c7e-f743f7d6eae2.jpg) ## Features -* Support for up to 10 segmentation classes: +* Support for 10 distinct segmentation classes: * background, [page border](https://ocr-d.de/en/gt-guidelines/trans/lyRand.html), [text region](https://ocr-d.de/en/gt-guidelines/trans/lytextregion.html#textregionen__textregion_), [text line](https://ocr-d.de/en/gt-guidelines/pagexml/pagecontent_xsd_Complex_Type_pc_TextLineType.html), [header](https://ocr-d.de/en/gt-guidelines/trans/lyUeberschrift.html), [image](https://ocr-d.de/en/gt-guidelines/trans/lyBildbereiche.html), [separator](https://ocr-d.de/en/gt-guidelines/trans/lySeparatoren.html), [marginalia](https://ocr-d.de/en/gt-guidelines/trans/lyMarginalie.html), [initial](https://ocr-d.de/en/gt-guidelines/trans/lyInitiale.html), [table](https://ocr-d.de/en/gt-guidelines/trans/lyTabellen.html) * Support for various image optimization operations: * cropping (border detection), binarization, deskewing, dewarping, scaling, enhancing, resizing -* Text line segmentation to bounding boxes or polygons (contours) including for curved lines and vertical text -* Detection of reading order (left-to-right or right-to-left) +* Textline segmentation to bounding boxes or polygons (contours) including for curved lines and vertical text +* Text recognition (OCR) using either CNN-RNN or Transformer models +* Detection of reading order (left-to-right or right-to-left) using either heuristics or trainable models * Output in [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) * [OCR-D](https://github.com/qurator-spk/eynollah#use-as-ocr-d-processor) interface -:warning: Development is currently focused on achieving the best possible quality of results for a wide variety of -historical documents and therefore processing can be very slow. We aim to improve this, but contributions are welcome. +:warning: Development is focused on achieving the best quality of results for a wide variety of historical +documents and therefore processing can be very slow. We aim to improve this, but contributions are welcome. ## Installation Python `3.8-3.11` with Tensorflow `<2.13` on Linux are currently supported. -For (limited) GPU support the CUDA toolkit needs to be installed. +For (limited) GPU support the CUDA toolkit needs to be installed. A known working config is CUDA `11` with cuDNN `8.6`. You can either install from PyPI @@ -56,26 +57,27 @@ make install EXTRAS=OCR Pretrained models can be downloaded from [zenodo](https://zenodo.org/records/17194824) or [huggingface](https://huggingface.co/SBB?search_models=eynollah). -For documentation on methods and models, have a look at [`models.md`](https://github.com/qurator-spk/eynollah/tree/main/docs/models.md). +For documentation on models, have a look at [`models.md`](https://github.com/qurator-spk/eynollah/tree/main/docs/models.md). +Model cards are also provided for our trained models. 
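Editor's note on the CUDA/cuDNN remark above: a quick sanity check such as the following sketch (assuming TensorFlow has been installed as described in this README) shows whether the CUDA toolkit and cuDNN are actually picked up:

```python
# Minimal GPU visibility check; an empty list means TensorFlow will run on CPU only.
import tensorflow as tf

print(tf.__version__)                          # this README pins Tensorflow <2.13
print(tf.config.list_physical_devices('GPU'))
```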
## Training -In case you want to train your own model with Eynollah, have see the +In case you want to train your own model with Eynollah, see the documentation in [`train.md`](https://github.com/qurator-spk/eynollah/tree/main/docs/train.md) and use the tools in the [`train` folder](https://github.com/qurator-spk/eynollah/tree/main/train). ## Usage Eynollah supports five use cases: layout analysis (segmentation), binarization, -image enhancement, text recognition (OCR), and (trainable) reading order detection. +image enhancement, text recognition (OCR), and reading order detection. ### Layout Analysis -The layout analysis module is responsible for detecting layouts, identifying text lines, and determining reading order -using both heuristic methods or a machine-based reading order detection model. +The layout analysis module is responsible for detecting layout elements, identifying text lines, and determining reading +order using either heuristic methods or a [pretrained reading order detection model](https://github.com/qurator-spk/eynollah#machine-based-reading-order). -Note that there are currently two supported ways for reading order detection: either as part of layout analysis based -on image input, or, currently under development, for given layout analysis results based on PAGE-XML data as input. +Reading order detection can be performed either as part of layout analysis based on image input, or, currently under +development, based on pre-existing layout analysis results in PAGE-XML format as input. The command-line interface for layout analysis can be called like this: @@ -108,15 +110,15 @@ The following options can be used to further configure the processing: | `-sp ` | save cropped page image to this directory | | `-sa ` | save all (plot, enhanced/binary image, layout) to this directory | -If no option is set, the tool performs layout detection of main regions (background, text, images, separators +If no further option is set, the tool performs layout detection of main regions (background, text, images, separators and marginals). -The best output quality is produced when RGB images are used as input rather than greyscale or binarized images. +The best output quality is achieved when RGB images are used as input rather than greyscale or binarized images. ### Binarization The binarization module performs document image binarization using pretrained pixelwise segmentation models. -The command-line interface for binarization of single image can be called like this: +The command-line interface for binarization can be called like this: ```sh eynollah binarization \ @@ -127,16 +129,16 @@ eynollah binarization \ ### OCR -The OCR module performs text recognition from images using two main families of pretrained models: CNN-RNN–based OCR and Transformer-based OCR. +The OCR module performs text recognition using either a CNN-RNN model or a Transformer model. 
-The command-line interface for ocr can be called like this: +The command-line interface for OCR can be called like this: ```sh eynollah ocr \ -i | -di \ -dx \ -o \ - -m | --model_name \ + -m | --model_name \ ``` ### Machine-based-reading-order @@ -172,22 +174,20 @@ If the input file group is PAGE-XML (from a previous OCR-D workflow step), Eynol (because some other preprocessing step was in effect like `denoised`), then the output PAGE-XML will be based on that as new top-level (`@imageFilename`) - ocrd-eynollah-segment -I OCR-D-XYZ -O OCR-D-SEG -P models eynollah_layout_v0_5_0 + ocrd-eynollah-segment -I OCR-D-XYZ -O OCR-D-SEG -P models eynollah_layout_v0_5_0 -Still, in general, it makes more sense to add other workflow steps **after** Eynollah. +In general, it makes more sense to add other workflow steps **after** Eynollah. -There is also an OCR-D processor for the binarization: +There is also an OCR-D processor for binarization: ocrd-sbb-binarize -I OCR-D-IMG -O OCR-D-BIN -P models default-2021-03-09 #### Additional documentation -Please check the [wiki](https://github.com/qurator-spk/eynollah/wiki). +Additional documentation is available in the [docs](https://github.com/qurator-spk/eynollah/tree/main/docs) directory. ## How to cite -If you find this tool useful in your work, please consider citing our paper: - ```bibtex @inproceedings{hip23rezanezhad, title = {Document Layout Analysis with Deep Learning and Heuristics}, diff --git a/src/eynollah/eynollah.py b/src/eynollah/eynollah.py index 20954a0..63f7005 100644 --- a/src/eynollah/eynollah.py +++ b/src/eynollah/eynollah.py @@ -4886,9 +4886,9 @@ class Eynollah: textline_mask_tot_ea_org[img_revised_tab==drop_label_in_full_layout] = 0 - text_only = ((img_revised_tab[:, :] == 1)) * 1 + text_only = (img_revised_tab[:, :] == 1) * 1 if np.abs(slope_deskew) >= SLOPE_THRESHOLD: - text_only_d = ((text_regions_p_1_n[:, :] == 1)) * 1 + text_only_d = (text_regions_p_1_n[:, :] == 1) * 1 #print("text region early 2 in %.1fs", time.time() - t0) ###min_con_area = 0.000005 diff --git a/src/eynollah/plot.py b/src/eynollah/plot.py index 412ae5a..c026e94 100644 --- a/src/eynollah/plot.py +++ b/src/eynollah/plot.py @@ -12,7 +12,7 @@ from .utils import crop_image_inside_box from .utils.rotate import rotate_image_different from .utils.resize import resize_image -class EynollahPlotter(): +class EynollahPlotter: """ Class collecting all the plotting and image writing methods """ diff --git a/src/eynollah/utils/__init__.py b/src/eynollah/utils/__init__.py index 6eeabd0..de083f5 100644 --- a/src/eynollah/utils/__init__.py +++ b/src/eynollah/utils/__init__.py @@ -138,8 +137,7 @@ def return_x_start_end_mothers_childs_and_type_of_reading_order( min_ys=np.min(y_sep) max_ys=np.max(y_sep) - y_mains=[] - y_mains.append(min_ys) + y_mains = [min_ys] y_mains_sep_ohne_grenzen=[] for ii in range(len(new_main_sep_y)): @@ -493,8 +492,7 @@ def find_num_col(regions_without_separators, num_col_classifier, tables, multipl # print(forest[np.argmin(z[forest]) ] ) if not isNaN(forest[np.argmin(z[forest])]): peaks_neg_true.append(forest[np.argmin(z[forest])]) - forest = [] - forest.append(peaks_neg_fin[i + 1]) + forest = [peaks_neg_fin[i + 1]] if i == (len(peaks_neg_fin) - 1): # print(print(forest[np.argmin(z[forest]) ] )) if not isNaN(forest[np.argmin(z[forest])]): @@ -662,8 +660,7 @@ def find_num_col_only_image(regions_without_separators, multiplier=3.8): # print(forest[np.argmin(z[forest]) ] ) if not isNaN(forest[np.argmin(z[forest])]):
peaks_neg_true.append(forest[np.argmin(z[forest])]) - forest = [] - forest.append(peaks_neg_fin[i + 1]) + forest = [peaks_neg_fin[i + 1]] if i == (len(peaks_neg_fin) - 1): # print(print(forest[np.argmin(z[forest]) ] )) if not isNaN(forest[np.argmin(z[forest])]): @@ -1211,7 +1208,7 @@ def order_of_regions(textline_mask, contours_main, contours_header, y_ref): ##plt.plot(z) ##plt.show() - if contours_main != None: + if contours_main is not None: areas_main = np.array([cv2.contourArea(contours_main[j]) for j in range(len(contours_main))]) M_main = [cv2.moments(contours_main[j]) for j in range(len(contours_main))] cx_main = [(M_main[j]["m10"] / (M_main[j]["m00"] + 1e-32)) for j in range(len(M_main))] @@ -1222,7 +1219,7 @@ def order_of_regions(textline_mask, contours_main, contours_header, y_ref): y_min_main = np.array([np.min(contours_main[j][:, 0, 1]) for j in range(len(contours_main))]) y_max_main = np.array([np.max(contours_main[j][:, 0, 1]) for j in range(len(contours_main))]) - if len(contours_header) != None: + if contours_header is not None: areas_header = np.array([cv2.contourArea(contours_header[j]) for j in range(len(contours_header))]) M_header = [cv2.moments(contours_header[j]) for j in range(len(contours_header))] cx_header = [(M_header[j]["m10"] / (M_header[j]["m00"] + 1e-32)) for j in range(len(M_header))] @@ -1235,17 +1232,16 @@ def order_of_regions(textline_mask, contours_main, contours_header, y_ref): y_max_header = np.array([np.max(contours_header[j][:, 0, 1]) for j in range(len(contours_header))]) # print(cy_main,'mainy') - peaks_neg_new = [] - peaks_neg_new.append(0 + y_ref) + peaks_neg_new = [0 + y_ref] for iii in range(len(peaks_neg)): peaks_neg_new.append(peaks_neg[iii] + y_ref) peaks_neg_new.append(textline_mask.shape[0] + y_ref) if len(cy_main) > 0 and np.max(cy_main) > np.max(peaks_neg_new): cy_main = np.array(cy_main) * (np.max(peaks_neg_new) / np.max(cy_main)) - 10 - if contours_main != None: + if contours_main is not None: indexer_main = np.arange(len(contours_main)) - if contours_main != None: + if contours_main is not None: len_main = len(contours_main) else: len_main = 0 @@ -1271,11 +1267,11 @@ def order_of_regions(textline_mask, contours_main, contours_header, y_ref): top = peaks_neg_new[i] down = peaks_neg_new[i + 1] indexes_in = matrix_of_orders[:, 0][(matrix_of_orders[:, 3] >= top) & - ((matrix_of_orders[:, 3] < down))] + (matrix_of_orders[:, 3] < down)] cxs_in = matrix_of_orders[:, 2][(matrix_of_orders[:, 3] >= top) & - ((matrix_of_orders[:, 3] < down))] + (matrix_of_orders[:, 3] < down)] cys_in = matrix_of_orders[:, 3][(matrix_of_orders[:, 3] >= top) & - ((matrix_of_orders[:, 3] < down))] + (matrix_of_orders[:, 3] < down)] types_of_text = matrix_of_orders[:, 1][(matrix_of_orders[:, 3] >= top) & (matrix_of_orders[:, 3] < down)] index_types_of_text = matrix_of_orders[:, 4][(matrix_of_orders[:, 3] >= top) & @@ -1404,8 +1400,7 @@ def combine_hor_lines_and_delete_cross_points_and_get_lines_features_back_new( return img_p_in[:,:,0], special_separators def return_points_with_boundies(peaks_neg_fin, first_point, last_point): - peaks_neg_tot = [] - peaks_neg_tot.append(first_point) + peaks_neg_tot = [first_point] for ii in range(len(peaks_neg_fin)): peaks_neg_tot.append(peaks_neg_fin[ii]) peaks_neg_tot.append(last_point) @@ -1413,7 +1408,7 @@ def find_number_of_columns_in_document(region_pre_p, num_col_classifier, tables, pixel_lines, contours_h=None): t_ins_c0 = time.time() -
separators_closeup=( (region_pre_p[:,:,:]==pixel_lines))*1 + separators_closeup = (region_pre_p[:, :, :] == pixel_lines) * 1 separators_closeup[0:110,:,:]=0 separators_closeup[separators_closeup.shape[0]-150:,:,:]=0 @@ -1452,7 +1447,7 @@ def find_number_of_columns_in_document(region_pre_p, num_col_classifier, tables, gray = cv2.bitwise_not(separators_closeup_n_binary) gray=gray.astype(np.uint8) - bw = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, \ + bw = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 15, -2) horizontal = np.copy(bw) vertical = np.copy(bw) @@ -1588,8 +1583,7 @@ def find_number_of_columns_in_document(region_pre_p, num_col_classifier, tables, args_cy_splitter=np.argsort(cy_main_splitters) cy_main_splitters_sort=cy_main_splitters[args_cy_splitter] - splitter_y_new=[] - splitter_y_new.append(0) + splitter_y_new = [0] for i in range(len(cy_main_splitters_sort)): splitter_y_new.append( cy_main_splitters_sort[i] ) splitter_y_new.append(region_pre_p.shape[0]) @@ -1663,8 +1657,7 @@ def return_boxes_of_images_by_order_of_reading_new( num_col, peaks_neg_fin = find_num_col( regions_without_separators[int(splitter_y_new[i]):int(splitter_y_new[i+1]),:], num_col_classifier, tables, multiplier=3.) - peaks_neg_fin_early=[] - peaks_neg_fin_early.append(0) + peaks_neg_fin_early = [0] #print(peaks_neg_fin,'peaks_neg_fin') for p_n in peaks_neg_fin: peaks_neg_fin_early.append(p_n) diff --git a/src/eynollah/utils/contour.py b/src/eynollah/utils/contour.py index 0e84153..0be8879 100644 --- a/src/eynollah/utils/contour.py +++ b/src/eynollah/utils/contour.py @@ -239,8 +239,7 @@ def do_back_rotation_and_get_cnt_back(contour_par, index_r_con, img, slope_first cont_int, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE) if len(cont_int)==0: - cont_int = [] - cont_int.append(contour_par) + cont_int = [contour_par] confidence_contour = 0 else: cont_int[0][:, 0, 0] = cont_int[0][:, 0, 0] + np.abs(img_copy.shape[1] - img.shape[1]) diff --git a/src/eynollah/utils/counter.py b/src/eynollah/utils/counter.py index 9a3ed70..e6205c8 100644 --- a/src/eynollah/utils/counter.py +++ b/src/eynollah/utils/counter.py @@ -3,7 +3,7 @@ from collections import Counter REGION_ID_TEMPLATE = 'region_%04d' LINE_ID_TEMPLATE = 'region_%04d_line_%04d' -class EynollahIdCounter(): +class EynollahIdCounter: def __init__(self, region_idx=0, line_idx=0): self._counter = Counter() diff --git a/src/eynollah/utils/marginals.py b/src/eynollah/utils/marginals.py index ac8dc1d..9ec0737 100644 --- a/src/eynollah/utils/marginals.py +++ b/src/eynollah/utils/marginals.py @@ -76,7 +76,7 @@ def get_marginals(text_with_lines, text_regions, num_col, slope_deskew, light_ve peaks, _ = find_peaks(text_with_lines_y_rev, height=0) peaks=np.array(peaks) - peaks=peaks[(peaks>first_nonzero) & ((peaks<last_nonzero))] + peaks=peaks[(peaks > first_nonzero) & (peaks < last_nonzero)] peaks=peaks[region_sum_0[peaks] cut_off: if not np.isnan(forest[np.argmin(z[forest])]): peaks_neg_true.append(forest[np.argmin(z[forest])]) - forest = [] - forest.append(peaks_neg[i + 1]) + forest = [peaks_neg[i + 1]] if i == (len(peaks_neg) - 1): if not np.isnan(forest[np.argmin(z[forest])]): peaks_neg_true.append(forest[np.argmin(z[forest])]) @@ -1195,8 +1194,7 @@ def separate_lines_new_inside_tiles(img_path, thetha): if diff_peaks_pos[i] > cut_off: if not np.isnan(forest[np.argmax(z[forest])]): peaks_pos_true.append(forest[np.argmax(z[forest])]) - forest = [] - forest.append(peaks[i + 1]) + forest = [peaks[i + 1]] if i == (len(peaks) - 1): if not
np.isnan(forest[np.argmax(z[forest])]): peaks_pos_true.append(forest[np.argmax(z[forest])]) @@ -1430,9 +1428,9 @@ def separate_lines_new2(img_path, thetha, num_col, slope_region, logger=None, pl img_int = np.zeros((img_xline.shape[0], img_xline.shape[1])) img_int[:, :] = img_xline[:, :] # img_patch_org[:,:,0] - img_resized = np.zeros((int(img_int.shape[0] * (1.2)), int(img_int.shape[1] * (3)))) - img_resized[int(img_int.shape[0] * (0.1)) : int(img_int.shape[0] * (0.1)) + img_int.shape[0], - int(img_int.shape[1] * (1.0)) : int(img_int.shape[1] * (1.0)) + img_int.shape[1]] = img_int[:, :] + img_resized = np.zeros((int(img_int.shape[0] * 1.2), int(img_int.shape[1] * 3))) + img_resized[int(img_int.shape[0] * 0.1): int(img_int.shape[0] * 0.1) + img_int.shape[0], + int(img_int.shape[1] * 1.0): int(img_int.shape[1] * 1.0) + img_int.shape[1]] = img_int[:, :] # plt.imshow(img_xline) # plt.show() img_line_rotated = rotate_image(img_resized, slopes_tile_wise[i]) @@ -1444,8 +1442,8 @@ def separate_lines_new2(img_path, thetha, num_col, slope_region, logger=None, pl img_patch_separated_returned[:, :][img_patch_separated_returned[:, :] != 0] = 1 img_patch_separated_returned_true_size = img_patch_separated_returned[ - int(img_int.shape[0] * (0.1)) : int(img_int.shape[0] * (0.1)) + img_int.shape[0], - int(img_int.shape[1] * (1.0)) : int(img_int.shape[1] * (1.0)) + img_int.shape[1]] + int(img_int.shape[0] * 0.1): int(img_int.shape[0] * 0.1) + img_int.shape[0], + int(img_int.shape[1] * 1.0): int(img_int.shape[1] * 1.0) + img_int.shape[1]] img_patch_separated_returned_true_size = img_patch_separated_returned_true_size[:, margin : length_x - margin] img_patch_ineterst_revised[:, index_x_d + margin : index_x_u - margin] = img_patch_separated_returned_true_size @@ -1473,7 +1471,7 @@ def return_deskew_slop(img_patch_org, sigma_des,n_tot_angles=100, img_int[:,:]=img_patch_org[:,:]#img_patch_org[:,:,0] max_shape=np.max(img_int.shape) - img_resized=np.zeros((int( max_shape*(1.1) ) , int( max_shape*(1.1) ) )) + img_resized=np.zeros((int(max_shape * 1.1), int(max_shape * 1.1))) onset_x=int((img_resized.shape[1]-img_int.shape[1])/2.) onset_y=int((img_resized.shape[0]-img_int.shape[0])/2.) @@ -1538,7 +1536,7 @@ def return_deskew_slop_old_mp(img_patch_org, sigma_des,n_tot_angles=100, img_int[:,:]=img_patch_org[:,:]#img_patch_org[:,:,0] max_shape=np.max(img_int.shape) - img_resized=np.zeros((int( max_shape*(1.1) ) , int( max_shape*(1.1) ) )) + img_resized=np.zeros((int(max_shape * 1.1), int(max_shape * 1.1))) onset_x=int((img_resized.shape[1]-img_int.shape[1])/2.) onset_y=int((img_resized.shape[0]-img_int.shape[0])/2.) diff --git a/src/eynollah/writer.py b/src/eynollah/writer.py index 2f9caf3..4d97153 100644 --- a/src/eynollah/writer.py +++ b/src/eynollah/writer.py @@ -21,7 +21,7 @@ from ocrd_models.ocrd_page import ( ) import numpy as np -class EynollahXmlWriter(): +class EynollahXmlWriter: def __init__(self, *, dir_out, image_filename, curved_line,textline_light, pcgts=None): self.logger = getLogger('eynollah.writer') diff --git a/train/LICENSE b/train/LICENSE deleted file mode 100644 index 261eeb9..0000000 --- a/train/LICENSE +++ /dev/null @@ -1,201 +0,0 @@ - Apache License - Version 2.0, January 2004 - http://www.apache.org/licenses/ - - TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION - - 1. Definitions. - - "License" shall mean the terms and conditions for use, reproduction, - and distribution as defined by Sections 1 through 9 of this document.
- - "Licensor" shall mean the copyright owner or entity authorized by - the copyright owner that is granting the License. - - "Legal Entity" shall mean the union of the acting entity and all - other entities that control, are controlled by, or are under common - control with that entity. For the purposes of this definition, - "control" means (i) the power, direct or indirect, to cause the - direction or management of such entity, whether by contract or - otherwise, or (ii) ownership of fifty percent (50%) or more of the - outstanding shares, or (iii) beneficial ownership of such entity. - - "You" (or "Your") shall mean an individual or Legal Entity - exercising permissions granted by this License. - - "Source" form shall mean the preferred form for making modifications, - including but not limited to software source code, documentation - source, and configuration files. - - "Object" form shall mean any form resulting from mechanical - transformation or translation of a Source form, including but - not limited to compiled object code, generated documentation, - and conversions to other media types. - - "Work" shall mean the work of authorship, whether in Source or - Object form, made available under the License, as indicated by a - copyright notice that is included in or attached to the work - (an example is provided in the Appendix below). - - "Derivative Works" shall mean any work, whether in Source or Object - form, that is based on (or derived from) the Work and for which the - editorial revisions, annotations, elaborations, or other modifications - represent, as a whole, an original work of authorship. For the purposes - of this License, Derivative Works shall not include works that remain - separable from, or merely link (or bind by name) to the interfaces of, - the Work and Derivative Works thereof. - - "Contribution" shall mean any work of authorship, including - the original version of the Work and any modifications or additions - to that Work or Derivative Works thereof, that is intentionally - submitted to Licensor for inclusion in the Work by the copyright owner - or by an individual or Legal Entity authorized to submit on behalf of - the copyright owner. For the purposes of this definition, "submitted" - means any form of electronic, verbal, or written communication sent - to the Licensor or its representatives, including but not limited to - communication on electronic mailing lists, source code control systems, - and issue tracking systems that are managed by, or on behalf of, the - Licensor for the purpose of discussing and improving the Work, but - excluding communication that is conspicuously marked or otherwise - designated in writing by the copyright owner as "Not a Contribution." - - "Contributor" shall mean Licensor and any individual or Legal Entity - on behalf of whom a Contribution has been received by Licensor and - subsequently incorporated within the Work. - - 2. Grant of Copyright License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - copyright license to reproduce, prepare Derivative Works of, - publicly display, publicly perform, sublicense, and distribute the - Work and such Derivative Works in Source or Object form. - - 3. Grant of Patent License. 
Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - (except as stated in this section) patent license to make, have made, - use, offer to sell, sell, import, and otherwise transfer the Work, - where such license applies only to those patent claims licensable - by such Contributor that are necessarily infringed by their - Contribution(s) alone or by combination of their Contribution(s) - with the Work to which such Contribution(s) was submitted. If You - institute patent litigation against any entity (including a - cross-claim or counterclaim in a lawsuit) alleging that the Work - or a Contribution incorporated within the Work constitutes direct - or contributory patent infringement, then any patent licenses - granted to You under this License for that Work shall terminate - as of the date such litigation is filed. - - 4. Redistribution. You may reproduce and distribute copies of the - Work or Derivative Works thereof in any medium, with or without - modifications, and in Source or Object form, provided that You - meet the following conditions: - - (a) You must give any other recipients of the Work or - Derivative Works a copy of this License; and - - (b) You must cause any modified files to carry prominent notices - stating that You changed the files; and - - (c) You must retain, in the Source form of any Derivative Works - that You distribute, all copyright, patent, trademark, and - attribution notices from the Source form of the Work, - excluding those notices that do not pertain to any part of - the Derivative Works; and - - (d) If the Work includes a "NOTICE" text file as part of its - distribution, then any Derivative Works that You distribute must - include a readable copy of the attribution notices contained - within such NOTICE file, excluding those notices that do not - pertain to any part of the Derivative Works, in at least one - of the following places: within a NOTICE text file distributed - as part of the Derivative Works; within the Source form or - documentation, if provided along with the Derivative Works; or, - within a display generated by the Derivative Works, if and - wherever such third-party notices normally appear. The contents - of the NOTICE file are for informational purposes only and - do not modify the License. You may add Your own attribution - notices within Derivative Works that You distribute, alongside - or as an addendum to the NOTICE text from the Work, provided - that such additional attribution notices cannot be construed - as modifying the License. - - You may add Your own copyright statement to Your modifications and - may provide additional or different license terms and conditions - for use, reproduction, or distribution of Your modifications, or - for any such Derivative Works as a whole, provided Your use, - reproduction, and distribution of the Work otherwise complies with - the conditions stated in this License. - - 5. Submission of Contributions. Unless You explicitly state otherwise, - any Contribution intentionally submitted for inclusion in the Work - by You to the Licensor shall be under the terms and conditions of - this License, without any additional terms or conditions. - Notwithstanding the above, nothing herein shall supersede or modify - the terms of any separate license agreement you may have executed - with Licensor regarding such Contributions. - - 6. Trademarks. 
This License does not grant permission to use the trade - names, trademarks, service marks, or product names of the Licensor, - except as required for reasonable and customary use in describing the - origin of the Work and reproducing the content of the NOTICE file. - - 7. Disclaimer of Warranty. Unless required by applicable law or - agreed to in writing, Licensor provides the Work (and each - Contributor provides its Contributions) on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - implied, including, without limitation, any warranties or conditions - of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A - PARTICULAR PURPOSE. You are solely responsible for determining the - appropriateness of using or redistributing the Work and assume any - risks associated with Your exercise of permissions under this License. - - 8. Limitation of Liability. In no event and under no legal theory, - whether in tort (including negligence), contract, or otherwise, - unless required by applicable law (such as deliberate and grossly - negligent acts) or agreed to in writing, shall any Contributor be - liable to You for damages, including any direct, indirect, special, - incidental, or consequential damages of any character arising as a - result of this License or out of the use or inability to use the - Work (including but not limited to damages for loss of goodwill, - work stoppage, computer failure or malfunction, or any and all - other commercial damages or losses), even if such Contributor - has been advised of the possibility of such damages. - - 9. Accepting Warranty or Additional Liability. While redistributing - the Work or Derivative Works thereof, You may choose to offer, - and charge a fee for, acceptance of support, warranty, indemnity, - or other liability obligations and/or rights consistent with this - License. However, in accepting such obligations, You may act only - on Your own behalf and on Your sole responsibility, not on behalf - of any other Contributor, and only if You agree to indemnify, - defend, and hold each Contributor harmless for any liability - incurred by, or claims asserted against, such Contributor by reason - of your accepting any such warranty or additional liability. - - END OF TERMS AND CONDITIONS - - APPENDIX: How to apply the Apache License to your work. - - To apply the Apache License to your work, attach the following - boilerplate notice, with the fields enclosed by brackets "[]" - replaced with your own identifying information. (Don't include - the brackets!) The text should be enclosed in the appropriate - comment syntax for the file format. We also recommend that a - file or class name and description of purpose be included on the - same "printed page" as the copyright notice for easier - identification within third-party archives. - - Copyright [yyyy] [name of copyright owner] - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. 
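Editor's note: the `train/models.py` hunks below replace a mutable default argument (`mlp_head_units=[128, 64]`) with a `None` sentinel. A minimal sketch, with illustrative names only, of the pitfall that idiom avoids:

```python
def buggy(units=[128, 64]):
    # The default list is created once, at function definition time,
    # so in-place mutation leaks into every subsequent call.
    units.append(32)
    return units

def fixed(units=None):
    # A None sentinel gives each call a fresh default instead.
    if units is None:
        units = [128, 64]
    units.append(32)
    return units

assert buggy() == [128, 64, 32]
assert buggy() == [128, 64, 32, 32]  # surprise: default was mutated by the first call
assert fixed() == [128, 64, 32]
assert fixed() == [128, 64, 32]      # fresh list every call
```

The functions changed here likely never mutate `mlp_head_units`, but the sentinel form is the defensive idiom and keeps the signature safe under later refactoring.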
diff --git a/train/inference.py b/train/inference.py index 094c528..595cfe7 100644 --- a/train/inference.py +++ b/train/inference.py @@ -173,7 +173,7 @@ class sbb_predict: ##if self.weights_dir!=None: ##self.model.load_weights(self.weights_dir) - if (self.task != 'classification' and self.task != 'reading_order'): + if self.task != 'classification' and self.task != 'reading_order': self.img_height=self.model.layers[len(self.model.layers)-1].output_shape[1] self.img_width=self.model.layers[len(self.model.layers)-1].output_shape[2] self.n_classes=self.model.layers[len(self.model.layers)-1].output_shape[3] @@ -305,8 +305,7 @@ class sbb_predict: input_1= np.zeros( (inference_bs, img_height, img_width,3)) - starting_list_of_regions = [] - starting_list_of_regions.append( list(range(labels_con.shape[2])) ) + starting_list_of_regions = [list(range(labels_con.shape[2]))] index_update = 0 index_selected = starting_list_of_regions[0] @@ -561,7 +560,7 @@ class sbb_predict: if self.image: res=self.predict(image_dir = self.image) - if (self.task == 'classification' or self.task == 'reading_order'): + if self.task == 'classification' or self.task == 'reading_order': pass elif self.task == 'enhancement': if self.save: @@ -584,7 +583,7 @@ class sbb_predict: image_dir = os.path.join(self.dir_in, ind_image) res=self.predict(image_dir) - if (self.task == 'classification' or self.task == 'reading_order'): + if self.task == 'classification' or self.task == 'reading_order': pass elif self.task == 'enhancement': self.save = os.path.join(self.out, f_name+'.png') @@ -665,7 +664,7 @@ def main(image, dir_in, model, patches, save, save_layout, ground_truth, xml_fil with open(os.path.join(model,'config.json')) as f: config_params_model = json.load(f) task = config_params_model['task'] - if (task != 'classification' and task != 'reading_order'): + if task != 'classification' and task != 'reading_order': if image and not save: print("Error: You used one of segmentation or binarization task with image input but not set -s, you need a filename to save visualized output with -s") sys.exit(1) diff --git a/train/models.py b/train/models.py index 8841bd3..fdc5437 100644 --- a/train/models.py +++ b/train/models.py @@ -394,7 +394,9 @@ def resnet50_unet(n_classes, input_height=224, input_width=224, task="segmentati return model -def vit_resnet50_unet(n_classes, patch_size_x, patch_size_y, num_patches, mlp_head_units=[128, 64], transformer_layers=8, num_heads =4, projection_dim = 64, input_height=224, input_width=224, task="segmentation", weight_decay=1e-6, pretraining=False): +def vit_resnet50_unet(n_classes, patch_size_x, patch_size_y, num_patches, mlp_head_units=None, transformer_layers=8, num_heads=4, projection_dim=64, input_height=224, input_width=224, task="segmentation", weight_decay=1e-6, pretraining=False): + if mlp_head_units is None: + mlp_head_units = [128, 64] inputs = layers.Input(shape=(input_height, input_width, 3)) #transformer_units = [ @@ -516,7 +518,9 @@ def vit_resnet50_unet(n_classes, patch_size_x, patch_size_y, num_patches, mlp_he return model -def vit_resnet50_unet_transformer_before_cnn(n_classes, patch_size_x, patch_size_y, num_patches, mlp_head_units=[128, 64], transformer_layers=8, num_heads =4, projection_dim = 64, input_height=224, input_width=224, task="segmentation", weight_decay=1e-6, pretraining=False): +def vit_resnet50_unet_transformer_before_cnn(n_classes, patch_size_x, patch_size_y, num_patches, mlp_head_units=None, transformer_layers=8, num_heads=4, projection_dim=64,
input_height=224, input_width=224, task="segmentation", weight_decay=1e-6, pretraining=False): + if mlp_head_units is None: + mlp_head_units = [128, 64] inputs = layers.Input(shape=(input_height, input_width, 3)) ##transformer_units = [ diff --git a/train/train.py b/train/train.py index e8e92af..add878a 100644 --- a/train/train.py +++ b/train/train.py @@ -269,10 +269,10 @@ def run(_config, n_classes, n_epochs, input_height, num_patches = num_patches_x * num_patches_y if transformer_cnn_first: - if (input_height != (num_patches_y * transformer_patchsize_y * 32) ): + if input_height != (num_patches_y * transformer_patchsize_y * 32): print("Error: transformer_patchsize_y or transformer_num_patches_xy height value error . input_height should be equal to ( transformer_num_patches_xy height value * transformer_patchsize_y * 32)") sys.exit(1) - if (input_width != (num_patches_x * transformer_patchsize_x * 32) ): + if input_width != (num_patches_x * transformer_patchsize_x * 32): print("Error: transformer_patchsize_x or transformer_num_patches_xy width value error . input_width should be equal to ( transformer_num_patches_xy width value * transformer_patchsize_x * 32)") sys.exit(1) if (transformer_projection_dim % (transformer_patchsize_y * transformer_patchsize_x)) != 0: @@ -282,10 +282,10 @@ def run(_config, n_classes, n_epochs, input_height, model = vit_resnet50_unet(n_classes, transformer_patchsize_x, transformer_patchsize_y, num_patches, transformer_mlp_head_units, transformer_layers, transformer_num_heads, transformer_projection_dim, input_height, input_width, task, weight_decay, pretraining) else: - if (input_height != (num_patches_y * transformer_patchsize_y) ): + if input_height != (num_patches_y * transformer_patchsize_y): print("Error: transformer_patchsize_y or transformer_num_patches_xy height value error . input_height should be equal to ( transformer_num_patches_xy height value * transformer_patchsize_y)") sys.exit(1) - if (input_width != (num_patches_x * transformer_patchsize_x) ): + if input_width != (num_patches_x * transformer_patchsize_x): print("Error: transformer_patchsize_x or transformer_num_patches_xy width value error . input_width should be equal to ( transformer_num_patches_xy width value * transformer_patchsize_x)") sys.exit(1) if (transformer_projection_dim % (transformer_patchsize_y * transformer_patchsize_x)) != 0: @@ -297,7 +297,7 @@ def run(_config, n_classes, n_epochs, input_height, model.summary() - if (task == "segmentation" or task == "binarization"): + if task == "segmentation" or task == "binarization": if not is_loss_soft_dice and not weighted_loss: model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=learning_rate), metrics=['accuracy']) @@ -365,8 +365,7 @@ def run(_config, n_classes, n_epochs, input_height, y_tot=np.zeros((testX.shape[0],n_classes)) - score_best=[] - score_best.append(0) + score_best = [0] num_rows = return_number_of_total_training_data(dir_train) weights=[] diff --git a/train/utils.py b/train/utils.py index 2bb7261..ead4887 100644 --- a/train/utils.py +++ b/train/utils.py @@ -260,7 +260,7 @@ def generate_data_from_folder_training(path_classes, batchsize, height, width, n if batchcount>=batchsize: ret_x = ret_x/255.
- yield (ret_x, ret_y) + yield ret_x, ret_y ret_x= np.zeros((batchsize, height,width, 3)).astype(np.int16) ret_y= np.zeros((batchsize, n_classes)).astype(np.int16) batchcount = 0 @@ -446,7 +446,7 @@ def generate_arrays_from_folder_reading_order(classes_file_dir, modal_dir, batch ret_y[batchcount, :] = label_class batchcount+=1 if batchcount>=batchsize: - yield (ret_x, ret_y) + yield ret_x, ret_y ret_x= np.zeros((batchsize, height, width, 3))#.astype(np.int16) ret_y= np.zeros((batchsize, n_classes)).astype(np.int16) batchcount = 0 @@ -464,7 +464,7 @@ def generate_arrays_from_folder_reading_order(classes_file_dir, modal_dir, batch ret_y[batchcount, :] = label_class batchcount+=1 if batchcount>=batchsize: - yield (ret_x, ret_y) + yield ret_x, ret_y ret_x= np.zeros((batchsize, height, width, 3))#.astype(np.int16) ret_y= np.zeros((batchsize, n_classes)).astype(np.int16) batchcount = 0
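Editor's note: the `yield (ret_x, ret_y)` → `yield ret_x, ret_y` changes above are purely cosmetic, since a bare `a, b` after `yield` already produces a tuple. For reference, a self-contained sketch of the same fill-and-reset batching pattern (hypothetical shapes and names, not the project's actual data loader):

```python
import numpy as np

def batch_generator(images, labels, batchsize=4, height=8, width=8, n_classes=3):
    """Yield (x, y) batches, resetting the buffers once each batch is full."""
    ret_x = np.zeros((batchsize, height, width, 3), dtype=np.int16)
    ret_y = np.zeros((batchsize, n_classes), dtype=np.int16)
    batchcount = 0
    for img, lab in zip(images, labels):
        ret_x[batchcount] = img
        ret_y[batchcount] = lab
        batchcount += 1
        if batchcount >= batchsize:
            yield ret_x, ret_y  # identical to `yield (ret_x, ret_y)`
            ret_x = np.zeros((batchsize, height, width, 3), dtype=np.int16)
            ret_y = np.zeros((batchsize, n_classes), dtype=np.int16)
            batchcount = 0
```

As in the fragments above, only full batches are yielded; a trailing partial batch is dropped.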