Use MAX file group name instead of BEST

We were using the file group name BEST for what Kitodo seems to call
MAX by convention. So we use MAX now.

Currently, we work under the assumption that, if MAX exists in the METS
retrieved by OAI-PMH, it's not what we want, so we replace it with our
own full-size IIIF URLs.
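
The replacement step can be sketched roughly as follows. This is a minimal, standalone illustration using only the standard library, not the actual `ppn2ocr` code: the sample METS document, the `replace_max_file_grp` name and the hard-coded `image/tiff` MIME type are assumptions made for the example, and the rewriting of file locations to IIIF URLs is left out.

```python
import re
import xml.etree.ElementTree as ET
from copy import deepcopy

XMLNS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical, heavily reduced METS document for illustration only.
SAMPLE_METS = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp USE="DEFAULT">
      <mets:file ID="FILE_0001_DEFAULT" MIMETYPE="image/jpeg"/>
    </mets:fileGrp>
    <mets:fileGrp USE="MAX">
      <mets:file ID="OLD_MAX_0001" MIMETYPE="image/jpeg"/>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>
"""


def replace_max_file_grp(root):
    """Drop any existing MAX group, then copy DEFAULT into a fresh MAX group."""
    file_sec = root.find("mets:fileSec", XMLNS)

    # Assume an existing MAX group is not what we want and remove it.
    for grp in file_sec.findall("mets:fileGrp[@USE='MAX']", XMLNS):
        file_sec.remove(grp)

    # Duplicate DEFAULT and relabel the copy (group and file IDs) as MAX.
    default_grp = file_sec.find("mets:fileGrp[@USE='DEFAULT']", XMLNS)
    max_grp = deepcopy(default_grp)
    max_grp.set("USE", "MAX")
    for f in max_grp.findall("mets:file", XMLNS):
        f.set("ID", re.sub("DEFAULT", "MAX", f.get("ID")))
        # Stand-in for mime_type_for_format('tif') in the real script.
        f.set("MIMETYPE", "image/tiff")
    file_sec.append(max_grp)
    return root


root = replace_max_file_grp(ET.fromstring(SAMPLE_METS))
uses = [g.get("USE") for g in root.iter("{http://www.loc.gov/METS/}fileGrp")]
print(uses)
```

Removing the group unconditionally keeps the result predictable: whatever the OAI-PMH response contained, MAX afterwards always mirrors DEFAULT.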

Fixes GH-43.
master
Gerber, Mike 4 years ago
parent 94a035e2cf
commit 691be243f6

@@ -88,7 +88,7 @@ The document must be specified by its PPN, for example:
 ~~~
 ~/devel/ocrd-galley/ppn2ocr PPN77164308X
 cd PPN77164308X
-~/devel/ocrd-galley/my_ocrd_workflow -I BEST --skip-validation
+~/devel/ocrd-galley/my_ocrd_workflow -I MAX --skip-validation
 ~~~
 This produces a workspace directory `PPN77164308X` with the OCR results in it;

@@ -91,15 +91,20 @@ def make_workspace(ppn, workspace):
     remove_file_grp(mets, 'PRESENTATION')
     remove_file_grp(mets, 'LOCAL')
-    # Duplicate DEFAULT file group into a new file group BEST
+    # Delete MAX file group - we assume that, if it exists, it is not as
+    # we expect it, e.g. IIIF full URLs
+    remove_file_grp(mets, 'MAX')
+    # Duplicate DEFAULT file group into a new file group MAX
     format_ = 'tif'
     file_grp_default = mets.find('//mets:fileGrp[@USE="DEFAULT"]', namespaces=XMLNS)
     file_grp_best = deepcopy(file_grp_default)
-    file_grp_best.attrib['USE'] = 'BEST'
+    file_grp_best.attrib['USE'] = 'MAX'
     for f in file_grp_best.findall('./mets:file', namespaces=XMLNS):
         old_id = f.attrib['ID']
-        new_id = re.sub('DEFAULT', 'BEST', old_id)
+        new_id = re.sub('DEFAULT', 'MAX', old_id)
         f.attrib['ID'] = new_id
         f.attrib['MIMETYPE'] = mime_type_for_format(format_)
@@ -157,7 +162,7 @@ def ppn2ocr(ppn):
     # XXX
     # subprocess.run([
     # os.path.join(self_dir, 'run-docker-hub'),
-    # '-I', 'BEST',
+    # '-I', 'MAX',
     # '--skip-validation'
     # ])
