Gerber, Mike | bdab016e2c | ✨ Use GT4HistOCR_2000000 model from qurator-data for Tesseract | 5 years ago
Gerber, Mike | 47dd5d3b62 | 🎨 Move XML schemata to a better path | 5 years ago
Gerber, Mike | af2034400a | 🎨 Add extra newlines to separate steps | 5 years ago
Gerber, Mike | 1863439d92 | 💩 Remove extra Pillow dependency workarounds | 5 years ago
Gerber, Mike | e5cd5b937e | ✨ Run pip3 list for easier checking | 5 years ago
Gerber, Mike | bd24624bd7 | ⬆ Do not downgrade to PAGE 2018 anymore | 5 years ago
Gerber, Mike | 0b2b66a0b4 | 🔧 Allow setting LOG_LEVEL | 5 years ago
Gerber, Mike | f19bba45b8 | 💩 Remove mysterious TEMP directory for now | 5 years ago
Gerber, Mike | 68902f923d | 📜 Downgrading to PAGE 2018 is not the last step anymore | 5 years ago
Gerber, Mike | 6c0d7e0aee | 💩 Do not fix PAGE image references for now | 5 years ago
Gerber, Mike | 343a3fbf82 | 🔧 Evaluate both Tesseract and Calamari results | 5 years ago
Gerber, Mike | 0bc06c2fad | ✨ Run Calamari OCR | 5 years ago
Gerber, Mike | daed87566e | 🚑 Don't install typegroups classifier for now | 5 years ago
Gerber, Mike | d8f3438ac5 | 🚑 Don't check pixel density | 5 years ago
Gerber, Mike | 85ff80d548 | ✨ Use dinglehopper's new OCR-D interface | 5 years ago
Gerber, Mike | d5aa273b44 | 🚧 Use ocr-eval aka dinglehopper | 5 years ago
Gerber, Mike | be5750f4e1 | ✨ As a last step, downgrade to PAGE 2018 to support PAGE Viewer | 5 years ago
Gerber, Mike | cf2b4de2a0 | 🧹 Validate again after fixing image references | 5 years ago
Gerber, Mike | 21e00932be | 🐛 Use a valid filegrp USE for fontident | 5 years ago
Gerber, Mike | ade39a278c | 🎨 Align file groups | 5 years ago
Gerber, Mike | 3fee2d4fe6 | 📌 Use my ocrd_typegroups_classifier fix for passing down the page id | 5 years ago
Gerber, Mike | 44772f1923 | 🚧 Work around problems with ocrd-tesserocr producing TextEquiv/@conf | 5 years ago
Gerber, Mike | 8b67866aac | ✨ Validate PAGE XML after OCR | 5 years ago
Gerber, Mike | 0d7fd21446 | ✨ Validate workspace after each step | 5 years ago
Gerber, Mike | de841746e3 | Use PAGE 2019 | 5 years ago
Gerber, Mike | ff0570e151 | Use frk for now | 5 years ago
Gerber, Mike | cc81afa1a5 | 🧹 No need to clean up after tesserocr | 6 years ago
Gerber, Mike | 89a2893e4e | ❌ I do not care for the multiple mets:agents elements | 6 years ago
Gerber, Mike | 0e63fa1756 | ⁉ PyTessApi seems to use both engine modes | 6 years ago
Gerber, Mike | e3a1afbc93 | 📝 Document the functions | 6 years ago
Gerber, Mike | f3e37dd16c | Do not hardcode path to typegroups model binary | 6 years ago
Gerber, Mike | 8d66469621 | Binarize images before segmenting | 6 years ago
Gerber, Mike | 5e1ece4877 | Use ocrd-tesserocr-segment-* | 6 years ago
Gerber, Mike | e30f03699c | TODO Binarization | 6 years ago
Gerber, Mike | 0d5b5b1b17 | XXX does ocrd_tesserocr use the LSTM engine? | 6 years ago
Gerber, Mike | 16f2f16dbe | XXX <error>INCONSISTENCY in TextRegion ID 'dummy' | 6 years ago
Gerber, Mike | 89abc507e0 | XXX ocrd-ocropy-segment throws an exception for buerger_gedichte_1778.ocrd | 6 years ago
Gerber, Mike | ad3a7c2b95 | XXX remove_filegrp link to OCR-D issue | 6 years ago
Gerber, Mike | f94230c587 | Set log level to DEBUG again | 6 years ago
Gerber, Mike | 2b2c39d6d4 | Add a global LOG_LEVEL option | 6 years ago
Gerber, Mike | fbc3b8ca4f | Fix image references | 6 years ago
Gerber, Mike | b6c490e18b | Add a PAGE fix XML step | 6 years ago
Gerber, Mike | d98ce2d2d4 | Add a PAGE validation step | 6 years ago
Gerber, Mike | 10c4068a99 | XXX Global -l DEBUG | 6 years ago
Gerber, Mike | f8f44e990d | Clean up after ocrd-ocropy-segment's mess | 6 years ago
Gerber, Mike | 243ddea674 | Use ocrd-ocropy-segment instead of non-functional ocrd-tesserocr-segment-line | 6 years ago
Gerber, Mike | 9bd3853c78 | Add OCR step | 6 years ago
Gerber, Mike | a64b9cf5c8 | XXX Multiple calls create multiple identical mets:agent elements | 6 years ago
Gerber, Mike | c207859bcd | Refactor: Extract functions for the steps | 6 years ago
Gerber, Mike | a2d547b857 | Reformat to use shorter lines | 6 years ago
Gerber, Mike | b5f9dcb7f3 | Initial commit | 6 years ago