added markdown document for ocr engine comparison #577
base: main
* **Bad**, because increases support complexity with multiple engines

### Confirmation
Elaborate on how this is done. I would assume that you have the 100+ PDFs at hand and wrote a test suite?
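To illustrate the kind of test suite meant here, the following is a minimal sketch of how OCR output could be scored against ground-truth text using character error rate (CER). It is an assumption about one possible setup, not the method used in this PR: the function names and the `(reference, hypothesis)` pair layout are hypothetical, and the step that actually runs each OCR engine on the PDFs is left out.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance divided by the length of the reference text."""
    if not reference:
        # Hypothetical convention: empty reference is perfect only if
        # the OCR output is empty too.
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def mean_cer(pairs) -> float:
    """Average CER over (reference, hypothesis) pairs for one engine."""
    rates = [character_error_rate(ref, hyp) for ref, hyp in pairs]
    return sum(rates) / len(rates)
```

Running `mean_cer` once per engine over the same set of documents would give comparable numbers that the Confirmation section could cite, provided the ground-truth transcriptions are checked in or otherwise reproducible.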

* Current implementation uses Tesseract 4.x with LSTM engine
* In benchmarks, Google Cloud Vision shows the highest overall accuracy
* Handwriting (categories 2 & 3) is the main differentiator among engines
Where are these categories mentioned?

The web resources that informed this ADR:

1. <https://www.mdpi.com/2073-8994/12/5/715>
Link that to each pro/con argument.
@@ -0,0 +1,153 @@
# ADR-002: OCR Engine Selection for JabRef
Try to follow the format used in JabRef's repository, and place the file in the JabRef folder: https://github.com/JabRef/jabref/tree/main/docs/decisions
I think this is AI-generated, because I cannot otherwise explain why this takes number 0002, and why the number appears in the heading.
(It also does not follow the MADR format.)
It should go to the devdocs: jabref/docs/decisions
This is related to the GSoC OCR project by Kaan Erdem.
JabRef/jabref#13313