Google Summer of Code 2020
Below is a short list of project ideas for Google Summer of Code 2020.
Expand the content on Sangraha by using OCR to extract text from PDF documents. The latest version of Tesseract performs OCR using deep learning. Trained data for Devanagari is already available; we have tried it previously, but the results were not optimal. This trained data can be enhanced to better support the Nepali language, or new trained data can be created for Nepali. Mentor: Pragya Tripathi
Skills required: Java, Machine learning
Lucene and Elasticsearch do not support a Nepali language stemmer. Currently, we use the Hindi language stemmer as a workaround. To improve search quality, we plan to implement a Nepali stemmer in Elasticsearch. This project is a good opportunity to give back to the open source projects that Sangraha depends on. Mentor: Anup Dhamala
Skills required: Java, NLP
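To give a feel for what a stemmer does, here is a minimal suffix-stripping sketch in plain Java. It is only an illustration under stated assumptions: the class name `NepaliSuffixStripper`, the short suffix list (the plural marker and a few common postpositions), and the minimum stem length are all hypothetical choices, not part of Lucene or Elasticsearch, and a real contribution would need a far richer rule set and would likely be packaged as a Lucene token filter.

```java
import java.util.List;

// Hypothetical illustration only; not a Lucene or Elasticsearch API.
final class NepaliSuffixStripper {
    // Assumed suffix list: the plural marker and a few common postpositions.
    private static final List<String> SUFFIXES =
            List.of("हरू", "लाई", "बाट", "को", "मा", "ले");

    // Assumed threshold: keep at least this many UTF-16 code units,
    // so that very short words are not stripped down to nothing.
    private static final int MIN_STEM_LENGTH = 2;

    // Repeatedly removes a trailing suffix until none applies.
    static String stem(String word) {
        String current = word;
        boolean stripped = true;
        while (stripped) {
            stripped = false;
            for (String suffix : SUFFIXES) {
                int remaining = current.length() - suffix.length();
                if (current.endsWith(suffix) && remaining >= MIN_STEM_LENGTH) {
                    current = current.substring(0, remaining);
                    stripped = true;
                    break; // restart the scan on the shortened word
                }
            }
        }
        return current;
    }

    public static void main(String[] args) {
        for (String word : List.of("घरमा", "किताबहरू", "नेपालको")) {
            System.out.println(word + " -> " + stem(word));
        }
    }
}
```

With this rule set, inflected forms such as घरमा and घर map to the same term, which is what lets a search for one match documents containing the other. The length check is a crude guard against over-stemming: a word like आमा is left alone even though it ends in मा.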
We want to integrate Wiki.js into Sangraha to allow crowdsourcing of content. Wiki.js will let users add new content and admins moderate it. It will also provide user management features. Mentor: Prasanna Suman
Skills required: JavaScript