Soha is the name of a Persian chatbot that will be used by the faculty, students, and members of the Department of Computer Science and Engineering of Shahid Beheshti University.
The chatbot will process users' inputs and guide them through their requests. This project handles the information extraction phase of the larger Soha project: it extracts entities such as the following from unstructured user inputs:
- The student ID
- Students' entry year
- Students' GPA
- The name of the student
- The name of the course
- The type of request (e.g., dropping a course or semester withdrawal)
A combination of rule-based methods (using regex) and deep learning methods (using the BERT language model) was used for this task.
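The rule-based side of this combination can be sketched roughly as follows. This is only an illustration, not the code from the notebook: the function name `extract_entities`, the 8-digit student ID format, and the Persian trigger words `ورودی` (entry) and `معدل` (GPA) are all assumptions made for the example.

```python
import re

# Map Persian (U+06F0..U+06F9) and Arabic-Indic (U+0660..U+0669) digits to ASCII
# so that a single set of patterns covers all three digit styles.
_TO_ASCII_DIGITS = {
    base + i: str(i) for base in (0x06F0, 0x0660) for i in range(10)
}

# Assumed formats, chosen only for illustration (not taken from the project):
# an 8-digit student ID that is not part of a longer number
STUDENT_ID_RE = re.compile(r"(?<!\d)\d{8}(?!\d)")
# the word for "entry" followed by a 2- or 4-digit year
ENTRY_YEAR_RE = re.compile(r"ورود[یي]\s*(\d{4}|\d{2})(?!\d)")
# the word for "GPA" followed, within a few non-digit characters, by a number;
# ".", "/" and the Arabic decimal separator (U+066B) are accepted as decimal marks
GPA_RE = re.compile(r"معدل\D{0,15}?(\d{1,2}(?:[./\u066b]\d{1,2})?)")


def extract_entities(text):
    """Return the numeric entities found in text; missing ones are None."""
    normalized = text.translate(_TO_ASCII_DIGITS)
    student_id = STUDENT_ID_RE.search(normalized)
    entry_year = ENTRY_YEAR_RE.search(normalized)
    gpa = GPA_RE.search(normalized)
    return {
        "student_id": student_id.group(0) if student_id else None,
        "entry_year": entry_year.group(1) if entry_year else None,
        # Normalize the decimal mark before converting to a float.
        "gpa": float(re.sub(r"[/\u066b]", ".", gpa.group(1))) if gpa else None,
    }


if __name__ == "__main__":
    sample = "شماره دانشجویی ۹۷۲۴۳۰۱۲ ورودی ۹۷ معدل ۱۷.۵"
    print(extract_entities(sample))
```

Patterns like these suit entities with a fixed shape. Free-form entities such as student names, course names, and request types are harder to capture with regular expressions, which is presumably where the BERT model comes in.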
Because no existing dataset met this project's needs, a crowdsourcing website was launched that invited people to submit the requests they might make, as part of a two-week contest with prizes. The resulting dataset will also be used by other subprojects of the Soha system.
This repository contains:
- The code for a MERNG (MongoDB, Express, React, Node.js, GraphQL) crowdsourcing web application that collects students' sample inputs using gamification methods
- A Jupyter notebook containing the information extraction phase of the project, which uses the BERT language model
| Notebook | Link |
|---|---|
| Jupyter Notebook | |
Parsa Hejabi - @callme_parsa
Project Link: https://github.com/ParsaHejabi/DS-BankFinder-Project