The University of Manchester Postgrads - March 2022
Coursework 1 for Text Mining (COMP61332)
Xinyi Ouyang - Chuhan Qiu - Mingchen Wan - Zhangli Wang - Mochuan Zhan
word tokenization -> word embedding -> sentence representation (BOW, BiLSTM) -> training classifier (NN)
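The pipeline above can be illustrated with a minimal, dependency-free sketch of the bag-of-words (BOW) path. This is not the code in `src/`; every function name, the toy embedding table, and the toy classifier weights are assumptions made up for illustration. A BiLSTM would replace the averaging step with a recurrent encoder, and the real classifier is a trained neural network rather than a fixed linear layer.

```python
import re
from typing import Dict, List


def tokenize(question: str) -> List[str]:
    """Lowercase the question and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", question.lower())


def embed(tokens: List[str], embeddings: Dict[str, List[float]], dim: int) -> List[List[float]]:
    """Look up each token; unknown words map to a zero vector (an assumption)."""
    unknown = [0.0] * dim
    return [embeddings.get(token, unknown) for token in tokens]


def bow_sentence_vector(vectors: List[List[float]], dim: int) -> List[float]:
    """BOW sentence representation: the element-wise mean of the word vectors."""
    if not vectors:
        return [0.0] * dim
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def classify(sentence_vec: List[float], weights: List[List[float]],
             biases: List[float], labels: List[str]) -> str:
    """Single linear layer followed by argmax, standing in for the NN classifier."""
    scores = [
        sum(w * x for w, x in zip(row, sentence_vec)) + bias
        for row, bias in zip(weights, biases)
    ]
    return labels[scores.index(max(scores))]


# Toy data, hypothetical values only (not taken from glove.small.txt).
toy_embeddings = {"what": [1.0, 0.0], "city": [0.0, 1.0]}
toy_labels = ["LOC", "DESC"]
toy_weights = [[0.0, 1.0], [1.0, -1.0]]
toy_biases = [0.0, 0.0]

tokens = tokenize("What city?")
sentence_vec = bow_sentence_vector(embed(tokens, toy_embeddings, dim=2), dim=2)
prediction = classify(sentence_vec, toy_weights, toy_biases, toy_labels)
print(tokens, sentence_vec, prediction)
```

Averaging keeps the sentence vector the same size regardless of question length, which is why the classifier can use a fixed input dimension.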
.
├── README.md
├── data
│ ├── dev.txt
│ ├── glove.small.txt
│ ├── raw_data.txt
│ ├── stopword.txt
│ ├── train.txt
│ ├── test.txt
│ └── vocabulary.txt
├── document
│ ├── README.md
│ ├── document.md
│ └── document.pdf
├── src
│ ├── utility
│ │ ├── __init__.py
│ │ ├── file_loader.py
│ │ └── pre_train.py
│ ├── __init__.py
│ ├── biLSTM.py
│ ├── bow.py
│ ├── config.ini
│ ├── model.py
│ └── question_classifier.py
System environment for testing
Manchester Computer Science virtual machine, Ubuntu (64-bit), CSImage 2122 v15 PGT
4 GB RAM, 3 CPUs
Hardware environment
Huawei MateBook 14 (2019), Windows 11, with an 8th-gen Core i5 CPU and 8 GB RAM
Training dataset
5,500 labeled questions
Testing dataset
TREC 10