This project is part of an Information Retrieval (CS60092) assignment at IIT Kharagpur. (A PDF of the work is included in the repository.)
- Python (Version - 3.8.10)
- re
- sys
- string
- nltk
- pickle
- NumPy
If any library is missing, install it with
- `pip3 install <library name>` (re, sys, string, and pickle ship with the Python 3 standard library, so typically only nltk and NumPy need to be installed)
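If the preprocessing relies on NLTK's tokenizer or stopword list (an assumption; check the scripts to see which resources they actually load), the corresponding NLTK data also has to be downloaded once, for example:

```python
import nltk

# One-time downloads, assuming the scripts use NLTK's Punkt tokenizer and
# English stopword list; adjust to whatever resources the code actually loads.
nltk.download("punkt")
nltk.download("stopwords")
```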
We build an inverted index from the documents and, for each query, retrieve the relevant documents from it.
The retrieved results are evaluated against the gold-standard relevance judgments.
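The sketch below illustrates the general idea, assuming the corpus is available as a dict mapping document IDs to raw text; the function names (`build_inverted_index`, `retrieve`) and the tokenization are illustrative rather than the repository's actual API, though the ranking follows the lnc.ltc weighting reported below.

```python
# Minimal sketch (not the repository's actual code): build an inverted index
# and rank documents for a query with lnc.ltc weighting.
import math
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and keep alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """docs: dict doc_id -> raw text. Returns term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def retrieve(query, index, num_docs, k=10):
    """Rank documents by cosine similarity under the lnc.ltc scheme:
    documents use log tf with cosine normalization (no idf),
    the query uses log tf * idf with cosine normalization."""
    # Query vector (ltc).
    q_tf = defaultdict(int)
    for term in tokenize(query):
        q_tf[term] += 1
    q_weights = {}
    for term, tf in q_tf.items():
        if term in index:
            idf = math.log10(num_docs / len(index[term]))
            q_weights[term] = (1 + math.log10(tf)) * idf
    q_norm = math.sqrt(sum(w * w for w in q_weights.values())) or 1.0

    # Document vectors (lnc) restricted to query terms; accumulate dot products.
    # Note: exact lnc normalization uses a norm over *all* terms of a document,
    # which would be precomputed at indexing time; this sketch approximates it
    # over the query terms only, for brevity.
    scores = defaultdict(float)
    doc_norms = defaultdict(float)
    for term, q_w in q_weights.items():
        for doc_id, tf in index[term].items():
            d_w = 1 + math.log10(tf)
            scores[doc_id] += d_w * q_w
            doc_norms[doc_id] += d_w * d_w
    ranked = sorted(scores,
                    key=lambda d: scores[d] / (math.sqrt(doc_norms[d]) * q_norm),
                    reverse=True)
    return ranked[:k]
```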
- Length of the inverted index (Vocabulary) - 8813
- Number of queries that returned document IDs - 14 out of 225
Evaluation metrics for each term-weighting scheme (a sketch of how these metrics are computed follows the list):
- lnc.ltc
  - Mean Average Precision (MAP) @ 10 = 0.4509512345679012
  - Mean Average Precision (MAP) @ 20 = 0.40917921521906336
  - Mean Normalized Discounted Cumulative Gain (NDCG) @ 10 = 0.35613102024509247
  - Mean Normalized Discounted Cumulative Gain (NDCG) @ 20 = 0.387901807896173
- lnc.Ltc
  - Mean Average Precision (MAP) @ 10 = 0.4509512345679012
  - Mean Average Precision (MAP) @ 20 = 0.40917921521906336
  - Mean Normalized Discounted Cumulative Gain (NDCG) @ 10 = 0.35613102024509247
  - Mean Normalized Discounted Cumulative Gain (NDCG) @ 20 = 0.387901807896173
- anc.apc
  - Mean Average Precision (MAP) @ 10 = 0.4321748383303936
  - Mean Average Precision (MAP) @ 20 = 0.3955611461367082
  - Mean Normalized Discounted Cumulative Gain (NDCG) @ 10 = 0.34492256327874254
  - Mean Normalized Discounted Cumulative Gain (NDCG) @ 20 = 0.37787975030561444
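As a reference for how such numbers are obtained, here is a minimal, illustrative sketch of Average Precision and NDCG at a cutoff k, assuming each query has a ranked list of retrieved doc IDs and a gold-standard relevance mapping; the exact normalization conventions (e.g. dividing AP by min(|relevant|, k)) may differ from those used in the assignment.

```python
# Illustrative per-query metrics; not the repository's actual evaluation code.
import math

def average_precision_at_k(ranked, relevant, k=10):
    """ranked: list of doc IDs in retrieval order; relevant: set of relevant IDs."""
    hits, precision_sum = 0, 0.0
    for i, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / i
    return precision_sum / min(len(relevant), k) if relevant else 0.0

def ndcg_at_k(ranked, grades, k=10):
    """grades: dict doc_id -> graded relevance (0 if absent)."""
    dcg = sum(grades.get(d, 0) / math.log2(i + 1)
              for i, d in enumerate(ranked[:k], start=1))
    ideal = sorted(grades.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```

MAP @ k and mean NDCG @ k are then simply the averages of these per-query values over all evaluated queries.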
The aim is to understand how to create an inverted index, retrieve the documents relevant to a query, and compute various evaluation metrics.