
Evaluate over specifically provided datasets #5

Open

panalexeu wants to merge 8 commits into brandonstarxel:dev from panalexeu:feature/dataset-evaluation

Conversation

@panalexeu

overview

While evaluating some of my chunking algorithm ideas, I ran into an issue: the GeneralEvaluation class, which includes all of the question and corpus datasets, takes too long to evaluate (in my case, hours).

To address this, I developed the DatasetEvaluation class, which inherits from GeneralEvaluation and accepts a list of datasets to include in the evaluation. It filters the questions_df and corpus_list properties accordingly.
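
A minimal sketch of the idea (the constructor signature follows the usage example below; the filtering internals, including the corpus_id column name, are my assumptions rather than the PR's exact implementation):

from chunking_evaluation import GeneralEvaluation

class DatasetEvaluation(GeneralEvaluation):
    def __init__(self, datasets):
        super().__init__()
        names = {d.value for d in datasets}  # assumes Dataset is an Enum of dataset names
        # Keep only questions belonging to the requested datasets
        # (original indices are preserved on purpose; see the follow-up comment below).
        self.questions_df = self.questions_df[
            self.questions_df['corpus_id'].isin(names)
        ]
        # ... and only the matching corpora.
        self.corpus_list = [c for c in self.corpus_list if c in names]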

unit tests

I also included unit tests for the DatasetEvaluation class in tests/test_dataset_evaluation.py. Running them requires the pytest package.

To run the tests, navigate to the root project directory:

cd ./chunking_evaluation 

Then, run:

pytest
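
For reference, a test in that file might look roughly like this (a sketch with a hypothetical test name and assertions; the actual checks in tests/test_dataset_evaluation.py may differ):

from chunking_evaluation import DatasetEvaluation, Dataset

def test_filters_to_requested_datasets():
    evaluation = DatasetEvaluation(datasets=[Dataset.PUBMED])
    # Only the requested dataset's corpus and questions should remain.
    assert len(evaluation.corpus_list) == 1
    assert not evaluation.questions_df.empty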

usage example

import time

from chunking_evaluation import DatasetEvaluation, Dataset
from chunking_evaluation.chunking import FixedTokenChunker
from chromadb.utils import embedding_functions
from rich import print

# Embedding function used for both chunks and queries.
ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)
chunker = FixedTokenChunker()

# Evaluate only over the listed datasets instead of the full benchmark.
evaluation = DatasetEvaluation(
    datasets=[
        Dataset.PUBMED,
        Dataset.WIKITEXTS,
        Dataset.CHATLOGS,
        Dataset.FINANCE,
        Dataset.STATE_OF_THE_UNION,
    ]
)

if __name__ == '__main__':
    start = time.time()
    results = evaluation.run(chunker, ef)
    end = time.time()

    print(results)
    print(f'TIME: {end - start:.2f}s')

@panalexeu panalexeu changed the title Evaluate over specificly provided datasets Evaluate over specifically provided datasets Jan 4, 2025
@brandonstarxel brandonstarxel self-assigned this Jan 4, 2025
@panalexeu
Author

Hey, I realized I missed something yesterday. In BaseEvaluation, specifically in the run method, there is a loop that iterates over self.questions_df.iter_rows() and retrieves the calculated brute_iou_scores, iou_scores, recall_scores, and precision_scores by index; those index values correspond to the indices in questions_df.

However, the _full_precision_score and _scores_from_dataset_and_retrievals methods appended these values to plain lists, so retrieving them by the questions_df index failed with an "Index out of range" error whenever the indices were non-contiguous.

To resolve this, I reworked _full_precision_score and _scores_from_dataset_and_retrievals to return dictionaries keyed by index, with the score as the value. This keeps the scores consistent with questions_df and works correctly with an arbitrary number of datasets.
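
To make the failure mode concrete, here is a small self-contained illustration (pandas stands in for questions_df, and compute_score is a hypothetical placeholder for the real scoring logic):

import pandas as pd

# A filtered questions_df keeps its original, non-contiguous indices.
questions_df = pd.DataFrame({'question': ['a', 'bb', 'c']}, index=[0, 5, 9])

def compute_score(row):
    return len(row['question'])  # hypothetical stand-in for the real scores

# Before the fix: scores were collected into a list at positions 0..2.
scores_list = [compute_score(row) for _, row in questions_df.iterrows()]
# scores_list[5]  # -> IndexError: list index out of range

# After the fix: scores are keyed by the DataFrame index.
scores_dict = {idx: compute_score(row) for idx, row in questions_df.iterrows()}
assert scores_dict[5] == 2  # lookup by the questions_df index now works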

Latest commit resolves this issue.

I would like to ask you to carefully review my changes, as I made edits directly to the BaseEvaluation class and may have overlooked some logic.

@panalexeu panalexeu force-pushed the feature/dataset-evaluation branch from 0f86fb4 to 27682fc on January 8, 2025
@panalexeu
Author

I accidentally committed the test_change, so I reset the branch HEAD to the commit before it.
