
Commit 1d4b2ff

Update default.py - Missing cosine_similarity transform for docs with Token Count 101–500 (#1934)
The `default_transforms` function defined in `src/ragas/testset/transforms/default.py` mishandles transforms for documents with 101-500 tokens. The code selects the `transforms` configuration based on the documents' token counts. When [the "101-500" token-count bin reaches the first quartile (Q1)](https://github.com/explodinggradients/ragas/blob/2bc29a2b8358ddb6b167fdf7ab0518ad9371463c/src/ragas/testset/transforms/default.py#L128), i.e. holds at least a quarter of the documents, several transforms are instantiated, among them `cosine_sim_builder`. However, while `cosine_sim_builder` is correctly **instantiated** (line 139), it is **not included** in the list of transforms that is actually returned (line 153).

It appears that `cosine_sim_builder` was unintentionally omitted from the returned list. The intended behavior should probably mirror the branch at line 120, where `cosine_sim_builder` is instantiated and returned alongside `ner_overlap_sim`. As written, the code instantiates `cosine_sim_builder` and then discards it. This omission likely reduces the number of relationships created in the knowledge graph.
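The bin-based selection described above can be modelled with a short, self-contained sketch. It is not ragas code: the bin ranges other than 101-500, the inclusive bounds, and the reading of "first quartile (Q1)" as a share of at least 0.25 are assumptions, and the names `bin_fractions` and `uses_101_500_transforms` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical bin edges; only the 101-500 bin is named in the commit message.
BIN_RANGES: List[Tuple[int, int]] = [(0, 100), (101, 500), (501, 100_000)]
Q1_THRESHOLD = 0.25  # assumed reading of "first quartile (Q1)"


def bin_fractions(token_counts: List[int]) -> Dict[str, float]:
    """Fraction of documents whose token count falls into each inclusive bin."""
    if not token_counts:
        raise ValueError("need at least one document")
    counts = {f"{lo}-{hi}": 0 for lo, hi in BIN_RANGES}
    for n in token_counts:
        for lo, hi in BIN_RANGES:
            if lo <= n <= hi:
                counts[f"{lo}-{hi}"] += 1
                break
    return {label: c / len(token_counts) for label, c in counts.items()}


def uses_101_500_transforms(token_counts: List[int]) -> bool:
    """True when the 101-500 bin holds at least the assumed Q1 share of documents."""
    return bin_fractions(token_counts)["101-500"] >= Q1_THRESHOLD


print(bin_fractions([50, 200, 300, 1000]))  # {'0-100': 0.25, '101-500': 0.5, '501-100000': 0.25}
print(uses_101_500_transforms([50, 600, 700, 800]))  # False
```

With four documents of 50, 200, 300 and 1000 tokens, half of them land in the 101-500 bin, so under these assumptions that branch's transforms, including `cosine_sim_builder`, would be built.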
1 parent 776afaa commit 1d4b2ff


1 file changed (+1, -1 lines)

src/ragas/testset/transforms/default.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -154,7 +154,7 @@ def filter_chunks(node):
             summary_extractor,
             node_filter,
             Parallel(summary_emb_extractor, theme_extractor, ner_extractor),
-            ner_overlap_sim,
+            Parallel(cosine_sim_builder, ner_overlap_sim),
         ]
     else:
         raise ValueError(
```
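One way to guard against this kind of omission is a check that every transform instantiated in a branch also appears in the returned list, with parallel groups expanded. The sketch below is hypothetical: `FakeParallel` stands in for ragas's `Parallel`, transforms are represented by their variable names as strings, and `flatten` and `missing_transforms` are illustrative helpers, so it runs without an LLM or embedding model.

```python
from typing import List, Sequence, Union


class FakeParallel:
    """Hypothetical stand-in for ragas's Parallel: groups transforms run side by side."""

    def __init__(self, *transformations: str) -> None:
        self.transformations = list(transformations)


Transform = Union[str, FakeParallel]


def flatten(transforms: Sequence[Transform]) -> List[str]:
    """Expand parallel groups so every returned transform name is listed."""
    names: List[str] = []
    for t in transforms:
        if isinstance(t, FakeParallel):
            names.extend(t.transformations)
        else:
            names.append(t)
    return names


def missing_transforms(returned: Sequence[Transform], instantiated: Sequence[str]) -> List[str]:
    """Names that were instantiated but never made it into the returned list."""
    return sorted(set(instantiated) - set(flatten(returned)))


# Transforms built in the 101-500 branch, as described in the commit message and diff.
INSTANTIATED = [
    "summary_extractor", "summary_emb_extractor", "cosine_sim_builder",
    "ner_extractor", "ner_overlap_sim", "theme_extractor", "node_filter",
]

BEFORE = [
    "summary_extractor",
    "node_filter",
    FakeParallel("summary_emb_extractor", "theme_extractor", "ner_extractor"),
    "ner_overlap_sim",
]
AFTER = BEFORE[:-1] + [FakeParallel("cosine_sim_builder", "ner_overlap_sim")]

print(missing_transforms(BEFORE, INSTANTIATED))  # ['cosine_sim_builder']
print(missing_transforms(AFTER, INSTANTIATED))   # []
```

Run against the pre-fix list, the check reports `cosine_sim_builder` as instantiated but discarded; the fixed list reports nothing missing.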
