ValueError: num_samples should be a positive integer value, but got num_samples=0 #13

@G-dragondragon

Dear Authors:

Your article presents very meaningful work, and I am very interested in applying your method. However, while following the README to reproduce the procedure, I ran into an issue that I have been unable to resolve.

When using my own dataset and proceeding to the step "Meta-train PLMs on the auxiliary tasks", I attempted to adjust the -ts parameter from 2 to 40, the -mi parameter from 1 to 10, as well as other parameters, but I still encountered the following error:

    **********************Current dataset: NMH3_GARDEN_puxiang_2024**********************

    Trainable params: 12206080 (1.84%)
    All params: 664562614
    Traceback (most recent call last):
      File "/root/autodl-tmp/FSFP/Pro-FSFP-main/main.py", line 86, in <module>
        pipeline(proteins)
      File "/root/autodl-tmp/FSFP/Pro-FSFP-main/fsfp/pipeline.py", line 358, in __call__
        report = self.meta_single(meta_train, train, test)
      File "/root/autodl-tmp/FSFP/Pro-FSFP-main/fsfp/pipeline.py", line 226, in meta_single
        eval_data = MetaRankingSequenceData(eval_splits, tokenizer,
      File "/root/autodl-tmp/FSFP/Pro-FSFP-main/fsfp/dataset/base.py", line 119, in __init__
        support_iter = DataLoader(support,
      File "/root/autodl-tmp/fsfpenv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 351, in __init__
        sampler = RandomSampler(dataset, generator=generator) # type: ignore[arg-type]
      File "/root/autodl-tmp/fsfpenv/lib/python3.10/site-packages/torch/utils/data/sampler.py", line 107, in __init__
        raise ValueError("num_samples should be a positive integer "
    ValueError: num_samples should be a positive integer value, but got num_samples=0
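
For context, PyTorch's RandomSampler raises this ValueError when a shuffling DataLoader is built over a dataset of length zero, so the `support` set passed at base.py line 119 was presumably empty. Below is a minimal sketch of a check that could be run on the splits before any loader is created. The helper `find_empty_splits`, the split names, and the example values are all hypothetical; they are not part of FSFP and do not reflect how FSFP actually builds its splits.

```python
from typing import Mapping, Sequence


def find_empty_splits(splits: Mapping[str, Sequence]) -> list[str]:
    """Return the names of all splits that contain no samples."""
    return [name for name, samples in splits.items() if len(samples) == 0]


# Hypothetical splits for illustration only: one of them ended up empty.
example_splits = {
    "support": [],          # no samples: a shuffling DataLoader would fail here
    "query": [0.7, 1.2, 0.4],
}

empty = find_empty_splits(example_splits)
if empty:
    print(f"These splits are empty and cannot be sampled from: {empty}")
```

Running a check like this just before the failing DataLoader call would show which split is empty for a given combination of dataset size and parameters.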

One important point to note is that my dataset contains only 47 mutant activity data points, including 2 single-point mutation data points, 23 double-site mutation data points, and 21 triple-site mutation data points. I am unsure whether the small dataset size or the mix of different types of mutants might affect the execution.

Additionally, when reproducing your workflow using DMS data from the ProteinGym database (which is not among the 87 datasets you provided), I found that setting the -ts parameter above 20 does not cause any issues, whereas setting it below 20 results in the same error mentioned earlier.

I would greatly appreciate your insights on this matter!

Wishing you success in your research! Thank you.

Gao
