Changes to support pass@k evaluation on the HumanEval dataset #1180

shubhra · 2023-08-11T14:56:20Z

Example:

numactl -C0-15 python deepsparse/src/deepsparse/transformers/eval_downstream.py \
        <model_path> \
        --num-cores 16 \
        --dataset openai_humaneval \
        --humaneval-method pass_at_k \
        --engine deepsparse \
        --start 0 \
        --max-samples 2

This creates a subset of the HumanEval dataset that starts at index 0 (--start) and contains 2 samples (--max-samples), then runs the evaluation on that subset.
If the benchmark-humaneval argument is supplied, the evaluation instead runs on a pre-selected smaller subset of 11 samples and ignores start and max-samples.
Set humaneval-method to perplexity to evaluate perplexity instead of pass@k.
Add --n-solutions <n> to specify the number of solutions required per task. The default is 1.
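
For readers unfamiliar with the metric, here is a minimal sketch of the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of solutions generated for a task and c is the number that pass its tests. This is an illustration only: the function names `pass_at_k` and `mean_pass_at_k` are hypothetical, and the PR description does not say which estimator eval_downstream.py actually uses.

```python
import math
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task (hypothetical helper).

    n: solutions generated for the task (compare --n-solutions)
    c: how many of those solutions passed the task's tests
    k: the k in pass@k, with 1 <= k <= n
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing solutions: every size-k draw contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k draw contains no passing solution,
    # subtracted from 1.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, given (n, c) pairs (hypothetical helper)."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    if not scores:
        raise ValueError("results must contain at least one task")
    return sum(scores) / len(scores)
```

Under this estimator, with one solution per task (the default above), pass@1 reduces to the fraction of tasks whose single solution passes.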

Note: Remove numactl -C0-15 if you don't need to specify which cores to run on.

jeanniefinks · 2025-05-10T15:31:02Z

Per the main README announcement, DeepSparse is being deprecated by June 2, 2025. Closing the PR as work has been suspended; thank you for the inputs and support!

Changes to support pass at k evaluation on the HumanEval dataset

56b022d

shubhra marked this pull request as draft August 11, 2023 14:56

shubhra changed the title ~~Changes to support pass at k evaluation on the HumanEval dataset~~ Changes to support pass@k evaluation on the HumanEval dataset Aug 11, 2023

jeanniefinks closed this May 10, 2025

jeanniefinks deleted the shubhra/humaneval_pass@k branch May 29, 2025 23:52
