GPU memory contention when generating synthetic data with n_jobs>1 on CUDA #242

@coderabbitai

Problem

When process_batch in src/sampleworks/eval/generate_synthetic_sf.py or src/sampleworks/eval/generate_synthetic_density.py is called with a CUDA device and n_jobs > 1, each worker process creates its own CUDA context. Depending on available GPU memory, this can cause out-of-memory (OOM) errors or severe performance degradation as the workers compete for the same GPU(s).

Suggested approach

  • Allow the caller to specify a more granular device (e.g. cuda:0, cuda:1) per worker, or
  • Detect GPU usage and warn about, or clamp, n_jobs when a CUDA device is used
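
A minimal sketch of both options, to make the discussion concrete. The helper names resolve_n_jobs and assign_worker_devices, the max_gpu_workers parameter, and the assumption that n_jobs is a positive integer are all hypothetical; none of them come from the existing scripts. The GPU count is passed in explicitly so the sketch has no dependencies; if the project uses PyTorch, torch.cuda.device_count() could supply it.

```python
import warnings


def resolve_n_jobs(device: str, n_jobs: int, max_gpu_workers: int = 1) -> int:
    """Hypothetical helper: warn and clamp n_jobs when a CUDA device is used.

    Assumes n_jobs is a positive integer; other conventions (e.g. -1 for
    "all cores") would need to be resolved before calling this.
    """
    if device.startswith("cuda") and n_jobs > max_gpu_workers:
        warnings.warn(
            f"n_jobs={n_jobs} with device={device!r} would create one CUDA "
            f"context per worker; clamping n_jobs to {max_gpu_workers}.",
            RuntimeWarning,
            stacklevel=2,
        )
        return max_gpu_workers
    return n_jobs


def assign_worker_devices(device: str, n_jobs: int, n_gpus: int) -> list[str]:
    """Hypothetical helper: give each worker its own device string.

    A bare "cuda" is spread round-robin over the available GPUs. An explicit
    index (e.g. "cuda:1") or a non-CUDA device is passed through unchanged.
    """
    if device != "cuda" or n_gpus < 1:
        return [device] * n_jobs
    return [f"cuda:{i % n_gpus}" for i in range(n_jobs)]
```

With two GPUs, assign_worker_devices("cuda", 4, 2) gives two workers each to cuda:0 and cuda:1. Round-robin assignment only reduces contention when there are at least as many GPUs as workers, so the two helpers could be combined, for example by clamping n_jobs to the GPU count.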

Context

Flagged during code review of PR #234 (feat(synthetic-sf): add script to generate synthetic structure factors).
Also applicable to the density generation script.

Requested by @marcuscollins — help from engineers appreciated.

Labels: engineering (task that is best suited to software engineers, not research scientists), help wanted (extra attention is needed)