Optimize exit code handling by relying on scheduler status for successful executions #6445

@pditommaso

Background

Currently, Nextflow reads the .exitcode file from the task work directory to determine task completion status. PR #6442 improved K8s error handling by prioritizing the scheduler's exit status for failed executions (e.g., OOMKilled, pod eviction), but Nextflow still falls back to reading the .exitcode file for successful executions.

Proposed Optimization

For successful task executions (scheduler exit status == 0), we should rely solely on the scheduler's reported exit status and bypass reading the .exitcode file entirely.
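
A minimal sketch of the proposed decision logic, written in Python for illustration only and not taken from the Nextflow code base. The names `resolve_exit_status` and `read_exit_file` are hypothetical. The sketch assumes the scheduler status is available as an integer, or `None` when the scheduler did not report one, and that a non-zero scheduler status is used as-is (a simplified reading of the pattern described for PR #6442).

```python
from pathlib import Path
from typing import Callable, Optional


def read_exit_file(work_dir: Path) -> Optional[int]:
    """Read the task's .exitcode file; return None if it is missing or unparsable."""
    try:
        return int((work_dir / ".exitcode").read_text().strip())
    except (OSError, ValueError):
        return None


def resolve_exit_status(
    scheduler_status: Optional[int],
    work_dir: Path,
    reader: Callable[[Path], Optional[int]] = read_exit_file,
) -> Optional[int]:
    """Pick the task exit status, touching the work directory only when needed."""
    if scheduler_status == 0:
        # Proposed optimization: trust the scheduler and skip the file read.
        return 0
    if scheduler_status is not None:
        # Assumption: failed executions use the scheduler's status directly.
        return scheduler_status
    # Assumption: with no scheduler status, fall back to the .exitcode file.
    return reader(work_dir)
```

Passing the reader as a parameter keeps the storage access in one place, which makes it straightforward to count or stub out reads when evaluating the change.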

Benefits

  • Reduced I/O pressure: Eliminates one file read operation per successful task
  • Better scalability: Particularly beneficial for workloads with many fine-grained jobs
  • Lower storage costs: Fewer requests to remote object storage (S3, Azure Blob, GCS, etc.)
  • Improved performance: Faster task completion acknowledgment

Implementation Considerations

This optimization should be evaluated across all executor types:

  • K8s (nf-k8s)
  • AWS Batch (nf-amazon)
  • Azure Batch (nf-azure)
  • Google Batch (nf-google)
  • Other cloud executors
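
One way to evaluate the executors above one at a time is a per-plugin gate, sketched here in Python. This is a hypothetical illustration, not an existing Nextflow setting: the function `can_skip_exit_file` and the `trusted` allow-list are assumptions, and the plugin names are only used as opaque keys.

```python
from typing import AbstractSet, Optional


def can_skip_exit_file(
    plugin: str,
    scheduler_status: Optional[int],
    trusted: AbstractSet[str],
) -> bool:
    """Return True only when the scheduler reported success (status 0)
    and the plugin is on the (hypothetical) allow-list of validated executors."""
    return scheduler_status == 0 and plugin in trusted


# Illustrative allow-list only; which executors qualify is exactly
# what the evaluation would need to establish.
validated = frozenset({"nf-k8s"})

skip_k8s = can_skip_exit_file("nf-k8s", 0, validated)       # success on a validated plugin
skip_aws = can_skip_exit_file("nf-amazon", 0, validated)    # not yet validated
skip_fail = can_skip_exit_file("nf-k8s", 137, validated)    # non-zero status
```

An allow-list keeps the default behavior (reading the .exitcode file) for any executor that has not been checked, so the change can be rolled out incrementally.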

Related Work

PR #6442 establishes the pattern of prioritizing the scheduler's exit status for errors. This issue proposes extending that approach to successful executions as well.
