@janniklinde
Contributor

This patch introduces a new failure-propagation mechanism for out-of-core (OOC) tasks via the LocalTaskQueue.

Previously, OOC tasks could fail silently on unexpected exceptions, leaving upstream tasks waiting indefinitely because their output streams were never closed. To address this, we now propagate exceptions through the queue hierarchy so that upstream and downstream threads are properly interrupted.

LocalTaskQueue maintains an exception state that allows both enqueue and dequeue operations to rethrow the stored exception, propagating errors across dependent queues. When a failure occurs, all related queues are notified, cascading the exception until it reaches the main thread and any other affected tasks.
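
To illustrate the idea for reviewers, here is a minimal sketch of a queue that stores a failure and rethrows it from both enqueue and dequeue. It is not the actual LocalTaskQueue code: the class `FailableQueue` and the methods `link` and `propagateFailure` are hypothetical names chosen for this example, and the real patch may structure the cascade differently.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for a task queue with a failure state (not the SystemDS class). */
class FailableQueue<T> {
    private final Queue<T> items = new ArrayDeque<>();
    private final List<FailableQueue<?>> linked = new ArrayList<>();
    private RuntimeException failure; // null while the queue is healthy

    /** Registers a dependent queue that should fail whenever this one fails. */
    public synchronized void link(FailableQueue<?> other) {
        linked.add(other);
    }

    /** Adds an item, or rethrows the stored exception if the queue has failed. */
    public synchronized void enqueue(T item) {
        if (failure != null)
            throw failure;
        items.add(item);
        notifyAll();
    }

    /** Blocks until an item is available, or rethrows the stored exception. */
    public synchronized T dequeue() throws InterruptedException {
        while (items.isEmpty() && failure == null)
            wait();
        if (failure != null)
            throw failure;
        return items.poll();
    }

    /** Stores the exception, wakes blocked threads, and cascades to linked queues. */
    public void propagateFailure(RuntimeException ex) {
        List<FailableQueue<?>> targets;
        synchronized (this) {
            if (failure != null)
                return; // already failed; this also terminates cyclic links
            failure = ex;
            notifyAll();
            targets = new ArrayList<>(linked);
        }
        // Cascade outside the lock so two queues never wait on each other's monitor.
        for (FailableQueue<?> q : targets)
            q.propagateFailure(ex);
    }
}
```

In this sketch a consumer blocked in `dequeue` wakes up as soon as any linked queue fails, which is the behavior that keeps dependent tasks from waiting indefinitely.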

Additionally, a common OOC task submission method was added to OOCInstruction to replace manual submission via CommonThreadPool. This ensures consistent exception propagation and simplifies OOC task management.
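
As a rough sketch of what a shared submission helper can look like, the example below wraps each task so that any runtime exception is forwarded to a set of failure sinks before the task's future completes exceptionally. The names `TaskSubmitter`, `submit`, and `failureSinks` are assumptions made for this illustration; the actual method on OOCInstruction and its signature are not shown in this conversation.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Hypothetical helper; names are illustrative and not taken from OOCInstruction. */
final class TaskSubmitter {
    private TaskSubmitter() {}

    /**
     * Runs the task on the given executor. If it throws, every failure sink
     * (for example, a queue's failure-propagation method) is notified first,
     * then the exception is rethrown so the returned future also fails.
     */
    static CompletableFuture<Void> submit(Executor pool, Runnable task,
            List<Consumer<RuntimeException>> failureSinks) {
        return CompletableFuture.runAsync(() -> {
            try {
                task.run();
            } catch (RuntimeException ex) {
                for (Consumer<RuntimeException> sink : failureSinks)
                    sink.accept(ex);
                throw ex;
            }
        }, pool);
    }
}
```

Routing every submission through one wrapper like this means no call site can forget the error handling, which is the consistency benefit the description above refers to.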

@codecov

codecov bot commented Oct 28, 2025

Codecov Report

❌ Patch coverage is 89.47368% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.26%. Comparing base (801d8e2) to head (8eef615).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...sysds/runtime/instructions/ooc/OOCInstruction.java | 80.00% | 2 Missing and 1 partial ⚠️ |
| .../runtime/controlprogram/parfor/LocalTaskQueue.java | 91.66% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #2346   +/-   ##
=========================================
  Coverage     72.26%   72.26%           
- Complexity    46724    46730    +6     
=========================================
  Files          1503     1503           
  Lines        177258   177262    +4     
  Branches      34832    34836    +4     
=========================================
+ Hits         128089   128104   +15     
+ Misses        39495    39493    -2     
+ Partials       9674     9665    -9     


@mboehm7
Contributor

mboehm7 commented Oct 28, 2025

LGTM - thanks for the patch @janniklinde. The idea of propagating exceptions through the queues is great, and I also like the concise abstraction for submitting these OOC tasks.

@mboehm7 mboehm7 closed this in d38e56c Oct 28, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in SystemDS PR Queue Oct 28, 2025