Add start_time and compute_node_id to the job table #350

Merged: daniel-thom merged 5 commits into main from job-runtime-state on May 25, 2026

@daniel-thom
Collaborator

Summary

  • Surface how long a running job has been running and which compute node it is on, without waiting for completion. Previously this data only landed on the result record at job completion, and the active node was hidden on the internal job_internal table.
  • Migration 20260525000000_move_runtime_state_to_job: adds start_time and compute_node_id to job (with FK + partial index), copies any existing values over from job_internal.active_compute_node_id, then drops that column.
  • start_job writes both atomically alongside the status flip (RFC3339 timestamp). complete_job (single + batch paths), reset_job_status, reset_failed_jobs_only, and retry_job clear them. Status remains the source of truth for "is running."
  • Read paths expose the fields on JobModel; the previous id IN (SELECT … FROM job_internal …) filter for active_compute_node_id collapses to a direct WHERE on job.
  • CLI torc jobs list adds Compute Node + Elapsed columns (elapsed populated only when status=Running). torc jobs get adds raw Compute Node + Start Time lines. TUI jobs table + details popup and torc-dash jobs table + details card mirror this.
  • OpenAPI spec regenerated; Rust/Julia/Python clients refreshed via sync_openapi.sh all --promote + clients.
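
The migration step described above can be sketched against an in-memory SQLite database. This is a minimal illustration, not the actual migration file: the table layouts, the text `status` values, the `job_internal.job_id` key, and the partial-index predicate are all assumptions; only the column names and the index name `idx_job_compute_node_id` come from this PR. The final `DROP COLUMN` is guarded because SQLite only supports it from version 3.35.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical pre-migration schema (assumed for illustration only).
conn.executescript("""
    CREATE TABLE compute_node (id INTEGER PRIMARY KEY);
    CREATE TABLE job (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE job_internal (
        job_id INTEGER PRIMARY KEY REFERENCES job(id),
        active_compute_node_id INTEGER
    );
    INSERT INTO compute_node (id) VALUES (7);
    INSERT INTO job (id, status) VALUES (1, 'running'), (2, 'ready');
    INSERT INTO job_internal (job_id, active_compute_node_id)
        VALUES (1, 7), (2, NULL);
""")

# Sketch of the "up" direction: add the columns, backfill the node id,
# then create a partial index that covers only rows with a node assigned.
conn.executescript("""
    ALTER TABLE job ADD COLUMN start_time TEXT;
    ALTER TABLE job ADD COLUMN compute_node_id INTEGER
        REFERENCES compute_node(id);
    UPDATE job
       SET compute_node_id = (
           SELECT ji.active_compute_node_id
             FROM job_internal ji
            WHERE ji.job_id = job.id
       );
    CREATE INDEX idx_job_compute_node_id
        ON job (compute_node_id)
        WHERE compute_node_id IS NOT NULL;
""")

# DROP COLUMN exists only in SQLite 3.35.0 and later.
if sqlite3.sqlite_version_info >= (3, 35, 0):
    conn.execute("ALTER TABLE job_internal DROP COLUMN active_compute_node_id")

migrated_rows = conn.execute(
    "SELECT id, start_time, compute_node_id FROM job ORDER BY id"
).fetchall()
index_row = conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'index' AND name = 'idx_job_compute_node_id'"
).fetchone()
```

Note that only the node id is backfilled; `start_time` has no earlier source in this sketch, so it stays NULL until the next start.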

Test plan

  • cargo build --workspace --all-features ✅ (verified locally)
  • cargo clippy --all --all-targets --all-features -- -D warnings
  • cargo fmt -- --check and dprint check
  • cargo nextest run --workspace — 1413/1415 passing locally; remaining 2 are transient (pass in isolation)
  • Start a workflow, claim/start a job, confirm torc jobs list <wf> shows the node + an elapsed time that ticks up
  • Complete the job and confirm both fields revert to blank in the jobs list (compute node remains queryable on the result record)
  • Run torc workflows reset-status <wf> / --failed-only and confirm runtime state is cleared
  • Run torc workflows retry-job and confirm start_time + compute_node_id are cleared on the new attempt
  • Open the TUI; check the Jobs view shows the new columns and the details popup shows raw start_time
  • Load torc-dash; confirm jobs table + job details card show the new fields

🤖 Generated with Claude Code

Surface how long a running job has been running and which compute node it
is on, without waiting for completion. Previously the only attempt-level
data lived on the result record, which is created only when the job
finishes. The runtime state was also tracked on job_internal, an
internal-only table, requiring a join and hiding the data from the
public API.

Both fields are now public columns on the job table, set atomically by
start_job and cleared by complete_job and the reset/retry paths. Status
remains the source of truth for "is running." The CLI jobs list, TUI
jobs view, and torc-dash now display the compute node and elapsed time
for running jobs; raw start_time appears in the per-job detail views.
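
As a rough sketch of the set-and-clear behavior described above, the following Python models the two write paths and the simplified node filter with SQLite. The function bodies, the `'ready'`/`'running'`/`'done'` status strings, and the guard on the previous status are assumptions made for illustration; the real server code is Rust and may differ.

```python
import sqlite3
from datetime import datetime, timezone


def start_job(conn, job_id, compute_node_id):
    """Flip status and set both runtime fields in one UPDATE statement."""
    now = datetime.now(timezone.utc).isoformat()  # RFC3339-compatible UTC string
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE job SET status = 'running', start_time = ?, compute_node_id = ? "
            "WHERE id = ? AND status = 'ready'",
            (now, compute_node_id, job_id),
        )
    return cur.rowcount == 1


def complete_job(conn, job_id):
    """Flip status and clear both runtime fields in one UPDATE statement."""
    with conn:
        conn.execute(
            "UPDATE job SET status = 'done', start_time = NULL, compute_node_id = NULL "
            "WHERE id = ?",
            (job_id,),
        )


def jobs_on_node(conn, compute_node_id):
    """Direct filter on job, with no subquery through an internal table."""
    rows = conn.execute(
        "SELECT id FROM job WHERE compute_node_id = ? ORDER BY id",
        (compute_node_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job (id INTEGER PRIMARY KEY, status TEXT NOT NULL, "
    "start_time TEXT, compute_node_id INTEGER)"
)
conn.execute("INSERT INTO job (id, status) VALUES (1, 'ready'), (2, 'ready')")
started = start_job(conn, 1, 7)
```

Because the status flip and the two field writes share one statement, a reader never observes a running job without its node and start time.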

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR moves per-attempt runtime state onto the public job table so clients can see which compute node is executing a currently-running job and how long it has been running, without waiting for job completion.

Changes:

  • Adds job.start_time and job.compute_node_id (with FK + partial index) and migrates existing node assignments from job_internal.active_compute_node_id.
  • Updates server write paths to set/clear the runtime state on start/complete/reset/retry, and updates read/query paths to expose the new fields.
  • Updates CLI/TUI/torc-dash and regenerates OpenAPI + clients to display the new fields.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| torc-server/migrations/20260525000000_move_runtime_state_to_job.up.sql | Adds start_time/compute_node_id to job, backfills node id, and removes job_internal.active_compute_node_id. |
| torc-server/migrations/20260525000000_move_runtime_state_to_job.down.sql | Attempts to revert by restoring active_compute_node_id and dropping the new job columns/index. |
| torc-dash/static/js/app-utils.js | Adds elapsed-time formatting helper for RFC3339 timestamps. |
| torc-dash/static/js/app-tables.js | Adds “Node” and “Elapsed” columns to jobs table. |
| torc-dash/static/js/app-job-details.js | Shows compute node + start time in job details summary. |
| src/tui/ui.rs | Adds “Node”/“Elapsed” columns and RFC3339 elapsed formatting in the TUI jobs table. |
| src/tui/components.rs | Extends job details popup to include compute node + start time. |
| src/tui/app.rs | Passes new runtime fields into the job details popup. |
| src/server/http_server/jobs_transport.rs | Sets runtime state when starting a job; clears it on completion paths. |
| src/server/api/jobs.rs | Selects/returns the new columns; clears them on reset/retry; simplifies active-node filtering to job.compute_node_id. |
| src/models.rs | Extends JobModel with start_time and compute_node_id. |
| src/client/commands/jobs.rs | Adds “Compute Node” + “Elapsed” columns/lines in CLI output. |
| python_client/src/torc/openapi_client/models/job_model.py | Regenerated Python client model to include new fields. |
| julia_client/Torc/src/api/models/model_JobModel.jl | Regenerated Julia client model to include new fields. |
| julia_client/julia_client/docs/JobModel.md | Regenerated Julia docs for JobModel. |
| api/openapi.yaml | Regenerated OpenAPI spec with new JobModel fields. |
| api/openapi.codegen.yaml | Regenerated codegen input spec with new JobModel fields. |

);

DROP INDEX IF EXISTS idx_job_compute_node_id;

Collaborator Author

Same as the up migration: empirically verified. I ran sqlx migrate revert against the dev DB and it applied cleanly. SQLite's DROP COLUMN restriction applies to columns referenced by the schema (incoming FKs, indexes, PK/UNIQUE), not to columns with outgoing FK references. Nothing has an FK into job.compute_node_id, and the partial index idx_job_compute_node_id is dropped first, so the DROP succeeds.

Comment thread torc-server/migrations/20260525000000_move_runtime_state_to_job.up.sql Outdated
daniel-thom and others added 4 commits May 25, 2026 13:00
…ning

The init/reinit/cascade paths flip job status without touching the two
new runtime columns, so a job that was Running when a compute node died
keeps its stale start_time and compute_node_id after `torc recover` or
`torc workflows reinitialize` moves it back to Ready/Blocked/
Uninitialized. The displayed elapsed time would then count from the
killed attempt, not the next one.

Enforces the invariant: start_time and compute_node_id are non-null only
while status = Running. Adds the NULL writes to initialize_unblocked_jobs,
initialize_blocked_jobs_to_blocked (both variants), uninitialize_blocked_jobs,
both copies of update_jobs_from_completion_reversal, and the generic
manage_status_change setter (defensive — not the proper Running path).
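
The invariant stated here (runtime fields are non-null only while the job is running) can be checked with a simple audit query. The sketch below is illustrative only: the helper names, the text status values, and the table layout are hypothetical, and the real setters live in the Rust server.

```python
import sqlite3


def set_non_running_status(conn, job_id, new_status):
    """Generic status setter that also clears runtime state (hypothetical helper)."""
    if new_status == "running":
        raise ValueError("entering 'running' should go through the start path")
    with conn:
        conn.execute(
            "UPDATE job SET status = ?, start_time = NULL, compute_node_id = NULL "
            "WHERE id = ?",
            (new_status, job_id),
        )


def find_stale_runtime_state(conn):
    """Return ids of non-running jobs that still carry runtime fields."""
    rows = conn.execute(
        "SELECT id FROM job "
        "WHERE status <> 'running' "
        "  AND (start_time IS NOT NULL OR compute_node_id IS NOT NULL) "
        "ORDER BY id"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job (id INTEGER PRIMARY KEY, status TEXT NOT NULL, "
    "start_time TEXT, compute_node_id INTEGER)"
)
conn.executemany(
    "INSERT INTO job VALUES (?, ?, ?, ?)",
    [
        (1, "running", "2026-05-25T13:00:00+00:00", 7),
        # Simulates a job whose status was flipped back without clearing the fields.
        (2, "ready", "2026-05-25T12:00:00+00:00", 7),
    ],
)

stale_before = find_stale_runtime_state(conn)
set_non_running_status(conn, 2, "ready")
stale_after = find_stale_runtime_state(conn)
```

A query like this could also serve as a regression check after recover or reinitialize paths run.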

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Other timestamp columns (workflow.timestamp, result.completion_time,
access_group.created_at) are declared TEXT. Storing the RFC3339 strings
under TEXT instead of TIMESTAMP avoids the NUMERIC type affinity on
TIMESTAMP and matches existing conventions. No data-format change —
values are still RFC3339 strings.
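
The affinity difference can be observed directly in SQLite. In this small sketch (the `demo` table and its column names are made up for illustration), a column declared TIMESTAMP gets NUMERIC affinity, so a digits-only string is converted to an integer on insert, while a TEXT column stores it unchanged. An RFC3339 string is not numeric, so it stays text in both columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (as_text TEXT, as_timestamp TIMESTAMP)")

# Insert the same string value into both columns, once with an RFC3339
# timestamp and once with a digits-only string.
conn.execute("INSERT INTO demo VALUES (?, ?)", ("2026-05-25T19:49:00+00:00",) * 2)
conn.execute("INSERT INTO demo VALUES (?, ?)", ("20260525",) * 2)

storage_types = conn.execute(
    "SELECT typeof(as_text), typeof(as_timestamp) FROM demo ORDER BY rowid"
).fetchall()
print(storage_types)
```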

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rendered durations now read 3m 12s / 2h 04m / 3d 11h instead of the
unbroken 3m12s / 2h04m / 3d11h. The space makes the unit boundaries
scannable at a glance in tables and detail views.
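
A formatter producing the spaced style could look like the sketch below. It is not the code from this PR (the real helpers are in Rust and JavaScript): the function names, the two-unit cutoffs, the zero-padding of the second unit, and the bare-seconds form for durations under a minute are all assumptions inferred from the three sample strings.

```python
from datetime import datetime, timezone


def format_elapsed(total_seconds: int) -> str:
    """Render a duration using its two largest units, separated by a space."""
    total_seconds = max(0, int(total_seconds))
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    if days:
        return f"{days}d {hours:02d}h"
    if hours:
        return f"{hours}h {minutes:02d}m"
    if minutes:
        return f"{minutes}m {seconds:02d}s"
    return f"{seconds}s"  # assumed form for sub-minute durations


def elapsed_since(start_time: str, now: datetime) -> str:
    """Elapsed time between an RFC3339 start_time and a tz-aware 'now'."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat.
    started = datetime.fromisoformat(start_time.replace("Z", "+00:00"))
    return format_elapsed(int((now - started).total_seconds()))


sample = elapsed_since(
    "2026-05-25T13:00:00Z",
    now=datetime(2026, 5, 25, 13, 3, 12, tzinfo=timezone.utc),
)
```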

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@daniel-thom daniel-thom merged commit 3e65ccd into main May 25, 2026
9 checks passed
@daniel-thom daniel-thom deleted the job-runtime-state branch May 25, 2026 19:49
daniel-thom added a commit that referenced this pull request May 25, 2026
* Render timestamps as local-with-offset for humans, UTC for JSON

Timestamp display was inconsistent across the CLI, TUI, and dash. Some
fields rendered raw RFC3339, others stripped the offset to local time
without a suffix, and the dash showed local time with no way to tell it
apart from UTC. After PR #350 added job.start_time, the inconsistency
was about to spread to a new field.

Apply a single policy:

  * humans (TUI, dash, CLI table mode, CLI detail printouts) see local
    time with an explicit ±HHMM offset suffix
  * JSON mode keeps the server's raw UTC RFC3339 value

The server already stores UTC, so this is purely a client-side
rendering change. File mtimes get the same treatment as RFC3339
timestamps.

A new pair of helpers in `src/client/utils.rs` (format_local_timestamp
and format_local_timestamp_epoch) covers the CLI and TUI surfaces; the
dash routes everything through formatDateLocal in app-utils.js; and the
Python convert_timestamp helper now returns a tz-aware UTC datetime so
callers no longer silently inherit the client's local zone.
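
To make the rendering policy concrete, here is a minimal Python sketch of both halves: a human-facing renderer that shows local time with a ±HHMM suffix, and a converter that returns a tz-aware UTC datetime. The function names and the exact output layout are hypothetical; they are not the Rust, JavaScript, or Python helpers named in this commit.

```python
from datetime import datetime, timedelta, timezone


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC3339 string; a trailing 'Z' is normalized for older Pythons."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def render_local_with_offset(value: str, tz=None) -> str:
    """Human-facing rendering: local wall-clock time plus an explicit offset.

    With tz=None, astimezone() uses the system's local zone; passing a tz
    makes the output deterministic, which is convenient for testing.
    """
    local = parse_rfc3339(value).astimezone(tz)
    return local.strftime("%Y-%m-%d %H:%M:%S %z")


def to_aware_utc(value: str) -> datetime:
    """Machine-facing conversion: always a tz-aware datetime in UTC."""
    return parse_rfc3339(value).astimezone(timezone.utc)


mountain_daylight = timezone(timedelta(hours=-6))
rendered = render_local_with_offset("2026-05-25T19:49:00Z", tz=mountain_daylight)
as_utc = to_aware_utc("2026-05-25T13:49:00-06:00")
```

Keeping the offset in the rendered string lets a reader tell local time apart from UTC without consulting the JSON output.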

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>