Add start_time and compute_node_id to the job table #350
Surface how long a running job has been running and which compute node it is on, without waiting for completion. Previously the only attempt-level data lived on the result record, which is created only when the job finishes. The runtime state was also tracked on job_internal, an internal-only table, requiring a join and hiding the data from the public API. Both fields are now public columns on the job table, set atomically by start_job and cleared by complete_job and the reset/retry paths. Status remains the source of truth for "is running." The CLI jobs list, TUI jobs view, and torc-dash now display the compute node and elapsed time for running jobs; raw start_time appears in the per-job detail views. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR moves per-attempt runtime state onto the public job table so clients can see which compute node is executing a currently-running job and how long it has been running, without waiting for job completion.
Changes:
- Adds `job.start_time` and `job.compute_node_id` (with FK + partial index) and migrates existing node assignments from `job_internal.active_compute_node_id`.
- Updates server write paths to set/clear the runtime state on start/complete/reset/retry, and updates read/query paths to expose the new fields.
- Updates CLI/TUI/torc-dash and regenerates OpenAPI + clients to display the new fields.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| torc-server/migrations/20260525000000_move_runtime_state_to_job.up.sql | Adds start_time/compute_node_id to job, backfills node id, and removes job_internal.active_compute_node_id. |
| torc-server/migrations/20260525000000_move_runtime_state_to_job.down.sql | Attempts to revert by restoring active_compute_node_id and dropping the new job columns/index. |
| torc-dash/static/js/app-utils.js | Adds elapsed-time formatting helper for RFC3339 timestamps. |
| torc-dash/static/js/app-tables.js | Adds “Node” and “Elapsed” columns to jobs table. |
| torc-dash/static/js/app-job-details.js | Shows compute node + start time in job details summary. |
| src/tui/ui.rs | Adds “Node”/“Elapsed” columns and RFC3339 elapsed formatting in the TUI jobs table. |
| src/tui/components.rs | Extends job details popup to include compute node + start time. |
| src/tui/app.rs | Passes new runtime fields into the job details popup. |
| src/server/http_server/jobs_transport.rs | Sets runtime state when starting a job; clears it on completion paths. |
| src/server/api/jobs.rs | Selects/returns the new columns; clears them on reset/retry; simplifies active-node filtering to job.compute_node_id. |
| src/models.rs | Extends JobModel with start_time and compute_node_id. |
| src/client/commands/jobs.rs | Adds “Compute Node” + “Elapsed” columns/lines in CLI output. |
| python_client/src/torc/openapi_client/models/job_model.py | Regenerated Python client model to include new fields. |
| julia_client/Torc/src/api/models/model_JobModel.jl | Regenerated Julia client model to include new fields. |
| julia_client/julia_client/docs/JobModel.md | Regenerated Julia docs for JobModel. |
| api/openapi.yaml | Regenerated OpenAPI spec with new JobModel fields. |
| api/openapi.codegen.yaml | Regenerated codegen input spec with new JobModel fields. |
Diff context: `DROP INDEX IF EXISTS idx_job_compute_node_id;`
Same as the up — empirically verified. Ran sqlx migrate revert against the dev DB and it applied cleanly. SQLite's DROP COLUMN restriction is on columns referenced by the schema (incoming FKs, indexes, PK/UNIQUE), not on columns with outgoing FK references. Nothing FKs into job.compute_node_id, the partial index idx_job_compute_node_id is dropped first, so the DROP succeeds.
…ning

The init/reinit/cascade paths flip job status without touching the two new runtime columns, so a job that was Running when a compute node died keeps its stale start_time and compute_node_id after `torc recover` or `torc workflows reinitialize` moves it back to Ready/Blocked/Uninitialized. The displayed elapsed time would then count from the killed attempt, not the next one.

Enforces the invariant: start_time and compute_node_id are non-null only while status = Running. Adds the NULL writes to initialize_unblocked_jobs, initialize_blocked_jobs_to_blocked (both variants), uninitialize_blocked_jobs, both copies of update_jobs_from_completion_reversal, and the generic manage_status_change setter (defensive; not the proper Running path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
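The invariant stated in this commit can be sketched in isolation. The following is a minimal in-memory model, not the server code: `JobStatus`, `Job`, `start`, `set_status`, and `invariant_holds` are hypothetical names, and the real write paths are SQL updates against the `job` table.

```rust
/// Hypothetical, simplified status enum; the real one may have more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Uninitialized,
    Blocked,
    Ready,
    Running,
    Completed,
}

/// Hypothetical stand-in for one row of the `job` table.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Job {
    pub status: JobStatus,
    /// RFC3339 string, set only while the job is running.
    pub start_time: Option<String>,
    pub compute_node_id: Option<i64>,
}

impl Job {
    /// Starting a job sets the status and both runtime fields together.
    pub fn start(&mut self, now_rfc3339: &str, node_id: i64) {
        self.status = JobStatus::Running;
        self.start_time = Some(now_rfc3339.to_string());
        self.compute_node_id = Some(node_id);
    }

    /// Any transition to a non-Running status clears the runtime fields,
    /// mirroring the NULL writes added to the init/reinit/cascade paths.
    pub fn set_status(&mut self, status: JobStatus) {
        self.status = status;
        if status != JobStatus::Running {
            self.start_time = None;
            self.compute_node_id = None;
        }
    }

    /// True when the runtime fields are non-null only while Running.
    pub fn invariant_holds(&self) -> bool {
        let has_runtime = self.start_time.is_some() || self.compute_node_id.is_some();
        self.status == JobStatus::Running || !has_runtime
    }
}
```

Routing every non-Running transition through one setter is a design choice of this sketch; the commit instead adds the clearing writes to each affected path individually.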
Other timestamp columns (workflow.timestamp, result.completion_time, access_group.created_at) are declared TEXT. Storing the RFC3339 strings under TEXT instead of TIMESTAMP avoids the NUMERIC type affinity on TIMESTAMP and matches existing conventions. No data-format change — values are still RFC3339 strings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
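For context on the affinity point: SQLite derives a column's affinity from substrings of its declared type name. The sketch below restates those documented rules as a standalone function; `Affinity` and `column_affinity` are illustrative names and not part of this repository.

```rust
/// The five SQLite column affinities.
#[derive(Debug, PartialEq, Eq)]
pub enum Affinity {
    Integer,
    Text,
    Blob,
    Real,
    Numeric,
}

/// Applies SQLite's declared-type rules in their documented order.
/// Illustrative helper only; SQLite does this internally.
pub fn column_affinity(declared_type: &str) -> Affinity {
    let t = declared_type.trim().to_ascii_uppercase();
    if t.contains("INT") {
        Affinity::Integer
    } else if t.contains("CHAR") || t.contains("CLOB") || t.contains("TEXT") {
        Affinity::Text
    } else if t.is_empty() || t.contains("BLOB") {
        Affinity::Blob
    } else if t.contains("REAL") || t.contains("FLOA") || t.contains("DOUB") {
        Affinity::Real
    } else {
        // No rule matched: this is where TIMESTAMP and DATETIME land.
        Affinity::Numeric
    }
}
```

Under these rules a column declared `TIMESTAMP` falls through to NUMERIC, while `TEXT` matches the second rule, which is the distinction the commit relies on.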
Rendered durations now read 3m 12s / 2h 04m / 3d 11h instead of the unbroken 3m12s / 2h04m / 3d11h. The space makes the unit boundaries scannable at a glance in tables and detail views. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
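A minimal sketch of a formatter that produces the spaced forms quoted above. `format_elapsed` is a hypothetical name, not the helper in this repository; the seconds-only form for durations under a minute and the zero-padding of every minor unit are assumptions, since the commit only shows the three examples.

```rust
/// Formats a duration in whole seconds using the two largest units,
/// separated by a space (e.g. "3m 12s", "2h 04m", "3d 11h").
/// Assumption: durations under a minute render as plain seconds.
pub fn format_elapsed(total_secs: u64) -> String {
    let days = total_secs / 86_400;
    let hours = (total_secs % 86_400) / 3_600;
    let minutes = (total_secs % 3_600) / 60;
    let seconds = total_secs % 60;

    if days > 0 {
        format!("{}d {:02}h", days, hours)
    } else if hours > 0 {
        format!("{}h {:02}m", hours, minutes)
    } else if minutes > 0 {
        format!("{}m {:02}s", minutes, seconds)
    } else {
        format!("{}s", seconds)
    }
}
```

For example, `format_elapsed(192)` yields `3m 12s` and `format_elapsed(7_440)` yields `2h 04m`.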
* Render timestamps as local-with-offset for humans, UTC for JSON

Timestamp display was inconsistent across the CLI, TUI, and dash. Some fields rendered raw RFC3339, others stripped the offset to local time without a suffix, and the dash showed local time with no way to tell it apart from UTC. After PR #350 added job.start_time, the inconsistency was about to spread to a new field.

Apply a single policy:

* humans (TUI, dash, CLI table mode, CLI detail printouts) see local time with an explicit ±HHMM offset suffix
* JSON mode keeps the server's raw UTC RFC3339 value

The server already stores UTC, so this is purely a client-side rendering change. File mtimes get the same treatment as RFC3339 timestamps.

A new pair of helpers in `src/client/utils.rs` (format_local_timestamp and format_local_timestamp_epoch) covers the CLI and TUI surfaces; the dash routes everything through formatDateLocal in app-utils.js; and the Python convert_timestamp helper now returns a tz-aware UTC datetime so callers no longer silently inherit the client's local zone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
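The rendering policy can be sketched with the standard library alone. This is not the `format_local_timestamp` helper from `src/client/utils.rs`: `OutputMode`, `format_offset`, and `render_timestamp` are hypothetical names, the caller is assumed to have already computed the local wall-clock string and the UTC offset, and the single space before the suffix is an assumption.

```rust
/// Hypothetical output modes: human-facing surfaces versus JSON.
pub enum OutputMode {
    Human,
    Json,
}

/// Formats a UTC offset given in seconds as a ±HHMM suffix.
pub fn format_offset(offset_secs: i32) -> String {
    let sign = if offset_secs < 0 { '-' } else { '+' };
    let abs = offset_secs.unsigned_abs();
    format!("{}{:02}{:02}", sign, abs / 3_600, (abs % 3_600) / 60)
}

/// JSON mode passes the server's raw UTC value through unchanged;
/// human mode appends the explicit offset to the local wall-clock time.
pub fn render_timestamp(
    mode: OutputMode,
    raw_utc_rfc3339: &str,
    local_wall_clock: &str,
    offset_secs: i32,
) -> String {
    match mode {
        OutputMode::Json => raw_utc_rfc3339.to_string(),
        OutputMode::Human => format!("{} {}", local_wall_clock, format_offset(offset_secs)),
    }
}
```

Keeping the mode decision in one function is what lets every surface share the same policy; the conversion from UTC to local time itself is left to whatever date-time library the client already uses.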
Summary
- […] `job_internal` table.
- `20260525000000_move_runtime_state_to_job`: adds `start_time` and `compute_node_id` to `job` (with FK + partial index), copies any existing values over from `job_internal.active_compute_node_id`, then drops that column.
- `start_job` writes both atomically alongside the status flip (RFC3339 timestamp). `complete_job` (single + batch paths), `reset_job_status`, `reset_failed_jobs_only`, and `retry_job` clear them. Status remains the source of truth for "is running."
- […] `JobModel`; the previous `id IN (SELECT … FROM job_internal …)` filter for `active_compute_node_id` collapses to a direct `WHERE` on `job`.
- `torc jobs list` adds `Compute Node` + `Elapsed` columns (elapsed populated only when status=Running). `torc jobs get` adds raw `Compute Node` + `Start Time` lines. TUI jobs table + details popup and torc-dash jobs table + details card mirror this.
- […] `sync_openapi.sh all --promote` + clients.

Test plan
- `cargo build --workspace --all-features` ✅ (verified locally)
- `cargo clippy --all --all-targets --all-features -- -D warnings` ✅
- `cargo fmt -- --check` and `dprint check` ✅
- `cargo nextest run --workspace`: 1413/1415 passing locally; remaining 2 are transient (pass in isolation)
- […] `torc jobs list <wf>` shows the node + an elapsed time that ticks up
- […] `torc workflows reset-status <wf>` / `--failed-only` and confirm runtime state is cleared
- […] `torc workflows retry-job` and confirm `start_time` + `compute_node_id` are cleared on the new attempt

🤖 Generated with Claude Code