feat(inference): ResidentModelStatus snapshot + queuedRequestCount #1882

Merged
roryford merged 1 commit into main from feat/obs-resident-status on Jun 15, 2026

@roryford (Owner)

Summary

  • Adds ResidentModelStatus public struct to ManifoldInference with modelID, backend, estimatedFootprintBytes, loadedAt, lastActivityAt, and a live idleDuration computed property.
  • Exposes InferenceService.residentModelStatus: ResidentModelStatus? (nil when no model loaded) and InferenceService.queuedRequestCount: Int (depth of the waiting queue, not counting the active request).
  • Tracks lastActivityTimestamp in GenerationQueue at three points: enqueue acceptance, dequeue-to-active, and request completion.
  • Tracks loadedAt: Date? and residentFootprintBytes: UInt64? in ModelLifecycleCoordinator. Both are set at commitLoadIfCurrent and cleared at unloadModel. The footprint comes from ModelLoadPlan.outcome.totalEstimatedBytes for local loads and is nil for cloud endpoints and the #if DEBUG test-init path.
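
For readers unfamiliar with the snapshot-plus-live-property pattern described above, here is a minimal sketch. The field names come from this summary. The field types, the non-optional lastActivityAt, and the clamp to zero are assumptions, not the merged implementation.

```swift
import Foundation

// Illustrative sketch only. Field names follow the PR summary; the types and
// the non-optional `lastActivityAt` are assumptions.
struct ResidentModelStatus {
    let modelID: String
    let backend: String
    let estimatedFootprintBytes: UInt64?   // nil for cloud endpoints and the debug-init path
    let loadedAt: Date
    let lastActivityAt: Date

    // Recomputed on every access, so it keeps growing while the snapshot is held.
    // Clamping to zero is an assumption made here to guard against clock adjustments.
    var idleDuration: TimeInterval {
        max(0, Date().timeIntervalSince(lastActivityAt))
    }
}

// Hypothetical values, used only to exercise the struct.
let now = Date()
let status = ResidentModelStatus(
    modelID: "example-model",
    backend: "example-backend",
    estimatedFootprintBytes: nil,
    loadedAt: now.addingTimeInterval(-60),
    lastActivityAt: now.addingTimeInterval(-5)
)
print("idle for \(status.idleDuration) s, footprint: \(String(describing: status.estimatedFootprintBytes))")
```

The stored fields are frozen when the snapshot is taken. Only idleDuration is live, so a caller polling for eviction decisions can hold one snapshot and still read an up-to-date idle time.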

Closes #1880

Test plan

  • ResidentModelStatusTests (XCTestCase, ManifoldInferenceTests) — 9 tests:
    • residentModelStatus is non-nil when loaded, nil before load and after unloadModel().
    • modelID and backend match the names supplied at init.
    • loadedAt is within the expected time window.
    • estimatedFootprintBytes is nil for the debug-init path (no ModelLoadPlan).
    • queuedRequestCount is 0 initially and returns to 0 after a generation completes.
    • lastActivityAt is updated after generation completes; idleDuration is non-negative.
  • Run locally: scripts/test.sh --profile spike --spike-module ManifoldInferenceTests --skip-update
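
The queue-depth and last-activity expectations in the test plan can be illustrated with a simplified, single-threaded model. SketchQueue below is a hypothetical stand-in for GenerationQueue. It shows only the three timestamp update points and the rule that the active request does not count toward queuedRequestCount. The real type presumably handles concurrency and cancellation, which this sketch omits.

```swift
import Foundation

// Hypothetical stand-in for GenerationQueue; not the project's actual type.
final class SketchQueue {
    private var waiting: [Int] = []        // request IDs waiting for the active slot
    private var active: Int?               // the request currently generating, if any
    private(set) var lastActivityTimestamp: Date?

    // Depth of the waiting queue; the active request is not counted.
    var queuedRequestCount: Int { waiting.count }

    func enqueue(_ id: Int) {
        waiting.append(id)
        lastActivityTimestamp = Date()     // 1. enqueue acceptance
        startNextIfIdle()
    }

    func completeActive() {
        active = nil
        lastActivityTimestamp = Date()     // 3. request completion
        startNextIfIdle()
    }

    private func startNextIfIdle() {
        guard active == nil, !waiting.isEmpty else { return }
        active = waiting.removeFirst()
        lastActivityTimestamp = Date()     // 2. dequeue-to-active
    }
}

let queue = SketchQueue()
queue.enqueue(1)                           // request 1 becomes active immediately
queue.enqueue(2)                           // request 2 waits behind it
let depthWhileBusy = queue.queuedRequestCount
queue.completeActive()                     // request 2 becomes active
queue.completeActive()                     // queue is now empty and idle
let depthWhenIdle = queue.queuedRequestCount
print(depthWhileBusy, depthWhenIdle)
```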

🤖 Generated with Claude Code

Adds read-only runtime observability to InferenceService:
- ResidentModelStatus (new public struct) snapshots modelID, backend, estimated
  footprint, loadedAt, lastActivityAt, and a live idleDuration computed property.
- queuedRequestCount (Int) exposes the queue depth, which the Bool-only hasQueuedRequests cannot report.
- lastActivityTimestamp tracked in GenerationQueue at enqueue, dequeue-to-active,
  and request completion; exposed via lastActivityAt.
- loadedAt and residentFootprintBytes tracked in ModelLifecycleCoordinator, set at
  commitLoadIfCurrent and cleared at unloadModel; footprint threaded from
  ModelLoadPlan.outcome.totalEstimatedBytes for local loads (nil for cloud/debug init).

Closes #1880

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@roryford merged commit 9e688f7 into main on Jun 15, 2026
11 checks passed
@roryford deleted the feat/obs-resident-status branch on June 15, 2026 10:16

Development

Successfully merging this pull request may close these issues.

Public runtime residency metrics (resident footprint + last-activity)