feat(inference): ResidentModelStatus snapshot + queuedRequestCount #1882
Merged
Adds read-only runtime observability to InferenceService:

- ResidentModelStatus (new public struct) snapshots modelID, backend, estimated footprint, loadedAt, lastActivityAt, and a live idleDuration computed property.
- queuedRequestCount (Int) exposes the actual queue depth, which the Bool-only hasQueuedRequests does not.
- lastActivityTimestamp is tracked in GenerationQueue at enqueue, dequeue-to-active, and request completion, and exposed via lastActivityAt.
- loadedAt and residentFootprintBytes are tracked in ModelLifecycleCoordinator: set at commitLoadIfCurrent and cleared at unloadModel. The footprint is threaded from ModelLoadPlan.outcome.totalEstimatedBytes for local loads (nil for cloud/debug init).

Closes #1880

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
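As a rough illustration of the snapshot described above, here is a minimal Swift sketch of a `ResidentModelStatus`-like value. The field names come from this PR; their types (`String` for `modelID` and `backend`, `UInt64?` for the footprint, mirroring `residentFootprintBytes: UInt64?`) are assumptions, and clamping `idleDuration` at zero is a choice of this sketch, not necessarily of the real type.

```swift
import Foundation

/// Hypothetical sketch of the snapshot; field types are assumptions, not the real declaration.
public struct ResidentModelStatus: Equatable, Sendable {
    public let modelID: String
    public let backend: String                   // assumption: the real type may be an enum
    public let estimatedFootprintBytes: UInt64?  // nil for cloud endpoints / debug-init loads
    public let loadedAt: Date
    public let lastActivityAt: Date

    /// Recomputed on every access, so a snapshot held by a caller stays "live".
    /// Clamped at zero (a choice of this sketch) in case the wall clock moves backwards.
    public var idleDuration: TimeInterval {
        max(0, Date().timeIntervalSince(lastActivityAt))
    }
}

// Usage: build a snapshot and read the live idle time.
let status = ResidentModelStatus(
    modelID: "example-model",
    backend: "local",
    estimatedFootprintBytes: nil,
    loadedAt: Date(),
    lastActivityAt: Date()
)
print("idle for \(status.idleDuration)s")
```

Because `idleDuration` is computed on each access rather than stored, a caller holding an older snapshot still reads an up-to-date idle time without refetching the status.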
Summary
- Add a `ResidentModelStatus` public struct to `ManifoldInference` with `modelID`, `backend`, `estimatedFootprintBytes`, `loadedAt`, `lastActivityAt`, and a live `idleDuration` computed property.
- Expose `InferenceService.residentModelStatus: ResidentModelStatus?` (nil when no model is loaded) and `InferenceService.queuedRequestCount: Int` (depth of the waiting queue, not counting the active request).
- Track `lastActivityTimestamp` in `GenerationQueue` at three points: enqueue acceptance, dequeue-to-active, and request completion.
- Track `loadedAt: Date?` and `residentFootprintBytes: UInt64?` in `ModelLifecycleCoordinator`; both are set at `commitLoadIfCurrent` (footprint sourced from `ModelLoadPlan.outcome.totalEstimatedBytes` for local loads, nil for cloud endpoints and the `#if DEBUG` test-init path) and cleared at `unloadModel`.

Closes #1880
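The three activity-tracking points and the "waiting only" semantics of `queuedRequestCount` can be sketched with a simplified queue. `ActivityTrackingQueue`, `enqueue(_:)`, `promoteNext()`, and `completeActive()` are hypothetical names, not the real `GenerationQueue` API, and the `NSLock` stands in for whatever isolation the real type uses.

```swift
import Foundation

/// Hypothetical, simplified stand-in for GenerationQueue (names are illustrative).
final class ActivityTrackingQueue<Request> {
    private let lock = NSLock()
    private var waiting: [Request] = []
    private var active: Request?
    private var lastActivityTimestamp: Date?

    /// Depth of the waiting queue only; the active request is not counted.
    var queuedRequestCount: Int {
        lock.lock(); defer { lock.unlock() }
        return waiting.count
    }

    /// Read-only view of the most recent activity time (nil until anything happens).
    var lastActivityAt: Date? {
        lock.lock(); defer { lock.unlock() }
        return lastActivityTimestamp
    }

    /// Tracking point 1: enqueue acceptance.
    func enqueue(_ request: Request) {
        lock.lock(); defer { lock.unlock() }
        waiting.append(request)
        lastActivityTimestamp = Date()
    }

    /// Tracking point 2: dequeue-to-active. Returns nil if a request is
    /// already active or nothing is waiting.
    func promoteNext() -> Request? {
        lock.lock(); defer { lock.unlock() }
        guard active == nil, !waiting.isEmpty else { return nil }
        let next = waiting.removeFirst()
        active = next
        lastActivityTimestamp = Date()
        return next
    }

    /// Tracking point 3: request completion.
    func completeActive() {
        lock.lock(); defer { lock.unlock() }
        guard active != nil else { return }
        active = nil
        lastActivityTimestamp = Date()
    }
}

// Usage: two requests enqueued, one promoted to active.
let queue = ActivityTrackingQueue<String>()
queue.enqueue("a")
queue.enqueue("b")
_ = queue.promoteNext()
print(queue.queuedRequestCount) // 1: "b" is waiting, "a" is active
```

Counting only the waiting requests keeps the number meaningful as a backlog signal: a single in-flight generation reads as zero queued work, which matches the `hasQueuedRequests` Bool it complements.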
Test plan
- `ResidentModelStatusTests` (`XCTestCase`, `ManifoldInferenceTests`), 9 tests:
  - `residentModelStatus` is non-nil when a model is loaded, and nil before load and after `unloadModel()`.
  - `modelID` and `backend` match the names supplied at init.
  - `loadedAt` is within the expected time window.
  - `estimatedFootprintBytes` is nil for the debug-init path (no `ModelLoadPlan`).
  - `queuedRequestCount` is 0 initially and returns to 0 after a generation completes.
  - `lastActivityAt` is updated after a generation completes; `idleDuration` is non-negative.
- `scripts/test.sh --profile spike --spike-module ManifoldInferenceTests --skip-update`

🤖 Generated with Claude Code
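The load/unload bookkeeping these tests exercise can be sketched as follows. `LifecycleBookkeeping`, `LoadPlanOutcome`, `commitLoad(modelID:outcome:)`, `unload()`, and `snapshot` are hypothetical stand-ins for `ModelLifecycleCoordinator`, `ModelLoadPlan.outcome`, `commitLoadIfCurrent`, `unloadModel`, and `residentModelStatus`; the sketch omits whatever staleness check the "IfCurrent" suffix of the real method implies.

```swift
import Foundation

/// Hypothetical stand-in for the outcome a local ModelLoadPlan reports.
struct LoadPlanOutcome {
    let totalEstimatedBytes: UInt64
}

/// Simplified sketch of the bookkeeping described for ModelLifecycleCoordinator.
final class LifecycleBookkeeping {
    private(set) var modelID: String?
    private(set) var loadedAt: Date?
    private(set) var residentFootprintBytes: UInt64?

    /// Mirrors commitLoadIfCurrent: record the load time and, for local loads,
    /// the plan's estimate. `outcome` is nil for cloud endpoints and debug init.
    func commitLoad(modelID: String, outcome: LoadPlanOutcome?) {
        self.modelID = modelID
        loadedAt = Date()
        residentFootprintBytes = outcome?.totalEstimatedBytes
    }

    /// Mirrors unloadModel: clear everything so the snapshot goes back to nil.
    func unload() {
        modelID = nil
        loadedAt = nil
        residentFootprintBytes = nil
    }

    /// nil when nothing is loaded, matching the residentModelStatus contract.
    var snapshot: (modelID: String, loadedAt: Date, footprint: UInt64?)? {
        guard let id = modelID, let at = loadedAt else { return nil }
        return (modelID: id, loadedAt: at, footprint: residentFootprintBytes)
    }
}
```

Clearing all three fields together in `unload()` is what lets a single nil check on the snapshot cover the "nil before load and after `unloadModel()`" cases listed above.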