fix: runtime cache health check bypassing stale connections by 0xbbjoker · Pull Request #192 · elizaOS/cloud

0xbbjoker · 2026-01-05T22:09:04Z

RuntimeCache Health Check Fix

The RuntimeCache was storing the cache key (e.g., "agentId:ws") as entry.agentId, but DbAdapterPool stores adapters under just "agentId". This caused health checks to always return true for webSearch-enabled runtimes because the adapter lookup never found a match.

Solution: RuntimeCache.set() now accepts the actual agentId separately from the cache key, ensuring health checks properly detect and evict runtimes with closed DB connections.

Note

Restores proper DB health checks for cached runtimes by decoupling cache key (e.g., agentId:ws) from the actual agentId.

Changes RuntimeCache.set(cacheKey, runtime, characterName, actualAgentId) to persist the true agentId while using cacheKey for lookup
Updates call site in createRuntimeForUser to pass agentId separately and improves cache logging
Ensures getWithHealthCheck checks the correct adapter via DbAdapterPool.checkHealth(entry.agentId) and evicts unhealthy runtimes

Written by Cursor Bugbot for commit 929fa5a. This will update automatically on new commits.

vercel · 2026-01-05T22:09:09Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Review	Updated (UTC)
eliza-cloud-v2	Ready	Preview, Comment	Jan 7, 2026 9:23pm
miniapp	Error		Jan 7, 2026 9:23pm

coderabbitai · 2026-01-05T22:09:12Z

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

🎁 Summarized by CodeRabbit Free

Your organization is on the Free plan. CodeRabbit will generate a high-level summary and a walkthrough for each pull request. For a comprehensive line-by-line review, please upgrade your subscription to CodeRabbit Pro by visiting https://app.coderabbit.ai/login.

Comment @coderabbitai help to get the list of available commands and usage tips.

claude · 2026-01-05T22:10:48Z

Code Review - PR #192: Runtime Cache Health Check Fix

Summary

This PR fixes a critical bug where the RuntimeCache was storing the cache key (e.g., "agentId:ws") as entry.agentId, but DbAdapterPool was looking up adapters using just the agentId. This mismatch caused health checks to always return true for webSearch-enabled runtimes, allowing stale DB connections to persist.

Code Quality & Best Practices ✅

Strengths:

Clean, focused fix that addresses the root cause
Improved logging that includes both the actual agentId and the cache key for debugging
Follows TypeScript best practices with proper type annotations
Maintains existing code structure and patterns

Observations:

The fix correctly separates concerns: cache key vs. agent ID for DB adapter lookup
Parameter naming is clear: cacheKey and actualAgentId make the distinction explicit

Correctness ✅

The fix is sound:

RuntimeCache.set() now accepts 4 parameters: cacheKey, runtime, characterName, actualAgentId
Stores actualAgentId in entry.agentId (used by DbAdapterPool.checkHealth())
Uses cacheKey for Map storage (supports webSearch suffix: "agentId:ws")
Call site at lib/eliza/runtime-factory.ts:426 correctly passes both values

Potential Issues & Considerations

1. API Breaking Change (Minor)
The set() method signature changed from 3 to 4 parameters. While this appears to be an internal class, verify:

Are there any other call sites to RuntimeCache.set() besides line 426?
Should this be considered a patch or a minor version bump?

2. Type Safety Enhancement Opportunity
Consider making the distinction more explicit:

set(cacheKey: string, runtime: AgentRuntime, characterName: string, actualAgentId: UUID): void

This could be enhanced with a comment or, better still, with branded types for cache keys if this pattern expands.
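A minimal sketch of the branded-type idea, assuming the codebase would define its own AgentId and CacheKey brands (both names, and the helpers asAgentId, toCacheKey and healthCheckTarget, are hypothetical and not part of this PR). The brands exist only at compile time, so passing a cache key where an agent ID is expected, which is exactly the pre-fix bug, becomes a type error:

```typescript
// Hypothetical branded string types. At runtime both are plain strings;
// the brand is a compile-time marker only.
declare const brand: unique symbol;
type Brand<T, B extends string> = T & { readonly [brand]: B };

type AgentId = Brand<string, "AgentId">;
type CacheKey = Brand<string, "CacheKey">;

// Explicit conversion points: the only places a raw string becomes branded.
const asAgentId = (raw: string): AgentId => raw as AgentId;

function toCacheKey(agentId: AgentId, webSearchEnabled: boolean): CacheKey {
  // Assumption: ":ws" is the only suffix, as in the call sites in this thread.
  return (webSearchEnabled ? `${agentId}:ws` : agentId) as CacheKey;
}

// Stand-in for anything that must receive the plain agentId,
// such as a DB adapter health check.
function healthCheckTarget(agentId: AgentId): string {
  return agentId;
}

const id = asAgentId("abc-123");
const key = toCacheKey(id, true);
healthCheckTarget(id); // OK
// healthCheckTarget(key); // compile error: CacheKey is not assignable to AgentId
```

The design cost is a handful of explicit conversions at the boundaries; in exchange, a set(cacheKey: CacheKey, ..., actualAgentId: AgentId) signature could not be called with the two arguments mixed up.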

3. Cache Invalidation Question
The delete() and has() methods still use agentId parameter names (lines 136, 145). These actually take cache keys. Consider:

Renaming the parameter to cacheKey for consistency
Or document that these methods accept cache keys, not agent IDs

Example from lines 319-320:

const wasInMemoryBase = runtimeCache.delete(agentId);
const wasInMemoryWs = runtimeCache.delete(`${agentId}:ws`);

This shows delete() accepts cache keys, not agent IDs.

Performance Considerations ✅

No performance concerns:

Additional parameter has negligible overhead
Cache lookup logic unchanged
Health check behavior improved (prevents unnecessary work with stale connections)

Security Concerns ✅

No security issues identified:

UUID type safety maintained
No injection risks
Proper separation of cache key from agent ID prevents potential key confusion attacks

Test Coverage ⚠️

Missing tests:

No test files found for RuntimeCache or DbAdapterPool
This bug could have been caught with unit tests

Recommendation:
Add tests covering:

RuntimeCache.set() with webSearch suffix ("agentId:ws")
getWithHealthCheck() correctly passing entry.agentId to DbAdapterPool.checkHealth()
Health check eviction when DB connection is stale
Cache key vs. agent ID distinction

Test case example:

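test('health check uses actual agentId, not cache key', async () => {
  // Assumes cache (a RuntimeCache), mockRuntime, and mockDbPool (with a mocked checkHealth) come from test setup.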
  const agentId = '123-456-789' as UUID;
  const cacheKey = `${agentId}:ws`;
  
  cache.set(cacheKey, mockRuntime, 'TestChar', agentId);
  
  // Should pass agentId to checkHealth, not cacheKey
  await cache.getWithHealthCheck(cacheKey, mockDbPool);
  
  expect(mockDbPool.checkHealth).toHaveBeenCalledWith(agentId);
});

Style & Documentation

Logging improvement:

`[RuntimeCache] Cached runtime: ${characterName} (${actualAgentId}, key=${cacheKey})`

This is excellent for debugging: it clearly shows both values.

Documentation opportunity:
Consider adding a comment above the set() method explaining why both values are needed:

/**
 * Cache a runtime with the given cache key.
 * @param cacheKey - Cache key (may include suffix like ":ws")
 * @param actualAgentId - Agent UUID for DB adapter lookup (no suffix)
 */

Recommendation: ✅ APPROVE with suggestions

This is a solid bug fix that correctly addresses the health check bypass issue. The code is clean and the fix is minimal.

Suggested improvements (non-blocking):

Add unit tests to prevent regression
Rename agentId parameter to cacheKey in delete(), has(), and get() methods for consistency
Add JSDoc comment explaining the distinction between cache key and agent ID
Consider adding a validation that actualAgentId doesn't contain the suffix (defensive programming)
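A sketch of the defensive check from the last suggestion, assuming ":ws" is the only suffix in use (as in the call sites quoted above). assertKeyMatchesAgent and CACHE_KEY_SUFFIXES are hypothetical names, not part of the PR; a guard like this could run at the top of RuntimeCache.set():

```typescript
// Hypothetical list of known cache-key suffixes (assumption: only ":ws" today).
const CACHE_KEY_SUFFIXES = [":ws"] as const;

// Rejects an actualAgentId that already carries a suffix, and a cacheKey
// that does not derive from the given agentId.
function assertKeyMatchesAgent(cacheKey: string, actualAgentId: string): void {
  if (actualAgentId.includes(":")) {
    throw new Error(`[RuntimeCache] actualAgentId looks like a cache key: ${actualAgentId}`);
  }
  const validKeys = [actualAgentId, ...CACHE_KEY_SUFFIXES.map((s) => `${actualAgentId}${s}`)];
  if (!validKeys.includes(cacheKey)) {
    throw new Error(`[RuntimeCache] cacheKey ${cacheKey} does not belong to agent ${actualAgentId}`);
  }
}
```

Checking the pair together, rather than actualAgentId alone, also catches a call site that passes the wrong agent's ID with an otherwise well-formed key.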

Great work identifying and fixing this subtle but important bug! 🎯

claude · 2026-01-06T00:42:50Z

Code Review: Runtime Cache Health Check Fix

Summary

This PR correctly fixes a critical bug where health checks were bypassing stale DB connections due to a cache key mismatch between RuntimeCache and DbAdapterPool.

The Bug

Before: When web search was enabled, the cache key would be "agentId:ws", but this compound key was being stored in entry.agentId. The health check in line 104 (dbPool.checkHealth(entry.agentId)) would then look up "agentId:ws" in the DbAdapterPool, which stores adapters under plain "agentId". This lookup would fail to find the adapter and return true (meaning "healthy" at lib/eliza/runtime-factory.ts:236), bypassing the health check entirely.

After: The fix separates the cache key (used for Map indexing) from the actual agentId (stored in the entry), ensuring health checks look up the correct adapter.
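To make the failure mode concrete, here is a simplified, self-contained model of the two classes. FakeDbAdapterPool and FakeRuntimeCache are hypothetical stand-ins that reproduce only the key handling described above (adapters stored under the plain agentId, a missing adapter reported as healthy, unhealthy entries evicted); the real classes hold more state and their exact signatures may differ.

```typescript
type UUID = string; // simplified alias for illustration only

interface Adapter {
  isClosed: boolean;
}

class FakeDbAdapterPool {
  private adapters = new Map<string, Adapter>();

  register(agentId: UUID, adapter: Adapter): void {
    // Adapters are keyed by the plain agentId, never "agentId:ws".
    this.adapters.set(agentId, adapter);
  }

  async checkHealth(agentId: UUID): Promise<boolean> {
    const adapter = this.adapters.get(agentId);
    // No adapter found => reported as healthy. Pre-fix entries for
    // "agentId:ws" keys always landed in this branch.
    if (!adapter) return true;
    return !adapter.isClosed;
  }
}

interface CachedRuntime<R> {
  runtime: R;
  characterName: string;
  agentId: UUID; // must be the plain agentId the pool uses
}

class FakeRuntimeCache<R> {
  private entries = new Map<string, CachedRuntime<R>>();

  set(cacheKey: string, runtime: R, characterName: string, actualAgentId: UUID): void {
    // The Map is indexed by cacheKey; the entry remembers the real agentId.
    this.entries.set(cacheKey, { runtime, characterName, agentId: actualAgentId });
  }

  async getWithHealthCheck(cacheKey: string, pool: FakeDbAdapterPool): Promise<R | null> {
    const entry = this.entries.get(cacheKey);
    if (!entry) return null;
    if (!(await pool.checkHealth(entry.agentId))) {
      // Evict runtimes whose DB connection is closed.
      this.entries.delete(cacheKey);
      return null;
    }
    return entry.runtime;
  }
}
```

Passing the cache key as the fourth argument reproduces the pre-fix behavior: the lookup misses and the stale runtime is returned. Passing the plain agentId lets the closed adapter be seen, and the entry is evicted.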

Review Findings

✅ Code Quality & Best Practices

Clean separation of concerns: Cache key vs actual agentId is now explicit
Improved logging: The debug message now shows both actualAgentId and cacheKey for better observability
Consistent naming: actualAgentId parameter name clearly indicates its purpose
Minimal changes: Only touches what's necessary to fix the bug

✅ Correctness

The fix is correct:

RuntimeCache.set() now receives both the cache key and the actual agentId
The cache entry stores the real agentId (without :ws suffix)
Health checks properly look up adapters using the clean agentId
Cache retrieval still works via the compound key

⚠️ Potential Issues

1. Parameter Order Inconsistency
The set() method signature is:

set(cacheKey: string, runtime: AgentRuntime, characterName: string, actualAgentId: UUID)

Consider reordering to group related parameters:

set(cacheKey: string, actualAgentId: UUID, runtime: AgentRuntime, characterName: string)

This puts the two ID-related parameters together, making the relationship clearer. However, this is a minor style preference and the current implementation is acceptable.

2. Signature Change - Breaking, but Contained
The signature change from:

set(agentId: string, runtime: AgentRuntime, characterName: string): void

to:

set(cacheKey: string, runtime: AgentRuntime, characterName: string, actualAgentId: UUID): void

This is a breaking change for any code calling this method. ✅ However, I verified there's only one call site (lib/eliza/runtime-factory.ts:426), which is updated in this PR, so this is safe.

3. Edge Case: Other Methods Not Updated
The delete() and has() methods at lines 136-147 still take a single agentId parameter. They should receive the cache key (e.g., "agentId:ws"), not the raw agentId.

Looking at the call sites, these methods operate on the Map key, which should be the cache key. The parameter name agentId is misleading, but the functionality appears correct since callers pass the cache key. Consider renaming the parameter to cacheKey for clarity in a follow-up.

✅ Performance

No performance regression: The change only adds one additional parameter
Same lookup complexity: O(1) Map operations unchanged
Health check behavior improved: Now actually performs checks instead of bypassing

✅ Security

No security concerns identified
The fix actually improves reliability by ensuring stale connections are properly detected and evicted

❌ Test Coverage

Missing: No unit tests for RuntimeCache health check logic. Consider adding tests for:

Health check with clean agentId when cache key has :ws suffix
Verification that stale connections are properly evicted
Cache key vs agentId separation

Example test case:

test('health check uses actualAgentId not cache key', async () => {
  const cache = new RuntimeCache();
  const runtime = createMockRuntime();
  const agentId = 'abc-123' as UUID;
  const cacheKey = 'abc-123:ws';
  
  cache.set(cacheKey, runtime, 'TestAgent', agentId);
  
  // Mock dbPool.checkHealth to verify it receives the clean agentId
  const mockDbPool = {
    checkHealth: jest.fn().mockResolvedValue(true)
  };
  
  await cache.getWithHealthCheck(cacheKey, mockDbPool);
  expect(mockDbPool.checkHealth).toHaveBeenCalledWith(agentId); // Not cacheKey!
});

Recommendation

✅ APPROVE with suggestions

This is a critical bug fix that should be merged. The implementation is correct and solves the stated problem.

Suggested Follow-ups (not blocking):

Add unit tests for the health check logic
Consider renaming agentId parameter in delete()/has() methods to cacheKey for clarity
Add a comment explaining the cache key vs agentId distinction at the CachedRuntime interface

Great catch on this subtle but important bug! 🎯

claude · 2026-01-06T19:23:18Z

Code Review - PR #192: Runtime Cache Health Check Fix

Summary

This PR fixes a critical bug where RuntimeCache health checks were bypassing stale database connections. The root cause was a key mismatch: cache keys included suffixes (e.g., "agentId:ws") but DbAdapterPool stored adapters under plain agentId, causing health checks to always return true.

✅ Strengths

Correct Problem Identification: The fix properly addresses the key mismatch between cache keys and adapter pool keys
Minimal Surface Area: Only 11 lines changed, reducing risk of introducing new bugs
Backward Compatible: Changes are internal to the caching layer without breaking external APIs
Improved Observability: Enhanced debug logging now shows both the actualAgentId and cacheKey

🐛 Issues Found

Critical: Missing Type Safety

Location: runtime-factory.ts:129

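The actualAgentId parameter is typed as UUID, but there's no runtime validation. If a non-UUID string (for example, a compound key like "agentId:ws") is passed, the adapter lookup will miss and checkHealth() will report the runtime as healthy, silently bypassing the health check.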

Recommendation:

async set(
  cacheKey: string,
  runtime: AgentRuntime,
  characterName: string,
  actualAgentId: UUID,
): Promise<void> {
  // Add validation
  if (!actualAgentId || typeof actualAgentId !== 'string') {
    throw new Error(`Invalid actualAgentId: ${actualAgentId}`);
  }
  // ... rest of implementation
}
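The check above only rejects empty or non-string values, so a compound key such as "agentId:ws" would still get through. A stricter sketch, assuming agent IDs are canonical 8-4-4-4-12 hex UUIDs (assertUuid and UUID_PATTERN are hypothetical names, not existing helpers in the repo):

```typescript
// Canonical UUID shape: 8-4-4-4-12 hex digits, case-insensitive.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Assertion function: throws unless value is a UUID-shaped string, and
// narrows the type to string for the caller afterwards.
function assertUuid(value: unknown, label = "actualAgentId"): asserts value is string {
  if (typeof value !== "string" || !UUID_PATTERN.test(value)) {
    throw new Error(`[RuntimeCache] Invalid ${label}: ${String(value)}`);
  }
}
```

One trade-off: fixtures such as 'test-agent-123' in the example tests in this thread are not valid UUIDs, so they would need real UUIDs if a check like this were added.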

Medium: Inconsistent Cache Key Usage

Locations: runtime-factory.ts:345-346, runtime-factory.ts:370

The invalidateRuntime() and isRuntimeCached() methods still use raw agentId instead of cacheKey, which could cause inconsistencies:

// Current code at lines 345-346
const wasInMemoryBase = await runtimeCache.delete(agentId);
const wasInMemoryWs = await runtimeCache.delete(`${agentId}:ws`);

This works but creates duplication. Consider refactoring to use a helper:

private getCacheKeys(agentId: string): string[] {
  return [agentId, `${agentId}:ws`];
}
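A sketch of how such a helper could drive invalidation, with a plain Map standing in for RuntimeCache. invalidateAll is a hypothetical name, and the real delete() discussed in this thread may be async; the point is that the list of key variants lives in one place:

```typescript
// Hypothetical helper: the single place that knows every cache-key variant.
// Assumes ":ws" is the only suffix, as in the call sites quoted above.
function getCacheKeys(agentId: string): string[] {
  return [agentId, `${agentId}:ws`];
}

// Removes every variant for an agent; returns true if anything was removed.
function invalidateAll<V>(cache: Map<string, V>, agentId: string): boolean {
  let removedAny = false;
  for (const key of getCacheKeys(agentId)) {
    // Map.prototype.delete returns true when the key existed.
    // delete() is called first so it always runs, even after a hit.
    removedAny = cache.delete(key) || removedAny;
  }
  return removedAny;
}
```

Adding a new suffix later would then mean touching getCacheKeys() only, instead of every invalidation and lookup site.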

Low: Parameter Ordering

Location: runtime-factory.ts:125-129

The new signature has cacheKey first and actualAgentId last, which is counterintuitive since agentId is conceptually more fundamental than the cache key.

Suggestion: Consider reordering for better semantics:

async set(
  actualAgentId: UUID,
  cacheKey: string,
  runtime: AgentRuntime,
  characterName: string,
): Promise<void>

Though this is a minor point and not worth the churn if you prefer the current order.

🔍 Code Quality

Good:

Clean separation of concerns between cache key and agent ID
Descriptive variable names (cacheKey vs actualAgentId)
Improved logging for debugging

Concerns:

No input validation for the new parameter
No comments explaining the subtle distinction between cacheKey and actualAgentId

⚡ Performance

No performance concerns. The changes are purely logical with no additional overhead.

🔒 Security

No security concerns identified. The fix doesn't introduce any new attack vectors.

🧪 Test Coverage

Critical Gap: No tests found for RuntimeCache or this specific health check logic.

Recommendations:

Add unit tests for RuntimeCache.set() with different cache keys
Add integration tests for health check behavior with webSearch-enabled runtimes
Add regression test for the specific bug: cache key "agentId:ws" should resolve to adapter with key "agentId"

Example test case:

describe('RuntimeCache health check', () => {
  it('should find adapter using actualAgentId when cache key has suffix', async () => {
    // Assumes runtimeCache, runtime, and dbPool (with a mocked checkHealth) come from test setup.
    const agentId = 'test-agent-123' as UUID;
    const cacheKey = `${agentId}:ws`;
    
    await runtimeCache.set(cacheKey, runtime, 'Test', agentId);
    const result = await runtimeCache.getWithHealthCheck(cacheKey, dbPool);
    
    expect(result).toBeDefined();
    expect(dbPool.checkHealth).toHaveBeenCalledWith(agentId); // Not cacheKey!
  });
});

📝 Recommendations

Must Fix Before Merge:

Add input validation for actualAgentId parameter
Add inline comments explaining the cache key vs agent ID distinction
Add basic unit tests (at minimum, test that health check uses actualAgentId)

Nice to Have:

Refactor cache key generation into a helper method
Consider adding TypeScript branded types to prevent passing wrong IDs
Add JSDoc comments to the set() method explaining parameters

🎯 Verdict

Approve with Changes: The core fix is sound and addresses a real bug, but it needs:

Input validation
Documentation comments
Test coverage

The logic is correct, but production code handling critical infrastructure like database connection health should have stronger safeguards.

Generated by Claude Code

claude · 2026-01-07T21:22:13Z

Code Review: PR #192 - Runtime Cache Health Check Fix

Summary

This PR fixes a critical bug where health checks were failing to detect unhealthy database connections for cached runtimes. The root cause was a key mismatch: RuntimeCache stored the composite cache key (e.g., "agentId:ws") as entry.agentId, but DbAdapterPool expected just the plain agentId for adapter lookups.

✅ Strengths

Correct Problem Identification: The fix properly addresses the mismatch between cache keys and adapter keys
Clear Parameter Naming: Using cacheKey vs actualAgentId makes the distinction explicit
Improved Logging: The updated debug log now shows both the actual agentId and cache key for better observability
Minimal Changes: The fix is surgical - only 4 lines changed with minimal surface area for new bugs
Backwards Compatible: Existing cache entries will be recreated naturally as they expire

🐛 Potential Issues

1. Type Safety Concern (Minor)

The actualAgentId parameter is typed as UUID, but there's no runtime validation. Consider adding a validation check:

async set(
  cacheKey: string,
  runtime: AgentRuntime,
  characterName: string,
  actualAgentId: UUID,
): Promise<void> {
  if (!actualAgentId) {
    throw new Error('[RuntimeCache] actualAgentId is required');
  }
  // ... rest of implementation
}

2. Cache Invalidation Gap (Low Priority)

Existing cache entries with the old structure (entry.agentId = "agentId:ws") will continue to bypass health checks until they expire naturally. This is acceptable given the 30-minute max age, but worth noting for monitoring.

🔍 Code Quality

Positive:

Clear separation of concerns between cache key (for lookup) and agentId (for health checks)
Consistent with existing code style
Good use of debug logging for troubleshooting

Suggestions:

Consider adding JSDoc comments to clarify the distinction between cacheKey and actualAgentId:

/**
 * Caches a runtime instance.
 * @param cacheKey - The composite key for cache lookup (e.g., "agentId" or "agentId:ws")
 * @param runtime - The runtime instance to cache
 * @param characterName - Name of the character for logging
 * @param actualAgentId - The raw agent UUID for DB adapter health checks
 */

🔒 Security

No security concerns identified. The change doesn't introduce any new attack vectors.

⚡ Performance

No performance impact. The change is purely structural and doesn't add any additional operations.

🧪 Test Coverage

Missing Tests:

Unit test verifying health check works with webSearch-enabled runtimes ("agentId:ws" keys)
Integration test confirming unhealthy adapters are properly evicted
Test case for the logging output format

Recommended Test:

describe('RuntimeCache health checks', () => {
  it('should properly check health for webSearch-enabled runtimes', async () => {
    // Assumes runtimeCache, mockRuntime, and dbPool come from test setup.
    const agentId = 'test-agent-id' as UUID;
    const cacheKey = `${agentId}:ws`;
    
    await runtimeCache.set(cacheKey, mockRuntime, 'TestChar', agentId);
    
    // Mock unhealthy adapter
    dbPool.checkHealth = jest.fn().mockResolvedValue(false);
    
    const result = await runtimeCache.getWithHealthCheck(cacheKey, dbPool);
    expect(result).toBeNull();
    expect(dbPool.checkHealth).toHaveBeenCalledWith(agentId);
  });
});

🎯 Overall Assessment

Verdict: ✅ Approve with minor suggestions

This is a solid fix that correctly addresses the root cause. The implementation is clean and minimal. The suggested improvements are non-blocking:

Add parameter validation
Add JSDoc comments for clarity
Add test coverage for the fix

The PR can be merged as-is, with the suggestions implemented in a follow-up if desired.

Impact: High - Fixes a critical bug that prevented proper cleanup of stale database connections.

Review generated by Claude Code

fix: runtime cache health check bypassing stale connections

877e21f

vercel Bot deployed to Preview – eliza-cloud-v2 January 5, 2026 22:12

Merge branch 'dev' into fix/runtime-cache-health-check-key-mismatch

7d9f31c

vercel Bot deployed to Preview – miniapp January 6, 2026 00:40

vercel Bot deployed to Preview – eliza-cloud-v2 January 6, 2026 00:43

standujar approved these changes Jan 6, 2026

Merge branch 'dev' into fix/runtime-cache-health-check-key-mismatch

99685a0

vercel Bot deployed to Preview – miniapp January 6, 2026 19:21

vercel Bot had a problem deploying to Preview – eliza-cloud-v2 January 6, 2026 19:25 Failure

Merge branch 'dev' into fix/runtime-cache-health-check-key-mismatch

929fa5a

vercel Bot had a problem deploying to Preview – miniapp January 7, 2026 21:20 Failure

vercel Bot deployed to Preview – eliza-cloud-v2 January 7, 2026 21:23

0xbbjoker merged commit 1890f46 into dev Jan 7, 2026
13 of 16 checks passed

0xbbjoker deleted the fix/runtime-cache-health-check-key-mismatch branch January 7, 2026 21:25

lalalune pushed a commit that referenced this pull request Apr 28, 2026

Merge pull request #192 from elizaOS/fix/runtime-cache-health-check-key-mismatch

e3d70f8

0xbbjoker merged 4 commits into dev from fix/runtime-cache-health-check-key-mismatch