[Bug] RocksDB lock contention: multiple _SingleAccountBackend instances open same DB path (local backend) #1072
Description
Environment
- OpenViking: v0.2.8 (Docker, local vectordb backend)
- OS: Linux 6.12.18 (Synology NAS)
Problem
When using the local vectordb backend, OpenViking creates multiple _SingleAccountBackend instances (one per account_id), each calling create_collection_adapter(config) which creates a new LocalCollectionAdapter → get_or_create_local_collection() → opens the same RocksDB/LevelDB store path.
RocksDB enforces an exclusive file lock on the store/LOCK file. The first instance acquires it; every subsequent instance fails immediately with:
IO error: lock /app/data/vectordb/context/store/LOCK: already held by process
This makes all vector operations (count, query, search, upsert) fail for any account except the first one initialized.
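The exclusive-lock semantics can be illustrated with a small self-contained sketch. This uses fcntl.flock purely as an analogy; RocksDB implements its own locking on the LOCK file and additionally tracks held locks in-process, which is why even a second handle opened by the same process fails:

```python
import fcntl
import os
import tempfile

# Analogy only: fcntl.flock stands in for RocksDB's LOCK-file protection.
lock_path = os.path.join(tempfile.mkdtemp(), "LOCK")

f1 = open(lock_path, "w")
fcntl.flock(f1, fcntl.LOCK_EX | fcntl.LOCK_NB)  # first opener acquires the lock

f2 = open(lock_path, "w")
try:
    fcntl.flock(f2, fcntl.LOCK_EX | fcntl.LOCK_NB)
    second = "acquired"
except BlockingIOError:
    # Every subsequent opener fails immediately, mirroring the
    # "lock ... already held by process" error above.
    second = "already held"

print(second)
```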
Root Cause Analysis
The issue is in the architecture:
- VikingVectorIndexBackend._get_backend_for_account() creates a new _SingleAccountBackend per account_id
- Each backend calls create_collection_adapter(config) in factory.py, with no caching
- Each adapter opens its own PersistCollection → StoreManager → RocksDB at the same path
- RocksDB only allows one writer per database directory
Since account filtering is done at the query level (via account_id field filter), all accounts share the same physical collection. There's no need for separate DB handles.
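As a toy illustration of that point (hypothetical data and function names, not the actual OpenViking API): one shared store can serve every account when isolation is just a filter on an account_id field:

```python
# Toy illustration (not OpenViking code): one shared store, with per-account
# isolation done entirely by filtering on an account_id field.
store = [
    {"account_id": "acct-1", "text": "doc-a"},
    {"account_id": "acct-2", "text": "doc-b"},
    {"account_id": "acct-1", "text": "doc-c"},
]

def query(account_id: str) -> list[str]:
    # Same physical collection for every caller; no per-account DB handle.
    return [row["text"] for row in store if row["account_id"] == account_id]

print(query("acct-1"))
```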
Contributing Factors
- lock_timeout defaults to 0.0 (immediate failure) in lock_manager.py and transaction_config.py
- self._adapter.query() and self._adapter.count() are synchronous calls inside async def methods, blocking the event loop and preventing lock release
- self.embedder.embed() in hierarchical_retriever.py is also synchronous in async context
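The event-loop impact of those synchronous calls is easy to demonstrate with a generic asyncio sketch (no OpenViking code involved): a heartbeat task keeps ticking while a blocking call is offloaded via asyncio.to_thread, whereas running it inline would stall the loop:

```python
import asyncio
import time

async def heartbeat(ticks: list) -> None:
    # Stand-in for other coroutines that need the loop (e.g. lock release).
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)

async def main() -> list:
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    # Offloading the blocking call keeps the loop free; a bare
    # time.sleep(0.3) here would freeze the heartbeat instead.
    await asyncio.to_thread(time.sleep, 0.3)
    await hb
    return ticks

ticks = asyncio.run(main())
```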
Fix Applied (verified working)
Three patches that together resolve the issue:
Patch 1: Adapter instance cache in factory.py (PRIMARY FIX)
_local_adapter_cache: dict[str, CollectionAdapter] = {}
def create_collection_adapter(config) -> CollectionAdapter:
    # ... existing code ...
    if backend == "local" and hasattr(config, "path") and hasattr(config, "name"):
        from pathlib import Path
        cache_key = str(Path(config.path) / "vectordb" / (config.name or "context"))
        if cache_key in _local_adapter_cache:
            return _local_adapter_cache[cache_key]
        adapter = adapter_cls.from_config(config)
        _local_adapter_cache[cache_key] = adapter
        return adapter
    return adapter_cls.from_config(config)

Patch 2: lock_timeout 0 → 5s
In lock_manager.py and transaction_config.py:
lock_timeout: float = 5.0  # was 0.0

Patch 3: asyncio.to_thread for blocking calls
In viking_vector_index_backend.py:
return await asyncio.to_thread(self._adapter.query, ...)
return await asyncio.to_thread(self._adapter.count, filter=filter)

In hierarchical_retriever.py:
result = await asyncio.to_thread(self.embedder.embed, query.query, True)

Related PRs
These existing PRs address parts of the problem but not the root cause (adapter caching):
- fix(server): wrap sync blocking calls in asyncio.to_thread for search/recall path #1068 — asyncio.to_thread for search/recall path
- fix: prevent mv lock timeout causing missing L0/L1 files #1064 — mv lock timeout + retry
- fix(openclaw-plugin): prevent CONFLICT errors with commit lock #1055 — commit lock contention in OpenClaw plugin
- [Bug]: SemanticProcessor fails to move L0/L1 files from temp to target directory due to lock timeout #1047 — SemanticProcessor lock timeout
None of these PRs fix the fundamental issue: multiple adapter instances opening the same RocksDB. Even with all these PRs merged, the lock contention will persist because each _SingleAccountBackend still creates its own DB handle.
Suggested Long-term Fix
The proper fix should be at the VikingVectorIndexBackend facade level: ensure all _SingleAccountBackend instances share a single CollectionAdapter (and thus a single RocksDB handle), since they all operate on the same physical collection with account-level filtering.
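A minimal sketch of that shape, with hypothetical names (the real facade and adapter wiring in OpenViking will differ): the facade lazily creates one adapter and hands the same instance to every per-account backend:

```python
import threading

class _FakeAdapter:
    """Stand-in for CollectionAdapter; the real one owns the RocksDB handle."""

class _AccountBackend:
    def __init__(self, account_id: str, adapter) -> None:
        self.account_id = account_id
        self._adapter = adapter  # shared handle; isolation stays query-level

class SharedAdapterFacade:
    """Hypothetical sketch of a VikingVectorIndexBackend-level fix."""

    def __init__(self, adapter_factory) -> None:
        self._adapter_factory = adapter_factory
        self._adapter = None
        self._backends: dict[str, _AccountBackend] = {}
        self._lock = threading.Lock()

    def _get_shared_adapter(self):
        with self._lock:
            if self._adapter is None:
                # Opened exactly once: one RocksDB handle for all accounts.
                self._adapter = self._adapter_factory()
            return self._adapter

    def get_backend_for_account(self, account_id: str) -> _AccountBackend:
        adapter = self._get_shared_adapter()
        with self._lock:
            if account_id not in self._backends:
                self._backends[account_id] = _AccountBackend(account_id, adapter)
            return self._backends[account_id]

opens = []
facade = SharedAdapterFacade(lambda: opens.append(1) or _FakeAdapter())
b1 = facade.get_backend_for_account("acct-1")
b2 = facade.get_backend_for_account("acct-2")
```

The lock guards both the one-time adapter creation and the backend map, so concurrent callers cannot race to open a second DB handle.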