
fix(worker): quarantine invalid async operations #1674

Open
ai-ag2026 wants to merge 1 commit into vectorize-io:main from ai-ag2026:fix/quarantine-invalid-async-operations

@ai-ag2026
Contributor

Summary

  • quarantine pending async operations that can never be claimed because task_payload is NULL
  • preserve legitimate batch_retain parent rows while they still have child operations
  • skip the PostgreSQL JSONB quarantine query entirely on non-PostgreSQL backends
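
The selection rule in the bullets above can be sketched in plain Python. This is an illustration only, not the code in hindsight_api/worker/poller.py: the `AsyncOperation` dataclass, the `parent_id` link from child to parent, the `select_quarantine_candidates` helper, the `backend` string, and the reading of "still have child operations" as "at least one row references the parent" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AsyncOperation:
    """Simplified in-memory stand-in for an async_operations row (hypothetical shape)."""
    operation_id: str
    operation_type: str
    status: str
    task_payload: Optional[dict]
    parent_id: Optional[str] = None  # assumed child -> parent link


def select_quarantine_candidates(rows, backend="postgresql"):
    """Return ids of pending rows that could never be claimed.

    Hypothetical helper: the real change runs a PostgreSQL query instead.
    """
    # Third bullet: skip the check entirely on non-PostgreSQL backends.
    if backend != "postgresql":
        return []

    # Ids of rows that are referenced as a parent by at least one other row.
    parents_with_children = {r.parent_id for r in rows if r.parent_id is not None}

    candidates = []
    for row in rows:
        # First bullet: only pending rows with a NULL payload are unclaimable.
        if row.status != "pending" or row.task_payload is not None:
            continue
        # Second bullet: keep batch_retain parents that still have children.
        if row.operation_type == "batch_retain" and row.operation_id in parents_with_children:
            continue
        candidates.append(row.operation_id)
    return candidates


SAMPLE_ROWS = [
    AsyncOperation("orphan", "batch_retain", "pending", None),
    AsyncOperation("parent", "batch_retain", "pending", None),
    AsyncOperation("child", "retain", "pending", {"x": 1}, parent_id="parent"),
    AsyncOperation("done", "retain", "completed", None),
]

print(select_quarantine_candidates(SAMPLE_ROWS))
```

With the sample rows, only the orphaned parent is selected: the legitimate parent is protected by its child, and the completed row is not pending.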

Why

Crash recovery can leave orphaned batch_retain parent rows in async_operations with status='pending' and task_payload=NULL. The worker claim path intentionally ignores NULL payloads because there is no executable task to dispatch, so these rows sit in the pending lane indefinitely and make a stuck queue look like normal backlog.
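
A small reproduction of the symptom, using an in-memory SQLite table as a stand-in for the real database. The table name and the `status` and `task_payload` columns come from the description above; the `operation_id` column, the sample rows, and the two counting queries are assumptions for illustration and are not the worker's actual claim query.

```python
import sqlite3

# In-memory stand-in for the async_operations table (schema is assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE async_operations ("
    "operation_id TEXT PRIMARY KEY, status TEXT NOT NULL, task_payload TEXT)"
)
conn.executemany(
    "INSERT INTO async_operations VALUES (?, ?, ?)",
    [
        ("op-1", "pending", '{"kind": "retain"}'),  # normal, claimable work
        ("op-2", "pending", None),                  # orphan left by crash recovery
    ],
)

# What a naive queue-health check sees: everything with status='pending'.
pending_total = conn.execute(
    "SELECT COUNT(*) FROM async_operations WHERE status = 'pending'"
).fetchone()[0]

# What a claim path that ignores NULL payloads can actually pick up.
claimable = conn.execute(
    "SELECT COUNT(*) FROM async_operations "
    "WHERE status = 'pending' AND task_payload IS NOT NULL"
).fetchone()[0]

# Rows counted as backlog that no worker will ever dispatch.
unclaimable = pending_total - claimable
print(pending_total, claimable, unclaimable)
```

The gap between the two counts is the set of rows this change quarantines.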

Test Plan

  • uv run ruff check hindsight_api/worker/poller.py tests/test_worker_quarantine.py
  • uv run python -m py_compile hindsight_api/worker/poller.py tests/test_worker_quarantine.py
  • git diff --check
  • uv run --extra embedded-db pytest tests/test_worker_quarantine.py -q -o addopts=
  • uv run --extra embedded-db pytest tests/test_worker_quarantine.py tests/test_worker.py -q -o addopts= -k 'not memory_engine_execute_task_passes_through_defer_operation' -> 67 passed, 1 deselected

Fixes #1671
Complements #1670 by handling the queued poison rows separately from embedding-dimension validation.

Development

Successfully merging this pull request may close these issues.

Async operations need poison quarantine / degraded queue health