
fix(embedder): use native fetch for Ollama + external AbortSignal propagation#389

Closed
jlin53882 wants to merge 3 commits into CortexReach:master from jlin53882:fix/ollama-native-fetch-abort

Conversation

@jlin53882
Contributor

Summary

Fixes Issue #361: Ollama embedding requests hang indefinitely because the OpenAI SDK's HTTP client does not reliably abort TCP connections when AbortController.abort() fires.

Root Cause

When AbortController.abort() fires for an Ollama embedding request, the OpenAI SDK's HTTP client keeps the TCP socket open until the OS-level timeout (~120s). This stalls the entire before_prompt_build hook and causes auto-recall to time out.

Solution

1. src/embedder.ts — Ollama native fetch path

Added isOllamaProvider() and embedWithNativeFetch():

private isOllamaProvider(): boolean {
  return /localhost:11434|127\.0\.0\.1:11434|\/ollama\b/i.test(this._baseURL);
}

private async embedWithNativeFetch(payload: any, signal?: AbortSignal): Promise<any> {
  const endpoint = this._baseURL + "/embeddings";
  // apiKey is resolved from the embedder's configuration (elided in this excerpt)
  const response = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json", "Authorization": `Bearer ${apiKey}` },
    body: JSON.stringify(payload),
    signal, // ← native fetch properly respects AbortSignal
  });
  // ...
}

embedWithRetry() routes Ollama requests to embedWithNativeFetch() instead of the OpenAI SDK.
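Pulled out of the class for illustration, the detection regex above behaves like this (standalone function name is hypothetical):

```typescript
// Standalone version of the provider-detection regex shown above
// (hypothetical helper, lifted out of the class for illustration).
function isOllamaBaseURL(baseURL: string): boolean {
  return /localhost:11434|127\.0\.0\.1:11434|\/ollama\b/i.test(baseURL);
}

console.log(isOllamaBaseURL("http://localhost:11434/v1"));         // true
console.log(isOllamaBaseURL("https://api.openai.com/v1"));         // false
console.log(isOllamaBaseURL("https://gateway.example/ollama/v1")); // true
```

Note the `\/ollama\b` alternative also matches Ollama behind a reverse proxy, not just the default local port.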

2. src/embedder.ts — External AbortSignal propagation

withTimeout() now accepts and merges an optional externalSignal:

private withTimeout<T>(
  promiseFactory: (signal: AbortSignal) => Promise<T>,
  label: string,
  externalSignal?: AbortSignal
): Promise<T> {
  const controller = new AbortController();
  // External signal → abort the internal controller (covering the case
  // where the external signal has already fired before we subscribe)
  if (externalSignal) {
    if (externalSignal.aborted) controller.abort();
    else externalSignal.addEventListener("abort", () => controller.abort(), { once: true });
  }
  // ...
}

All public APIs (embedQuery, embedPassage, embedBatchQuery, embedBatchPassage) now accept signal?: AbortSignal.
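The merge can be sketched as a self-contained helper (names and timeout handling assumed; the real `withTimeout()` also does its own cleanup):

```typescript
// Hypothetical standalone sketch of the signal merge: the returned signal
// aborts when EITHER the internal timeout elapses or the external signal fires.
function mergeWithTimeout(timeoutMs: number, externalSignal?: AbortSignal): AbortSignal {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(new Error("timeout")), timeoutMs);
  if (externalSignal?.aborted) {
    controller.abort(externalSignal.reason);
  } else {
    externalSignal?.addEventListener("abort", () => controller.abort(externalSignal.reason), { once: true });
  }
  // Stop the timer once anything aborts, so the event loop can drain.
  controller.signal.addEventListener("abort", () => clearTimeout(timer), { once: true });
  return controller.signal;
}
```

On Node 20+, `AbortSignal.any([AbortSignal.timeout(ms), externalSignal])` expresses the same merge in one line.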

3. test/embedder-ollama-abort.test.mjs — Regression test

Verifies abort actually interrupts a slow server response:

  • Mock server on 127.0.0.1:11434 delays 5 seconds
  • External AbortSignal fires at 2 seconds
  • Asserts total time ≈ 2s (not 5s)

Result: aborted in 2029ms < 5000ms threshold
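The timing assertion can be illustrated without a network at all: an external abort at ~100ms should cut a 500ms wait short (`sleep` below is a hypothetical stand-in for the mock server's slow response):

```typescript
// Hypothetical stand-in for the mock server's delayed response.
function sleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(resolve, ms);
    signal?.addEventListener("abort", () => {
      clearTimeout(timer);
      reject(new Error("aborted"));
    }, { once: true });
  });
}

async function main(): Promise<void> {
  const controller = new AbortController();
  setTimeout(() => controller.abort(), 100); // external abort at ~100ms
  const start = Date.now();
  await sleep(500, controller.signal).catch(() => { /* abort expected */ });
  const elapsed = Date.now() - start;
  // Aborted well before the 500ms wait would have completed.
  console.log(elapsed < 400);
}
main();
```

The real regression test makes the same shape of assertion against a live socket, which is what actually exercises the TCP-abort behavior.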

Review Feedback Addressed

Reviewer correctly noted that the original testOllamaAbortWithNativeFetch() hardcoded 127.0.0.1:11434 instead of using withServer's port, so it always hit "connection refused" without exercising the slow handler. Fixed in this commit.

Testing

node --test test/embedder-ollama-abort.test.mjs
# ✔ Ollama embedWithNativeFetch aborts slow request within expected time (2029.86ms)

Checklist

  • isOllamaProvider() correctly detects localhost:11434 / 127.0.0.1:11434
  • embedWithNativeFetch() uses native fetch, respects AbortSignal
  • Non-Ollama paths still use OpenAI SDK with signal
  • External AbortSignal merges with internal timeout controller
  • All public embedding APIs accept optional signal?: AbortSignal
  • Regression test proves abort interrupts slow request (2s < 5s)

jlin53882 and others added 3 commits March 26, 2026 21:29
…works

Root cause: OpenAI SDK HTTP client does not reliably abort Ollama
TCP connections when AbortController.abort() fires in Node.js. This
causes stalled sockets that hang until the gateway-level 120s timeout.

Fix: Add isOllamaProvider() to detect localhost:11434 endpoints, and
embedWithNativeFetch() using Node.js 18+ native fetch instead of the
OpenAI SDK. Native fetch properly closes TCP connections on abort.

Added Test 8 (testOllamaAbortWithNativeFetch) to cjk-recursion-regression
test suite. Also added standalone test (pr354-standalone.mjs) and
30-iteration stress test (pr354-30iter.mjs).

Fixes CortexReach#361.
@jlin53882 jlin53882 closed this Mar 28, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e8a7bfe1ec


Comment on lines +569 to +571
if (this.isOllamaProvider()) {
  try {
    return await this.embedWithNativeFetch(payload, signal);

P1: Preserve timeout behavior on Ollama batch embeddings

Routing all Ollama traffic through embedWithNativeFetch here drops the SDK timeout safety for calls that do not provide a signal. In this commit, embedBatchQuery/embedBatchPassage still default to signal === undefined, so batch Ollama requests now run on raw fetch with no built-in timeout and can hang on stalled sockets instead of failing fast. This is a regression for batch embedding flows (e.g., bulk ingestion paths) that previously relied on SDK-level request timeouts.
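One hedged way to address the missing batch timeout (helper name and default value are assumptions, not part of this PR) is to fall back to `AbortSignal.timeout` when the caller supplies no signal:

```typescript
// Hypothetical mitigation sketch: batch calls that pass no signal still get
// a fail-fast ceiling instead of running on raw fetch indefinitely.
const DEFAULT_BATCH_TIMEOUT_MS = 30_000; // assumed value

function effectiveSignal(signal?: AbortSignal): AbortSignal {
  // AbortSignal.timeout (Node 17.3+) fires after the given delay and does
  // not keep the event loop alive.
  return signal ?? AbortSignal.timeout(DEFAULT_BATCH_TIMEOUT_MS);
}
```

Calling `embedWithNativeFetch(payload, effectiveSignal(signal))` would then preserve a timeout ceiling for bulk ingestion paths while leaving caller-supplied signals untouched.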

