Handling OutputValidationError retry loops and workflow hangs with local LLMs #260
Description
I am currently testing Shannon with a local LLM (Qwen 3.5 122B served through an OpenAI-compatible endpoint). While testing, I've run into a few workflow stability issues and would appreciate guidance on how to configure Shannon or otherwise handle them.
1. OutputValidationError and Temporal Retry Loops
The pre-recon agent successfully progresses through many turns but eventually fails with an OutputValidationError.
The main problem is that Temporal treats this validation failure as retryable (nonRetryable: false). Instead of failing fast, the workflow enters a retry loop: I observed a single pre-recon activity running for almost 75 minutes (elapsedSeconds: 4484) across multiple attempts, consuming significant resources and appearing to hang.
Temporal Dashboard Log:

```json
{
  "sdkComponent": "worker",
  "taskQueue": "shannon-pipeline",
  "attempt": 2,
  "activityType": "runPreReconAgent",
  "error": "ApplicationFailure: Agent pre-recon failed output validation",
  "cause": {
    "type": "OutputValidationError",
    "nonRetryable": false
  },
  "durationMs": 386443
}
```
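To see why the activity keeps coming back, it can help to model the retry decision in isolation. The sketch below is a standalone TypeScript model, not the Temporal SDK and not Shannon's code: RetryPolicyModel, FailureModel, shouldRetry, backoffMs and worstCaseMs are hypothetical names, and the defaults assume Temporal's documented retry defaults (1 s initial interval, coefficient 2.0, maximum interval 100 × initial, unlimited attempts). It shows that with nonRetryable: false and no attempt limit, an OutputValidationError is retried indefinitely, while either denylisting the type or capping attempts makes the run finite. In the real Temporal TypeScript SDK the corresponding knobs are retry.maximumAttempts and retry.nonRetryableErrorTypes in the activity options, throwing ApplicationFailure.nonRetryable(...) from the activity, and scheduleToCloseTimeout as an overall cap across retries. Whether Shannon exposes these settings is not stated in this issue, and nonRetryableErrorTypes matches the failure's own type, so it may not apply if OutputValidationError only appears as the cause, as in the log above. The 386443 ms figure is the durationMs from the log; assuming every attempt takes that long is a simplification.

```typescript
// Hypothetical model of Temporal-style retry semantics. Field names mirror
// Temporal's RetryPolicy, but this is a standalone sketch, not the SDK.
interface RetryPolicyModel {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumIntervalMs: number;
  maximumAttempts: number; // 0 = unlimited, matching Temporal's convention
  nonRetryableErrorTypes: string[];
}

// The two fields from the dashboard log that drive the retry decision.
interface FailureModel {
  type: string;
  nonRetryable: boolean;
}

// Assumed Temporal defaults: 1 s first retry, x2 backoff, capped at 100 s, unlimited attempts.
const DEFAULT_POLICY: RetryPolicyModel = {
  initialIntervalMs: 1000,
  backoffCoefficient: 2,
  maximumIntervalMs: 100000,
  maximumAttempts: 0,
  nonRetryableErrorTypes: [],
};

// Should another attempt be scheduled after attempt number `attempt` (1-based) failed?
function shouldRetry(policy: RetryPolicyModel, failure: FailureModel, attempt: number): boolean {
  if (failure.nonRetryable) return false; // failure was flagged non-retryable where it was thrown
  if (policy.nonRetryableErrorTypes.includes(failure.type)) return false; // type is denylisted
  if (policy.maximumAttempts > 0 && attempt >= policy.maximumAttempts) return false; // attempt cap hit
  return true;
}

// Delay before the retry that follows attempt number `attempt` (1-based), capped at the maximum.
function backoffMs(policy: RetryPolicyModel, attempt: number): number {
  const raw = policy.initialIntervalMs * Math.pow(policy.backoffCoefficient, attempt - 1);
  return Math.min(raw, policy.maximumIntervalMs);
}

// Worst-case wall time if every attempt runs for `attemptMs` and then fails retryably.
function worstCaseMs(policy: RetryPolicyModel, attemptMs: number): number {
  if (policy.maximumAttempts === 0) return Infinity; // no cap: the loop never ends on its own
  let total = 0;
  for (let attempt = 1; attempt <= policy.maximumAttempts; attempt++) {
    total += attemptMs; // time spent inside the attempt
    if (attempt < policy.maximumAttempts) total += backoffMs(policy, attempt); // wait before next try
  }
  return total;
}

// The failure as reported in the log: OutputValidationError with nonRetryable: false.
const observed: FailureModel = { type: "OutputValidationError", nonRetryable: false };

console.log(shouldRetry(DEFAULT_POLICY, observed, 2)); // true: default policy keeps retrying
const failFast: RetryPolicyModel = { ...DEFAULT_POLICY, nonRetryableErrorTypes: ["OutputValidationError"] };
console.log(shouldRetry(failFast, observed, 1)); // false: fails on the first attempt
const bounded: RetryPolicyModel = { ...DEFAULT_POLICY, maximumAttempts: 3 };
console.log(worstCaseMs(bounded, 386443)); // 1162329 (3 attempts + 1 s + 2 s of backoff)
```

The design point is that the attempt count, not the per-attempt timeout, is what leaves the loop unbounded: with three attempts of about 6.4 minutes each, the worst case is roughly 19.4 minutes, whereas the unlimited default has no upper bound.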