Codebox agent-task failures should include provider diagnostics or artifact refs #4105

@chubes4

Bug

Now that the Homeboy Lab failure-preservation fixes have landed, the Conductor full-loop proof preserves the remote dispatch envelope correctly. The WP Codebox provider failure inside that envelope, however, carries no actionable evidence: no diagnostics, runtime id, logs, transcript refs, or artifact refs.

Repro

homeboy agent-task cook \
  --cwd /Users/chubes/Developer/conductor@loop-full-run-proof \
  --repo conductor \
  --task-url https://github.a8c.com/chubes4/conductor/issues/48 \
  --backend codebox \
  --attempts 1 \
  --run-id conductor-full-loop-proof-retry4-20260612 \
  --prompt 'Verify the merged Conductor full solvability-loop proof wrapper through the intended Homeboy Lab/Codebox path...'

Observed

The remote dispatch envelope is now preserved locally, which is good. The preserved metadata shows:

{
  "task_id": "cook-conductor",
  "status": "failed",
  "summary": "WP Codebox agent task failed.",
  "metadata": {
    "provider": "wordpress.codebox-agent-task-executor",
    "codebox_run_result": {
      "schema": "wp-codebox/agent-task-run-result/v1",
      "status": "failed",
      "failure_classification": "runtime",
      "artifacts": [],
      "diagnostics": [],
      "metadata": {
        "provider_error": {},
        "run_id": "",
        "run_status": "",
        "runtime_id": "",
        "runtime_status": ""
      },
      "refs": {
        "artifact_bundles": [],
        "changed_files": [],
        "logs": [],
        "patches": [],
        "runtimes": [],
        "transcripts": []
      }
    }
  }
}

homeboy agent-task logs conductor-full-loop-proof-retry4-20260612 only shows:

{
  "state": "failed",
  "message": "WP Codebox agent task failed.",
  "task_id": "cook-conductor"
}

homeboy agent-task artifacts conductor-full-loop-proof-retry4-20260612 returns no artifacts/evidence refs.

Expected

When the WP Codebox agent-task executor fails before or during runtime execution, it should include enough structured evidence to diagnose the provider/runtime failure.

At least one of these should be present:

  • provider error code/message
  • Codebox run id and runtime id
  • runtime status
  • log/transcript refs
  • artifact bundle refs
  • a diagnostic explaining why no Codebox run id/artifacts exist
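A minimal sketch of what "never an empty shell" could look like on the producer side, assuming the result is assembled as a plain dictionary. The field names come from the observed envelope above; the function name `build_failure_result`, the diagnostic entry shape (`code` and `message`), and the diagnostic codes are hypothetical and not taken from the WP Codebox codebase.

```python
from typing import Any, Optional


def build_failure_result(
    provider_error: Optional[dict[str, Any]] = None,
    run_id: str = "",
    runtime_id: str = "",
    runtime_status: str = "",
    log_refs: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Build a failed run result that always carries at least one diagnostic.

    The diagnostic entry shape (code/message) is an assumption for illustration.
    """
    diagnostics: list[dict[str, str]] = []

    if provider_error:
        # Surface the provider's own message when one is available.
        diagnostics.append({
            "code": "provider_error",  # hypothetical code
            "message": str(provider_error.get("message", "Provider returned an error.")),
        })

    if not run_id and not runtime_id:
        # Explain explicitly why no run id or artifacts exist.
        diagnostics.append({
            "code": "no_runtime_created",  # hypothetical code
            "message": "No Codebox run or runtime was created before the failure.",
        })

    if not diagnostics:
        # Fallback so the failure never ships without a reason.
        diagnostics.append({
            "code": "runtime_failed",  # hypothetical code
            "message": "Runtime failed; see refs.logs for details.",
        })

    return {
        "schema": "wp-codebox/agent-task-run-result/v1",
        "status": "failed",
        "diagnostics": diagnostics,
        "metadata": {
            "provider_error": provider_error or {},
            "run_id": run_id,
            "runtime_id": runtime_id,
            "runtime_status": runtime_status,
        },
        "refs": {"logs": list(log_refs or []), "transcripts": []},
    }
```

Called with no arguments, this sketch still yields a `no_runtime_created` diagnostic, which covers the last bullet above.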

Acceptance criteria

  • wp-codebox/agent-task-run-result/v1 failures are not empty shells.
  • diagnostics[] contains at least one reviewer-safe reason when status=failed.
  • metadata.provider_error is populated for provider/API failures.
  • refs.logs or refs.transcripts is populated when a Codebox runtime/session exists.
  • If no runtime/session was created, the failure says so explicitly.
  • Homeboy agent-task review/artifacts can surface those refs.
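Some of these criteria could be checked mechanically, for example in a test on either the Codebox or the Homeboy side. The sketch below is an assumption-laden illustration: `check_failed_result` is a hypothetical helper, it reads only fields visible in the observed envelope, and it skips the `metadata.provider_error` criterion because the issue does not state how provider/API failures are classified.

```python
from typing import Any


def check_failed_result(result: dict[str, Any]) -> list[str]:
    """Return the acceptance-criteria violations for a run result.

    Hypothetical helper; only failed results are inspected.
    """
    if result.get("status") != "failed":
        return []

    problems: list[str] = []
    metadata = result.get("metadata") or {}
    refs = result.get("refs") or {}

    # A failed result must carry at least one diagnostic.
    if not result.get("diagnostics"):
        problems.append("diagnostics[] is empty for status=failed")

    # When a run or runtime exists, logs or transcripts must be referenced.
    has_runtime = bool(metadata.get("run_id") or metadata.get("runtime_id"))
    if has_runtime and not (refs.get("logs") or refs.get("transcripts")):
        problems.append("runtime exists but refs.logs and refs.transcripts are empty")

    return problems


# The codebox_run_result from the Observed section, reduced to the fields checked.
observed = {
    "schema": "wp-codebox/agent-task-run-result/v1",
    "status": "failed",
    "diagnostics": [],
    "metadata": {"provider_error": {}, "run_id": "", "runtime_id": ""},
    "refs": {"logs": [], "transcripts": []},
}
print(check_failed_result(observed))
```

Against the observed payload, the check reports the empty `diagnostics[]`; the logs/transcripts check does not fire because no run or runtime id is present.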

Downstream blocker

This blocks proving the Conductor full solvability loop through the intended Homeboy Lab/Codebox path. Homeboy Lab now preserves the remote envelope, but the provider envelope itself lacks actionable evidence.

AI assistance

  • AI assistance: Yes
  • Tool(s): OpenCode (GPT-5.5)
  • Used for: Capturing the provider-diagnostics blocker from the Conductor proof retry.
