Master #3

Closed
heidi-dang wants to merge 9 commits into main from master

Conversation

@heidi-dang
Owner

Summary

Changes

Screenshots

Before After

Testing

bun run typecheck
bun test

Related Issues

… ports

- Revert src/hooks/runtime-enforcement/hook.ts to HEAD (runtime authority unchanged)
- Revert src/agents/runtime/state-ledger.ts to HEAD (runtime authority unchanged)
- Revert src/agents/runtime/tool-runner.ts to HEAD (runtime authority unchanged)
- Revert src/agents/oracle.ts to HEAD (Heidi baseline is already stronger)
- Revert src/agents/sisyphus.ts (flat file) to HEAD - Heidi's modular flow preserved
- Revert src/agents/builtin-agents.ts to HEAD - sisyphus-agent wiring preserved
- Hephaestus dir: prompt-only, imports hard-blocks/anti-patterns from Heidi's prompts module
- Sisyphus dir: prompt-only, imports hard-blocks/anti-patterns from Heidi's prompts module
- Add isGpt5_4Model/isGpt5_3CodexModel type guards to types.ts for Hephaestus dispatch
- Update doctor check: remove dynamic-agent-prompt-builder as forbidden (passive library)
- Update doctor check: loop guard deferred to separate runtime-only PR

Phase 0 changes (Atlas/Gemini/GPT verification wave) and Sisyphus/Hephaestus
prompt-layer capability improvements are preserved. No runtime changes in this PR.
…edger

- Removed dynamic-agent-prompt-builder imports from Sisyphus and Hephaestus
- Ported official wording/orchestration into src/agents/prompts/orchestration.ts
- Moved Agent/Skill/Tool/Category types into src/agents/types.ts
- Deleted dynamic-agent-prompt-builder and restored it to the doctor forbidden list
- Tightened runtime enforcement hook to scan current chat flow instead of global historical ledger
- Dropped generic phrase matching ('success') from enforcement checks
- Unused/shadow state-ledger from agents runtime deleted
- Unified state-ledger across agent and core runtime
- Enhanced State Ledger schema (success, verified, changedState, stdout, sessionID)
- complete_task & query_ledger mapped to strictly verified, session-scoped truth
- execution-journal hook forwards full status booleans and stdout to ledger
- tool-contract hook strictly types metadata booleans and asserts ledger presence for state changes
- Purged prototype scaffolding (agent-runner, tool-runner, context-builder)
- Updated doctor checks and docs for Phase 1 closure
- Removed 'Hephaestus is designed exclusively for GPT' block in no-hephaestus-non-gpt hook
- Removed requiresProvider limitations from shared and CLI model requirements for Hephaestus
- Added model resolution regression tests specifying Grok resolves properly for config
- Updated upstream capability doctor to strictly forbid the return of these GPT-only rule strings
- Wired DETERMINISTIC_TOOLS into active createToolRegistry
- Exposed execution, plan, and runtime enforcement hooks via createToolGuardHooks
- Enforced strict boolean contract (success, verified, changedState) on all deterministic tools
- Added unit tests for isolated query_ledger and complete_task behavior
- Updated upstream capability tracker
… for xAI/Grok models by adding cache read tokens back to input.
…t metadata, create `execution.jsonl` for journaling, and add truth model integration tests.
…er entries and adding runtime validation for message claims.
…ating ledger entries and implement flow isolation in runtime enforcement to validate state claims against the current execution flow.
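
The ledger changes described in the commit notes above (the extended schema with success, verified, changedState, stdout, and sessionID, plus complete_task and query_ledger reading only verified, session-scoped entries) can be pictured with a minimal sketch. Only the field names come from the commit message. The `LedgerEntry` shape, the `tool` field, the `StateLedger` class, and `queryVerified` are hypothetical names chosen for illustration and are not taken from the repository.

```typescript
// Hypothetical sketch: field names follow the commit message; everything else is assumed.
interface LedgerEntry {
  tool: string; // assumed field: name of the tool that produced the entry
  success: boolean;
  verified: boolean;
  changedState: boolean;
  stdout: string;
  sessionID: string;
}

class StateLedger {
  private readonly entries: LedgerEntry[] = [];

  record(entry: LedgerEntry): void {
    this.entries.push(entry);
  }

  // Return only entries that are successful, verified, and from the given session.
  queryVerified(sessionID: string): LedgerEntry[] {
    return this.entries.filter(
      (e) => e.sessionID === sessionID && e.success && e.verified,
    );
  }
}

const ledger = new StateLedger();
ledger.record({ tool: "bash", success: true, verified: true, changedState: true, stdout: "ok", sessionID: "s1" });
ledger.record({ tool: "bash", success: true, verified: false, changedState: true, stdout: "", sessionID: "s1" });
ledger.record({ tool: "edit", success: true, verified: true, changedState: true, stdout: "", sessionID: "s2" });

// Only the first entry is both verified and scoped to session "s1".
const visible = ledger.queryVerified("s1");
console.log(visible.length);
```

Under these assumptions, an unverified entry or an entry from another session never reaches the caller, which is the property the commit notes describe as "strictly verified, session-scoped truth".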
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces new agents, enhances existing ones, and implements runtime enforcement and verification mechanisms to improve the reliability and effectiveness of the AI system. It also adds support for new models and optimizes prompts for better performance.

Highlights

  • New Agents: Introduces two new agents: Hephaestus, an autonomous deep worker, and Oracle, a strategic technical advisor.
  • Enhanced Sisyphus: Significantly enhances the Sisyphus agent with improved orchestration capabilities, intent parsing, and delegation strategies.
  • Runtime Enforcement: Implements a runtime enforcement mechanism to ensure agents adhere to tool contracts and prevent hallucinations.
  • Verification Workflow: Introduces a verification workflow, especially for 'ultrawork' mode, requiring Oracle verification before task completion.
  • Model Support: Adds support for new models, including Grok and Gemini, with specific prompt optimizations and handling.
Changelog
  • .official/master/oracle.ts
    • Added a new Oracle agent for strategic technical advice.
  • .official/master/sisyphus.ts
    • Added a new Sisyphus agent with enhanced orchestration capabilities.
  • .official/src/agents/hephaestus-agent.ts
    • Added a new Hephaestus agent for autonomous deep work.
  • .official/src/agents/hephaestus/agent.test.ts
    • Added tests for the Hephaestus agent.
  • .official/src/agents/hephaestus/agent.ts
    • Added core logic for the Hephaestus agent, including prompt source and creation.
  • .official/src/agents/hephaestus/gpt-5-3-codex.ts
    • Added GPT-5.3 Codex optimized prompt for Hephaestus.
  • .official/src/agents/hephaestus/gpt-5-4.ts
    • Added GPT-5.4 optimized prompt for Hephaestus.
  • .official/src/agents/hephaestus/gpt.ts
    • Added generic GPT prompt for Hephaestus as a fallback.
  • .official/src/agents/hephaestus/index.ts
    • Added index file for Hephaestus agent components.
  • .official/src/agents/oracle.ts
    • Added core logic for the Oracle agent, including prompt metadata and creation.
  • .official/src/agents/sisyphus.ts
    • Added core logic for the Sisyphus agent, including prompt building and model handling.
  • .official/src/hooks/ralph-loop/AGENTS.md
    • Added documentation for the Ralph Loop hook.
  • .official/src/hooks/ralph-loop/command-arguments.ts
    • Added parsing logic for Ralph Loop command arguments.
  • .official/src/hooks/ralph-loop/completion-handler.ts
    • Added handler for detected completion in Ralph Loop.
  • .official/src/hooks/ralph-loop/completion-promise-detector.test.ts
    • Added tests for completion promise detection.
  • .official/src/hooks/ralph-loop/completion-promise-detector.ts
    • Added logic to detect completion promises in session messages.
  • .official/src/hooks/ralph-loop/constants.ts
    • Added constants for the Ralph Loop hook.
  • .official/src/hooks/ralph-loop/continuation-prompt-builder.ts
    • Added logic to build continuation prompts for Ralph Loop.
  • .official/src/hooks/ralph-loop/continuation-prompt-injector.ts
    • Added logic to inject continuation prompts into active sessions.
  • .official/src/hooks/ralph-loop/index.test.ts
    • Added tests for the Ralph Loop hook.
  • .official/src/hooks/ralph-loop/index.ts
    • Added index file for Ralph Loop hook components.
  • .official/src/hooks/ralph-loop/iteration-continuation.ts
    • Added logic to continue iterations in Ralph Loop.
  • .official/src/hooks/ralph-loop/loop-session-recovery.ts
    • Added logic to recover from crashed/interrupted loop sessions.
  • .official/src/hooks/ralph-loop/loop-state-controller.ts
    • Added logic to control the state of the Ralph Loop.
  • .official/src/hooks/ralph-loop/message-storage-directory.ts
    • Added logic to manage temporary directories for prompt injection.
  • .official/src/hooks/ralph-loop/ralph-loop-event-handler.ts
    • Added event handler for Ralph Loop to manage session events.
  • .official/src/hooks/ralph-loop/ralph-loop-hook.ts
    • Added core logic for the Ralph Loop hook, composing controller, recovery, and event handler.
  • .official/src/hooks/ralph-loop/reset-strategy-race-condition.test.ts
    • Added tests for race conditions in the reset strategy of Ralph Loop.
  • .official/src/hooks/ralph-loop/session-event-handler.ts
    • Added handlers for session events (deleted, error) in Ralph Loop.
  • .official/src/hooks/ralph-loop/storage.ts
    • Added storage logic for Ralph Loop state.
  • .official/src/hooks/ralph-loop/types.ts
    • Added type definitions for Ralph Loop.
  • .official/src/hooks/ralph-loop/ulw-loop-verification.test.ts
    • Added tests for ultrawork loop verification.
  • .official/src/hooks/ralph-loop/with-timeout.ts
    • Added utility function for API calls with timeout.
  • docs/implementation_upstream_capability_merge.md
    • Added documentation for the selective upstream capability merge.
  • src/agents/prompts/index.ts
    • Moved prompt-related logic out of dynamic-agent-prompt-builder and into this directory.
  • src/agents/types.ts
    • Consolidated AvailableAgent, AvailableTool, AvailableSkill, and AvailableCategory types into this file.
  • src/cli/model-fallback-requirements.ts
    • Updated model fallback requirements.
  • src/hooks/no-hephaestus-non-gpt/hook.ts
    • Removed the no-hephaestus-non-gpt hook.
  • src/runtime/state-ledger.ts
    • Implemented flow isolation in state ledger.
  • src/runtime/tools/complete-task.ts
    • Ensured complete_task only returns verified, successful, current-session entries.
  • tools/checks/check_upstream_capability_merge.py
    • Added a new doctor check for upstream capability merge.
Activity
  • Added new agents and enhanced existing ones.
  • Implemented runtime enforcement and verification mechanisms.
  • Added support for new models and optimized prompts.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces several changes, including creating new files for Oracle and Sisyphus agents, modifying Hephaestus-related files, and updating hooks and tool configurations. The review comments suggest correcting a typo in the Hephaestus agent description, removing a redundant function, simplifying type assertions, improving the prompt patching for Gemini models, and removing an extra space. The changes also involve migrating logic from dynamic prompt builders to native prompts, enforcing stricter tool contracts, and adding a new hook for xAI usage patching.


  return {
    description:
      "Autonomous Deep Worker - goal-oriented execution with GPT 5.2 Codex. Explores thoroughly before acting, uses explore/librarian agents for comprehensive context, completes tasks end-to-end. Inspired by AmpCode deep mode. (Hephaestus - OhMyOpenCode)",

medium

There's a small typo in the agent description. It mentions GPT 5.2 Codex, but this file is for GPT 5.3 Codex.

Suggested change
- "Autonomous Deep Worker - goal-oriented execution with GPT 5.2 Codex. Explores thoroughly before acting, uses explore/librarian agents for comprehensive context, completes tasks end-to-end. Inspired by AmpCode deep mode. (Hephaestus - OhMyOpenCode)",
+ "Autonomous Deep Worker - goal-oriented execution with GPT 5.3 Codex. Explores thoroughly before acting, uses explore/librarian agents for comprehensive context, completes tasks end-to-end. Inspired by AmpCode deep mode. (Hephaestus - OhMyOpenCode)",

Comment on lines +502 to +538
export function createHephaestusAgent(
  model: string,
  availableAgents?: AvailableAgent[],
  availableToolNames?: string[],
  availableSkills?: AvailableSkill[],
  availableCategories?: AvailableCategory[],
  useTaskSystem = false,
): AgentConfig {
  const tools = availableToolNames ? categorizeTools(availableToolNames) : [];
  const skills = availableSkills ?? [];
  const categories = availableCategories ?? [];
  const prompt = availableAgents
    ? buildHephaestusPrompt(
        availableAgents,
        tools,
        skills,
        categories,
        useTaskSystem,
      )
    : buildHephaestusPrompt([], tools, skills, categories, useTaskSystem);

  return {
    description:
      "Autonomous Deep Worker - goal-oriented execution with GPT 5.2 Codex. Explores thoroughly before acting, uses explore/librarian agents for comprehensive context, completes tasks end-to-end. Inspired by AmpCode deep mode. (Hephaestus - OhMyOpenCode)",
    mode: MODE,
    model,
    maxTokens: 32000,
    prompt,
    color: "#D97706", // Forged Amber - Golden heated metal, divine craftsman
    permission: {
      question: "allow",
      call_omo_agent: "deny",
    } as AgentConfig["permission"],
    reasoningEffort: "medium",
  };
}
createHephaestusAgent.mode = MODE;

medium

This createHephaestusAgent function appears to be redundant. The main factory function in src/agents/hephaestus/agent.ts already handles the logic for selecting the correct prompt builder based on the model. This duplicate function is unused and could cause confusion. I recommend removing it to keep the codebase clean and maintain a single source of truth for agent creation.
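
As a rough illustration of the "single source of truth" the comment asks for, the sketch below routes every model through one factory that picks a prompt builder. The guard names isGpt5_4Model and isGpt5_3CodexModel appear in the commit notes, but their bodies here are assumed, and `selectPromptBuilder`, `createAgent`, and the three placeholder builders are hypothetical stand-ins rather than the code in src/agents/hephaestus/agent.ts.

```typescript
// Assumed implementations; the real guards live in types.ts and may differ.
const isGpt5_4Model = (model: string): boolean => model.includes("gpt-5.4");
const isGpt5_3CodexModel = (model: string): boolean => model.includes("gpt-5.3-codex");

type PromptBuilder = () => string;

// Placeholder builders standing in for the per-model prompt modules.
const buildGpt54Prompt: PromptBuilder = () => "prompt: gpt-5.4";
const buildGpt53CodexPrompt: PromptBuilder = () => "prompt: gpt-5.3-codex";
const buildGenericPrompt: PromptBuilder = () => "prompt: generic";

// One dispatch point: the most specific guard is checked first, the generic builder is the fallback.
function selectPromptBuilder(model: string): PromptBuilder {
  if (isGpt5_4Model(model)) return buildGpt54Prompt;
  if (isGpt5_3CodexModel(model)) return buildGpt53CodexPrompt;
  return buildGenericPrompt;
}

// Hypothetical factory: every caller creates agents through this function only.
function createAgent(model: string): { model: string; prompt: string } {
  return { model, prompt: selectPromptBuilder(model)() };
}

console.log(createAgent("openai/gpt-5.3-codex").prompt);
```

With a single factory like this, a per-file copy of the creation function has no callers, which is the redundancy the review points at.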

Comment on lines +263 to +276
  if (isGptModel(model)) {
    return {
      ...base,
      prompt: ORACLE_GPT_PROMPT,
      reasoningEffort: "medium",
      textVerbosity: "high",
    } as AgentConfig;
  }

  return {
    ...base,
    thinking: { type: "enabled", budgetTokens: 32000 },
  } as AgentConfig;
}

medium

The type assertions as AgentConfig on the return statements are likely redundant. Since the function's return type is already declared as AgentConfig, the TypeScript compiler should be able to infer the type of the returned objects. Removing these explicit casts would make the code slightly cleaner.

Suggested change
-   if (isGptModel(model)) {
-     return {
-       ...base,
-       prompt: ORACLE_GPT_PROMPT,
-       reasoningEffort: "medium",
-       textVerbosity: "high",
-     } as AgentConfig;
-   }
-   return {
-     ...base,
-     thinking: { type: "enabled", budgetTokens: 32000 },
-   } as AgentConfig;
- }
+   if (isGptModel(model)) {
+     return {
+       ...base,
+       prompt: ORACLE_GPT_PROMPT,
+       reasoningEffort: "medium",
+       textVerbosity: "high",
+     };
+   }
+   return {
+     ...base,
+     thinking: { type: "enabled", budgetTokens: 32000 },
+   };
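
To make the point about redundant casts concrete, here is a small self-contained sketch. The `AgentConfig` interface below is a simplified stand-in with only three fields, and `createConfig` and `base` are hypothetical; the real type in the repository has more fields and may behave differently.

```typescript
// Simplified stand-in for AgentConfig; the real type has more fields.
interface AgentConfig {
  prompt: string;
  reasoningEffort?: "low" | "medium" | "high";
  thinking?: { type: "enabled"; budgetTokens: number };
}

const base = { prompt: "base prompt" };

function createConfig(useGptPrompt: boolean): AgentConfig {
  if (useGptPrompt) {
    // No cast: the literal is checked against the declared return type.
    return { ...base, prompt: "gpt prompt", reasoningEffort: "medium" };
  }
  // No cast here either; "enabled" is typed from the return type's context.
  return { ...base, thinking: { type: "enabled", budgetTokens: 32000 } };
}

console.log(createConfig(true).reasoningEffort);
```

One caveat worth checking before applying the suggestion: a cast also silences type errors. If the real AgentConfig does not declare a field used in the literal (textVerbosity, for example), removing `as AgentConfig` may surface a compile error instead of being a pure cleanup.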

Comment on lines +489 to +509
  if (isGeminiModel(model)) {
    // 1. Intent gate + tool mandate — early in prompt (after intent verbalization)
    prompt = prompt.replace(
      "</intent_verbalization>",
      `</intent_verbalization>\n\n${buildGeminiIntentGateEnforcement()}\n\n${buildGeminiToolMandate()}`
    );

    // 2. Tool guide + examples — after tool_usage_rules (where tools are discussed)
    prompt = prompt.replace(
      "</tool_usage_rules>",
      `</tool_usage_rules>\n\n${buildGeminiToolGuide()}\n\n${buildGeminiToolCallExamples()}`
    );

    // 3. Delegation + verification overrides — before Constraints (NOT at prompt end)
    // Gemini suffers from lost-in-the-middle: content at prompt end gets weaker attention.
    // Placing these before <Constraints> ensures they're in a high-attention zone.
    prompt = prompt.replace(
      "<Constraints>",
      `${buildGeminiDelegationOverride()}\n\n${buildGeminiVerificationOverride()}\n\n<Constraints>`
    );
  }

medium

Using string.replace() to patch the prompt for Gemini models is a bit brittle. If the base prompt's structure or the placeholder strings change, these replacements could fail silently. A more robust approach would be to make the prompt builder more modular, allowing different sections to be conditionally included or replaced based on the model. For example, you could have a buildGeminiSpecificSections function that returns the necessary overrides, and incorporate them during the initial prompt construction.
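
The silent-failure risk is easy to reproduce: `String.prototype.replace` returns the input unchanged when the marker is missing. The sketch below shows a smaller guard than the modular builder proposed above, namely a helper that throws when a marker is absent. `replaceOrThrow` and the sample prompt text are hypothetical and not part of the PR.

```typescript
// Hypothetical helper: replaces the first occurrence of `marker` and fails loudly if it is absent.
function replaceOrThrow(text: string, marker: string, replacement: string): string {
  if (!text.includes(marker)) {
    throw new Error(`Prompt marker not found: ${marker}`);
  }
  // A replacer function avoids special handling of "$" patterns in the replacement string.
  return text.replace(marker, () => replacement);
}

// Sample prompt that contains only two of the three markers the patch relies on.
const basePrompt =
  "<intent_verbalization>...</intent_verbalization>\n<Constraints>...</Constraints>";

const patched = replaceOrThrow(
  basePrompt,
  "</intent_verbalization>",
  "</intent_verbalization>\n\n[gemini intent gate]",
);

// Plain replace() silently does nothing for the missing marker...
const silentlyUnchanged = basePrompt.replace("</tool_usage_rules>", "[tool guide]");

// ...while the guarded helper reports the problem.
let missingMarkerError = "";
try {
  replaceOrThrow(basePrompt, "</tool_usage_rules>", "[tool guide]");
} catch (err) {
  missingMarkerError = err instanceof Error ? err.message : String(err);
}

console.log(missingMarkerError);
```

A guard like this keeps the existing patching approach but turns a renamed or removed marker into an immediate error, which a unit test on the Gemini prompt could then catch.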


export function getGeminiAtlasPrompt(): string {
-  return ATLAS_GEMINI_SYSTEM_PROMPT
+   return ATLAS_GEMINI_SYSTEM_PROMPT

medium

There's an extra space before return. Please remove it for consistent formatting.

Suggested change
-   return ATLAS_GEMINI_SYSTEM_PROMPT
+  return ATLAS_GEMINI_SYSTEM_PROMPT

heidi-dang added a commit that referenced this pull request Mar 6, 2026
* fix(phase1): revert runtime file changes, keep prompt-only capability ports

- Revert src/hooks/runtime-enforcement/hook.ts to HEAD (runtime authority unchanged)
- Revert src/agents/runtime/state-ledger.ts to HEAD (runtime authority unchanged)
- Revert src/agents/runtime/tool-runner.ts to HEAD (runtime authority unchanged)
- Revert src/agents/oracle.ts to HEAD (Heidi baseline is already stronger)
- Revert src/agents/sisyphus.ts (flat file) to HEAD - Heidi's modular flow preserved
- Revert src/agents/builtin-agents.ts to HEAD - sisyphus-agent wiring preserved
- Hephaestus dir: prompt-only, imports hard-blocks/anti-patterns from Heidi's prompts module
- Sisyphus dir: prompt-only, imports hard-blocks/anti-patterns from Heidi's prompts module
- Add isGpt5_4Model/isGpt5_3CodexModel type guards to types.ts for Hephaestus dispatch
- Update doctor check: remove dynamic-agent-prompt-builder as forbidden (passive library)
- Update doctor check: loop guard deferred to separate runtime-only PR

Phase 0 changes (Atlas/Gemini/GPT verification wave) and Sisyphus/Hephaestus
prompt-layer capability improvements are preserved. No runtime changes in this PR.

* fix(phase1): completely eliminate builder from capabilities & unify ledger

- Removed dynamic-agent-prompt-builder imports from Sisyphus and Hephaestus
- Ported official wording/orchestration into src/agents/prompts/orchestration.ts
- Moved Agent/Skill/Tool/Category types into src/agents/types.ts
- Deleted dynamic-agent-prompt-builder and restored it to the doctor forbidden list
- Tightened runtime enforcement hook to scan current chat flow instead of global historical ledger
- Dropped generic phrase matching ('success') from enforcement checks
- Unused/shadow state-ledger from agents runtime deleted
- Unified state-ledger across agent and core runtime

* Refactor the truth model to prevent unverified bash commands from creating ledger entries and implement flow isolation in runtime enforcement to validate state claims against the current execution flow.

* fix(model-requirements): remove hardcoded gpt-5.3-codex constraints and allow non-gpt models for hephaestus
@heidi-dang heidi-dang closed this Mar 7, 2026