Hi maintainers 👋
I'm a contributor who recently had a PR merged here (Russian i18n). I've been running OpenFang in production and absolutely love the architecture. However, under heavy load and complex agentic tasks, I've hit a few bottlenecks related to memory bloat, sandbox limits, and task recovery.
Before designing anything, I spent several weeks auditing 20+ open-source AI agent projects, both studying their architectures and mapping patterns that work in production. The four proposals below aren't opinions; they're the result of that research, cross-referenced against concrete gaps I hit running OpenFang under real load. Each comes with a full architecture document, Rust types, a test matrix, and a phased PR roadmap ready.
Over the last few weeks, I've designed solutions for these bottlenecks. I have detailed specs ready, but I know maintainer bandwidth is the tightest bottleneck in open source. I am not proposing we do all of this at once. I'm opening this meta-issue to see which of these aligns best with your immediate roadmap.
I don't know what's already on your internal roadmap, and it's very possible you've planned some of this differently. If so, I'd rather hear that now and adapt than invest months in a direction that doesn't fit. My goal is to strengthen OpenFang, not to push a personal design.
## The Menu: Where does it hurt the most today?
| # | Proposal | The Concrete Problem It Solves | Opt-in / Flagged? |
|---|----------|--------------------------------|-------------------|
| 1 | Sandbox v2 | `memory.grow` OOMs, host thread leaks under load, and missing workspace-root path guards. | Yes |
| 2 | Memory & Context | Unbounded SQLite growth; prompt cache pruned blindly (2-5× cost waste); silent episode loss on crash. | Yes |
| 3 | Planning & Tasks | Failed agent steps leave files half-edited; no rollback mechanism or unified task UI. | Yes |
| 4 | Model Council | Weak local models (e.g., Gemma 4-26B-A4B or Qwen 3.6-35B-A3B) hallucinating on high-stakes tasks without a reality check. | Yes |
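To make the Sandbox v2 row concrete, here is a minimal sketch of the workspace-root path guard I have in the spec. Function and type names are hypothetical, and symlink resolution (which still needs `canonicalize` on the existing ancestor) is elided for brevity; this shows only the lexical normalization pass:

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical sketch: lexically resolve `candidate` against the workspace
/// root, rejecting absolute paths and any `..` sequence that would climb
/// above the root. Symlink handling is intentionally omitted here.
fn guard_workspace_path(root: &Path, candidate: &Path) -> Option<PathBuf> {
    let mut depth: usize = 0;
    let mut resolved = root.to_path_buf();
    for comp in candidate.components() {
        match comp {
            Component::Normal(part) => {
                resolved.push(part);
                depth += 1;
            }
            Component::CurDir => {}
            // `..` may pop components, but never above the workspace root.
            Component::ParentDir => {
                if depth == 0 {
                    return None;
                }
                resolved.pop();
                depth -= 1;
            }
            // Absolute candidates (RootDir / drive prefixes) are rejected.
            _ => return None,
        }
    }
    Some(resolved)
}

fn main() {
    let root = Path::new("/workspace");
    assert_eq!(
        guard_workspace_path(root, Path::new("src/main.rs")),
        Some(PathBuf::from("/workspace/src/main.rs"))
    );
    assert_eq!(guard_workspace_path(root, Path::new("../etc/passwd")), None);
    assert_eq!(guard_workspace_path(root, Path::new("/etc/passwd")), None);
    println!("path guard ok");
}
```

The full design layers filesystem-level checks on top of this, but even the lexical pass alone closes the `..`-traversal gap mentioned in row 1.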
(Note: I also have an early architectural sketch of a native LSP client for coding agents, but I've shelved it for now to focus on core stability.)
## On Implementation & Transparency
I want to be transparent about how I work: I'm a systems architect who designs in detail, then implements with AI coding assistance under maintainer guidance. This means:
- Every PR includes tests I write and verify myself.
- I'll flag areas where I'm unsure of Rust idioms and ask for style review explicitly.
- I treat maintainer feedback on code quality as binding direction, not suggestion.
## My Commitments
- Clean-room MIT: All designs are built from scratch, inspired by MIT-licensed agent ecosystems (Hermes, OpenSpace, Beads) and standard algorithmic patterns.
Maintainers: I propose starting with Sandbox v2 as the lowest-risk, highest-ROI module. Does this align with your current priorities, or should I pivot to Memory/Planning first?
P.S. Have you considered enabling GitHub Discussions for the repository? RFCs and architectural proposals are easier to structure there than in Issues, and I'd be happy to move my posts over if you enable the feature.