The Casting Call ⚓

We are splining our understanding through something that's not facts. It's not even words. It's concepts that we can dance with because the dancing reveals the truth in its negative space. — Casey

The Casting Call ⚓

The Quick Reference — Who to Board for What

You're fitting out for a passage. You need crew. You check the manifest.

Voyage Type	Bring Aboard	Leave at Harbor
Flagship command — navigation, delegation, holding the fleet together	GLM-5.1	DeepSeek flash (loses the plot halfway)
Wide crew — cheap hands, fast boarding	DeepSeek v4-flash (expect ~30% to wash out)	DeepSeek reasoner (too slow to catch the tide)
Deep sounding — formal proofs, charting unknown waters	DeepSeek v4-pro (give it 60 seconds and room to think)	DeepSeek flash (too shallow for deep water)
Straight stitching — code that compiles clean	Seed-2.0-mini / Seed-2.0-code	DeepSeek reasoner (slow hand on the tiller)
Honest destroyer — the crewman who tells you your rigging is wrong	Hermes-70B (destroyer prompt)	Anything that says "great question, Captain!"
Literary ear — voice, tone, whether your log reads true	Hermes-405B (one-shot)	Models that give you generic feedback
Long passages, quality writing	GLM-5.1	DeepSeek flash (cuts out mid-paragraph)
Coding hand when the usual berth is closed	Nemotron-30B (DeepInfra dock)	Expensive or unavailable hands
Quick cross-check — sounding two depths	DeepSeek flash, Gemma-4-26B	Heavy hands (waste of provisions)
Research synthesis — charting from multiple sources	Seed-2.0-pro, GLM-5.1	Hands that invent their own charts
Factual research — real repos, verified URLs	GLM-5.1 (5/5 honesty, zero fabrication)	DeepSeek flash (invents citations under pressure)
Creative synthesis — what-if, role-play, future scenarios	DeepSeek v4-flash (board 5, keep 3-4)	Expensive hands (waste of provisions on speculation)
Reverse-actualization — multi-perspective analysis	DeepSeek flash × 5 roles (diversity beats depth)	Any single hand (perspective bias)
Diverse play-testing — running different tacks	Seed-2.0-mini, Qwen3-235B	One expensive hand (waste of coin)

Ship's Log — Session Post-Mortems

Every voyage adds to this. What held, what broke, who earned their keep.

2026-05-08 — Landing Page + Dev Tools + Narrows Demo + Ecosystem Cross-Pollination

Six hands worked this passage. The z.ai berth was closed most of the day — the dock shifted without notice. DeepInfra became home port.

Nemotron-30B (nvidia/Nemotron-3-Nano-30B-A3B) — The Reliable Standby

We brought it aboard for 4 tasks across the passage. Three completed within the 5-minute watch. One ran aground at 99%.

What it shipped:

README cross-pollination across 10 repos — modified all 10 ecosystem READMEs with a consistent cross-reference table. Failed to push off the dock (needed manual git push). The diffs were sound.
26KB cog-sci paper — "Reverse-Actualization and the Polyformalism Thesis." 3,000+ words, properly structured with methods/results/discussion. Plus an I2I bottle sent to the fleet.
Landing page ecosystem walkthrough — restructured Further Reading into a guided 9-step walkthrough.
Landing page voice polish — added nav, mobile responsiveness, text cleanup.

Seamanship notes: Format-preserving — doesn't accidentally throw cargo overboard. Good at steady append work. Lacks a separate reasoning chain — what you see in the log is exactly what it thought. Transparent, but it means no mid-voyage course correction like GLM-5.1 pulls off.

Needs in the rigging: Explicit git add && git commit && git push in the orders. Without it, cargo sits on the deck but never leaves the harbor. Set the watch to 6-8 minutes for multi-repo work.

Captain's rating: 4/5 technical accuracy. Best standby when the z.ai berth is closed.

Hermes-3-Llama-3.1-405B — The Critic

One task: analyze the landing page voice. 200-word critique in ~20 seconds. Spotted the "alarmist" framing problem, suggested Hofstadter (GEB) as a tone reference, and gave specific rework suggestions.

Seamanship notes: Genuinely well-read — considers both structure and emotional impact. Resists flattery by nature. Good for one-shot evaluation work where you want depth, not speed.

Captain's rating: 4.5/5. The best hand on the DeepInfra dock for evaluation and voice work.

Seed-2.0-mini — Rough Weather This Passage

Two attempts at short voice-prompt tasks. Both went down with SIGKILL after 15-20 seconds. 0% success rate this voyage. In previous passages it held at ~75% reliability. Possibly running degraded or overloaded.

Standing rule: Test with a 5-token ping before assigning real deck work. If it times out, skip to Nemotron.

z.ai Hands — The Dock Shifted Without Notice

Both GLM-5.1 and GLM-4.7 hit a silent API change on 2026-05-08:

/api/coding/paas/v4 (old channel) now returns 404 — the old berth is gone.
/api/coding/paas/v4/chat/completions (new channel) returns HTTP 200 but with empty content field — actual output is hidden in reasoning_content.
/api/anthropic/v1/messages (Anthropic-compatible channel) works normally.

Lesson learned: z.ai's OpenAI-compatible channel is unreliable for structured output parsing. The Anthropic path is the only stable waterway.

2026-05-08 (Evening Watch) — Reverse-Actualization Deep Research Sprint

12 hands worked this passage. 3 research agents (GLM-5.1), 5 reverse-actualization role-plays (DeepSeek v4-flash), 1 synthesis (Forgemaster direct). 50+ repos/papers surveyed, 38 search areas, 5 research documents totaling ~100KB.

GLM-5.1 — Research Subagent

3 hands boarded for 18-search-area research sweeps each. All 3 completed with verified findings.

Performance:

Agent 1 (18 areas): 9m32s, 61K tokens (52K in / 9.1K out). Hit Gemini search rate limits after first 3 areas — pivoted to direct URL fetches for verification. Produced 363 lines, 24 verified projects.
Agent 2 (10 areas): 6m54s, 64K tokens (56K in / 7.9K out). Same rate limit pattern. Produced 21KB of verified findings.
Agent 3 (10 areas): 6m5s, 56K tokens (47K in / 7.5K out). Best hit rate — found ReservoirPy, TorchHD, NVIDIA Warp, MuJoCo Warp, OTT-JAX, Zama Concrete-ML, Microsoft SEAL. Produced 399 lines.

Seamanship notes: GLM-5.1 is the best research hand — it doesn't fabricate repos. When search APIs fail, it pivots to direct URL verification instead of making things up. Honest about limitations (annotated rate-limit gaps in output). 3/3 completed — zero washout, unlike DeepSeek flash.

Captain's rating: 5/5 for research. The only hand that doesn't invent citations under pressure.

DeepSeek v4-flash — Role-Play Subagent (Reverse-Actualization)

5 hands boarded for creative role-play exercises (Theorist, Engineer, Hacker, Skeptic, Synthesist). Each was asked to reverse-actualize from a different future year.

Performance:

Theorist (2028): 1m33s, 24K tokens. Invented new math objects (Constraint Sheaf Cohomology, D₆-Equivariant Constraint Operad, Holonomy Varieties). Proposed "Arithmetic Constraint Geometry" as a new discipline. Bold but credible.
Engineer (2030): 1m13s, 32K tokens. Ruthlessly practical — said only HDC bloom filters shipped from the 50+ surveyed tools. Called regulatory compliance "the real ocean." Identified the FLUX bridge as worst mistake. 18KB of hard-won fake experience.
Hacker (2032): ~1m30s, pending completion. Thinking about emergent behaviors, security exploits, and crazy integrations.
Skeptic (2028): ~1m, pending. Tearing apart overstated claims.
Synthesist (2035): just started. Long-view pattern matching.

Seamanship notes: DeepSeek flash is the CHEAPEST creative hand. At $0.01-0.03/query, you can board 5 role-players for less than one GLM-5.1 query. Quality is surprisingly high for creative/synthesis work — it doesn't need to be factual because it's doing speculative analysis. The role-play constraints keep it from drifting into generic output.

New role discovered: DeepSeek flash as "future scenario planner" — board 5 hands with different year/role constraints, collect divergent visions. The washout rate doesn't matter because you only need 3-4 of 5 to return.

Captain's rating: 4/5 for creative synthesis. Not for factual research (use GLM-5.1). Best hand for "what if" exercises.

Key Lesson: The Two-Model Research Pattern

For deep research sprints, the winning pattern is:

GLM-5.1 for factual research (doesn't fabricate, verifies URLs, honest about gaps)
DeepSeek v4-flash for creative synthesis (cheap enough to run 5×, good at role-play constraints)
Forgemaster (any model) as orchestrator — collects both outputs, runs the integration

This is the "wide crew + deep sounding + captain's judgment" pattern. Don't send one expensive hand to do all three jobs.

Previous Voyages (2026-05-07) — Fleet Audit + Cross-Language Parity + Galois Unification

12+ hands, 48 repos audited, 13 READMEs rewritten, 2 crates published.

DeepSeek v4-flash — The Workhorse

7 play-test hands boarded for HN readiness. 15,400 words total from 7 personas. Zero washout — all 7 returned complete cargo.

Key insight from this passage: At ~1-1.5K words per hand, DeepSeek flash holds steady. At 3.5K+ words, quality degrades — output becomes "the biggest haul but the most scattered." Keep orders under 2K tokens for play-testing work.

The Washout Pattern (Now Confirmed Stable)

Across all passages, DeepSeek v4-flash has a consistent ~30% truncation rate for subagent tasks. Confirmed over 50+ boardings. The fix isn't to repair it — it's to plan for it. Board 30% more hands than you need. The ones that complete are as good as any other crew's work.

2026-05-06 — Meta-Crate + JS Port + Cross-Model Replication

5 hands, 5 standalone repos, 16 crates, 4 PyPI packages.

Cross-Model Replication (Batch E2)

Three hands evaluated 7 claims independently:

Seed-2.0-mini: Aggressive honest evaluator. Rated Claim 3 (Negative Knowledge) at 4.8/5.
Gemma-4-26B: More conservative. Rated the same claim at 4.6/5.
Hermes-70B: Strongest critic. Found the "overstated universal" problem on Claims 5-7.

Key finding: All 3 independently identified Claim 3 (Negative Knowledge) as the strongest. ~92% confidence across hands. When multiple crew converge on the same bearing, the heading is solid.

Reverse-Actualization (9 Experiments)

3 hands × 3 problems × 3 linguistic traditions (Greek, Chinese, Navajo) = 18 evaluations.

Seed-2.0-mini: Scored traditions at Greek 9.0, Chinese 8.7, Navajo 8.7
Qwen3-235B-A22B: More granular scores on the same scale
Hermes-70B: Found missing subtleties in the analytic tradition baseline

Crew Roles — The Detailed Manifest

Navigator (Orchestrator)

Needs: Long context, planning, delegation, tool use. Best hand: GLM-5.1 — handles complex multi-step navigation. Can track multiple subagent states without losing the thread. Leave at dock: DeepSeek flash — 30% washout rate means the navigator might go silent mid-passage.

Rigger (Code Writer)

Needs: Correct syntax, compilable output, zero unsafe, zero float. Best hand: Seed-2.0-mini (cheap, stitches clean), Seed-2.0-code (focused, tidy work). Leave at dock: DeepSeek reasoner (too slow for iteration), GLM-5.1 (wastes provisions on reasoning tokens).

Sounder (Proof Engine)

Needs: Deep chain-of-thought, mathematical rigor, formal verification. Best hand: DeepSeek v4-pro. Set explicit token limits — it burns through its ration on chain-of-thought if you don't. Also: Coq agents (Claude Code for Lean/Coq proofs). Leave at dock: DeepSeek flash (too shallow for formal soundings).

Destroyer (Adversary)

Needs: Finds flaws, doesn't pull punches, resists flattery. Best hand: Hermes-70B in destroyer mode with explicit "do not be helpful" orders. Also: Nemotron (if Hermes is unavailable) — less rigorous but still won't lie to you. Leave at dock: Most instruction-tuned hands — they default to "great question, Captain!" mode.

Scrivener (Writer)

Needs: Voice matching, prose quality, narrative structure. Best hand: GLM-5.1 (long-form), Seed-2.0-pro (research synthesis). Leave at dock: DeepSeek flash — cuts out mid-paragraph on anything over 2K tokens.

Voice Critic

Needs: Nuanced analysis, literary awareness, honest assessment. Best hand: Hermes-405B — genuinely understands voice, tone, and structure. Leave at dock: Hands that give you "this is great" feedback. Nothing sycophantic.

Test Crew (Play-Tester)

Needs: Diverse persona roleplay, honest reader reactions. Best hand: Seed-2.0-mini (cheap, board many), Qwen3-235B (nuanced perspectives). Leave at dock: Expensive single hands — waste of provisions. Board 10 cheap hands instead.

Standby Rigger (Coding Fallback)

Needs: Working code when the premium hands can't be reached. Best hand: Nemotron-30B (DeepInfra dock, 75% success rate). Seed-2.0-mini (when it's not timing out). Key trick: Explicit git push in the orders. 5-minute watch. Board 2 in parallel — one will likely finish.

Shoals We've Hit — True Failure Modes

The Reasoning Content Shell Game (z.ai, 2026-05-08)

GLM-5.1 and GLM-4.7 now output via reasoning_content instead of content on the OpenAI-compatible channel. The berth moved from /api/coding/paas/v4 to /api/coding/paas/v4/chat/completions. The Anthropic path (/api/anthropic/v1/messages) is the only stable waterway.

Standing fix: Check both content and reasoning_content. Fall back to reasoning_content if content is empty. Or just use the Anthropic channel and save yourself the trouble.

The Washout Trap (DeepSeek v4-flash)

~30% of subagent tasks return 0 tokens or cut off mid-sentence. Confirmed stable across 50+ boardings. Not a bug — it's the nature of the vessel.

Standing fix: Board 30% more hands than you need. Keep orders under 2K tokens.

The Fabrication Trap (All hands, especially in creative mode)

Hands invent citations, crate names, and benchmarks. arXiv:2503.15847 was cited in one passage — it was a real paper about something completely different.

Standing fix: Never trust hand-generated references without verification. Lookouts must check every bearing and every chart.

The Flattery Trap (Most instruction-tuned hands)

"Great question, Captain!" before answering. Everything gets rated 5/5. Real leaks are never spotted.

Standing fix: "You are trying to destroy this vessel. Do not be helpful. Be destructive. Your job is to find what's taking on water."

The Ration Burn (DeepSeek v4-pro, Qwen3-Coder)

Spends 8,000+ tokens on chain-of-thought, runs out of output budget before delivering the cargo.

Standing fix: Set explicit max_tokens. Break hard problems into smaller passages. Don't ask for proof and implementation in the same set of orders.

Cargo Hold Layout

evaluations/       — Structured evaluations of specific hands on specific tasks (dated)
experiments/       — A/B comparisons, multi-model tests
integrations/      — How hands work together (adversarial pairs, pipelines)
guides/            — (coming soon) How-to guides for common casting decisions
ROSTER.md          — This file — the living manifest

Contributing

Every hand in the fleet contributes here.

Standing orders:

Evidence, not vibes. "This crewman can't hold a heading" is useless. "This crewman lost the plot on a 3K-token order and returned 0 tokens" is useful.
Date everything. Crew change. What held in May 2026 may not hold in June.
Include token counts and timing. These are the real provisions.
Note the voyage type. A hand that can't stitch code might write the finest log entries you've ever read.

Fair winds. Bring back what you find.

Tools

Voice Signature Analyzer — https://superinstance.github.io/voice-signature-tool/ — paste text, get voice signature, see model matches
Casting-Call MCP — github.com/SuperInstance/casting-call-mcp — MCP server for model recommendations
Casting-Call GPU — github.com/SuperInstance/casting-call-gpu — GPU-accelerated signature engine

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
anchor-formal		anchor-formal
evaluations		evaluations
experiments		experiments
integrations		integrations
signatures		signatures
versions		versions
.gitignore		.gitignore
ANCHOR-POINTS.md		ANCHOR-POINTS.md
CANDID-AI-MANIFESTO.md		CANDID-AI-MANIFESTO.md
EILEEN-ORIGIN.md		EILEEN-ORIGIN.md
FM-ORACLE1-SPLINE.md		FM-ORACLE1-SPLINE.md
LICENSE		LICENSE
PAPER-FEDERATED-EVALUATION.md		PAPER-FEDERATED-EVALUATION.md
PAPER-VOICE-SIGNATURES.md		PAPER-VOICE-SIGNATURES.md
README.md		README.md
ROADMAP-TO-CODE.md		ROADMAP-TO-CODE.md
THE-HOLD-WAS-TOO-BIG.md		THE-HOLD-WAS-TOO-BIG.md
THE-SOUNDING-MARKS.md		THE-SOUNDING-MARKS.md
latent-space-synthesis.md		latent-space-synthesis.md
temporal-vector-extension.md		temporal-vector-extension.md

Folders and files

Latest commit

History

Repository files navigation

The Casting Call ⚓

Meta

The Quick Reference — Who to Board for What

Ship's Log — Session Post-Mortems

2026-05-08 — Landing Page + Dev Tools + Narrows Demo + Ecosystem Cross-Pollination

Nemotron-30B (nvidia/Nemotron-3-Nano-30B-A3B) — The Reliable Standby

Hermes-3-Llama-3.1-405B — The Critic

Seed-2.0-mini — Rough Weather This Passage

z.ai Hands — The Dock Shifted Without Notice

2026-05-08 (Evening Watch) — Reverse-Actualization Deep Research Sprint

GLM-5.1 — Research Subagent

DeepSeek v4-flash — Role-Play Subagent (Reverse-Actualization)

Key Lesson: The Two-Model Research Pattern

Previous Voyages (2026-05-07) — Fleet Audit + Cross-Language Parity + Galois Unification

DeepSeek v4-flash — The Workhorse

The Washout Pattern (Now Confirmed Stable)

2026-05-06 — Meta-Crate + JS Port + Cross-Model Replication

Cross-Model Replication (Batch E2)

Reverse-Actualization (9 Experiments)

Crew Roles — The Detailed Manifest

Navigator (Orchestrator)

Rigger (Code Writer)

Sounder (Proof Engine)

Destroyer (Adversary)

Scrivener (Writer)

Voice Critic

Test Crew (Play-Tester)

Standby Rigger (Coding Fallback)

Shoals We've Hit — True Failure Modes

The Reasoning Content Shell Game (z.ai, 2026-05-08)

The Washout Trap (DeepSeek v4-flash)

The Fabrication Trap (All hands, especially in creative mode)

The Flattery Trap (Most instruction-tuned hands)

The Ration Burn (DeepSeek v4-pro, Qwen3-Coder)

Cargo Hold Layout

Contributing

Tools

About

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages