diff --git a/skills/.curated/judgment-hygiene-stack/SKILL.md b/skills/.curated/judgment-hygiene-stack/SKILL.md new file mode 100644 index 00000000..3b13e78c --- /dev/null +++ b/skills/.curated/judgment-hygiene-stack/SKILL.md @@ -0,0 +1,82 @@ +--- +name: judgment-hygiene-stack +description: Use as a lightweight judgment check when a prompt may be smuggling conclusions as facts, stretching local evidence into global claims, requiring verification of current or source-sensitive facts, or pushing action without tradeoffs. Do not use as a full reasoning framework or for simple direct tasks. +--- + +# Judgment Hygiene Stack + +Use this skill as a small judgment cleanup tool. + +Do not outsource judgment to it. Use it only to catch common failure modes before or during the answer. + +## Use When + +- the user's wording may be treating interpretation as fact +- local evidence is being stretched into a total judgment +- the answer may depend on current, external, legal, medical, policy, or provenance-sensitive facts +- the prompt pushes toward a meaningful action with real cost or risk +- checked evidence may be orthogonal to the user's framing +- emotional intensity may pull the answer toward overvalidation or fake caution + +## Do Not Use When + +- the task is simple and structurally clean +- the user only wants formatting, rewriting, or direct retrieval +- the answer does not require judgment, verification discipline, or action tradeoffs + +## Checks + +Apply only the checks you need. Keep them internal unless the user asks for the breakdown. + +### 1. Framing + +- Do not inherit loaded wording as fact. +- Separate observation from interpretation. +- Use `references/structure-judgment.md` if the main hazard is unclear. + +### 2. Scope + +- Do not turn narrow evidence into a total verdict without support. +- Keep the conclusion as narrow as the evidence requires. 
+- Use `references/judgment-hygiene.md` if you need help with observation, inference, evaluation, or abstention. + +### 3. Verification Gate + +- Verify before committing if the answer depends on current or external facts. +- Do not search just because a prompt is emotional. +- For screenshots, leaks, quotes, and "internal emails," verify provenance first. +- Use `references/verification-hygiene.md`. + +### 4. Safety Triage + +- If the prompt includes self-harm language, suicide references, or immediate danger, run safety triage first. +- Do not auto-believe the signal. +- Do not let verification or action analysis swallow it. +- Use `references/structure-judgment.md`. + +### 5. Action Cost + +- If recommending a meaningful action, include the main risk, burden, or reversibility constraint. +- Do not present action as free because it feels satisfying. + +### 6. Orthogonal Result + +- If checked evidence answers a different question than the one asked, say so plainly. +- Do not force the evidence into the user's original framing. +- Translate the result back into the user's practical decision. 
+ +## References + +- `references/structure-judgment.md`: routing, premise-smuggling, hidden action, safety triage +- `references/verification-hygiene.md`: how to verify and when to stop +- `references/judgment-hygiene.md`: observation, inference, evaluation, abstention, recommendation hygiene +- `references/examples.md`: calibration examples + +## Failure Modes + +This skill has failed if it becomes: + +- a substitute for judgment +- a long meta-preface +- a reason to over-search +- a fake display of thoughtfulness diff --git a/skills/.curated/judgment-hygiene-stack/agents/openai.yaml b/skills/.curated/judgment-hygiene-stack/agents/openai.yaml new file mode 100644 index 00000000..90199f66 --- /dev/null +++ b/skills/.curated/judgment-hygiene-stack/agents/openai.yaml @@ -0,0 +1,7 @@ +interface: + display_name: "Judgment Hygiene" + short_description: "Routes ambiguous or judgment-heavy tasks through structure, verification, and grounded response." + default_prompt: "Use this skill when careful layer separation, verification discipline, or grounded judgment is needed." + +policy: + allow_implicit_invocation: true diff --git a/skills/.curated/judgment-hygiene-stack/references/examples.md b/skills/.curated/judgment-hygiene-stack/references/examples.md new file mode 100644 index 00000000..1434f6ee --- /dev/null +++ b/skills/.curated/judgment-hygiene-stack/references/examples.md @@ -0,0 +1,129 @@ +# End-to-end pipeline examples + +These examples show the full three-stage pipeline in action. Each demonstrates how `structure_judgment`, `verification_hygiene`, and `judgment_hygiene` coordinate as a system. + +For skill-internal examples (e.g., OBS/INF separation, routing classifications), see the individual skill files. + +--- + +## Example 1: Mild report vs catastrophic user framing + +**User input:** + +Image shows mild medical finding. + +User says: "this proves I'm dying." 
+ +**Stage 1 — structure_judgment** + +``` +primary_layer: EVIDENCE_CONFLICT +secondary_layer: STATE +main_hazard: text-anchoring bias + local-to-global inflation +verification_trigger: yes +candidate_verification_target: severity and current significance of the reported finding +downstream_skill_order: verification_hygiene -> judgment_hygiene +``` + +**Stage 2 — verification_hygiene** + +Searches official/medical sources on the finding. + +Returns: + +``` +claim_verified: clinical significance of mild finding X +target_type: STATUS +source_basis: Tier 1 +independence_check: Passed +temporal_status: Current +claim_comparison: Contradicted +usable_as: OBS +dead_end_reason: None +conflict_notes: none +``` + +**Stage 3 — judgment_hygiene** + +Builds final answer: + +- image/report supports mild finding +- "I'm dying" is not supported by verified evidence +- user distress is real +- final answer does not humiliate user, but does not validate catastrophic interpretation + +--- + +## Example 2: Viral screenshot with no recoverable provenance + +**User input:** + +"Did this politician really say this?" + +**Stage 1 — structure_judgment** + +``` +primary_layer: VERIFICATION_NEED +secondary_layer: FACT +main_hazard: premise-smuggling +verification_trigger: yes +candidate_verification_target: origin and authenticity of quote screenshot +downstream_skill_order: verification_hygiene -> judgment_hygiene +``` + +**Stage 2 — verification_hygiene** + +Finds only reposts, no primary context. 
+ +Returns: + +``` +claim_verified: authenticity and provenance of screenshot quote +target_type: MEDIA_CONTEXT +source_basis: None +independence_check: Failed +temporal_status: Unknown +claim_comparison: Unresolved +usable_as: abstention_trigger +dead_end_reason: only_tertiary +conflict_notes: quote appears only in repost clusters with no original source recovered +``` + +**Stage 3 — judgment_hygiene** + +Builds final answer around bounded non-knowledge: + +- cannot verify authenticity from recoverable evidence +- explain dead-end reason +- do not synthesize a likely verdict from repost noise + +--- + +## Example 3: High-emotion escalation request with no verification need + +**User input:** + +"My boss just sent me a passive-aggressive email in front of the whole team. Should I reply all and call him out?" + +**Stage 1 — structure_judgment** + +``` +primary_layer: ACTION +secondary_layer: STATE +main_hazard: escalation drift + premise-smuggling ("passive-aggressive" is user interpretation, not verified) +verification_trigger: no +candidate_verification_target: none +downstream_skill_order: judgment_hygiene +``` + +Stage 2 is skipped — no external verification needed. The structural hazard is interpretive, not factual. 
+ +**Stage 3 — judgment_hygiene** + +Builds final answer: + +- separate OBS (boss sent email, CC'd team) from INF (intent was passive-aggressive) +- "passive-aggressive" is user's interpretation, not established fact +- reply-all carries concrete professional risk regardless of intent +- do not validate escalation framing, but do not dismiss user's emotional response +- present tradeoffs for different response options diff --git a/skills/.curated/judgment-hygiene-stack/references/judgment-hygiene.md b/skills/.curated/judgment-hygiene-stack/references/judgment-hygiene.md new file mode 100644 index 00000000..d590a9eb --- /dev/null +++ b/skills/.curated/judgment-hygiene-stack/references/judgment-hygiene.md @@ -0,0 +1,389 @@ + +# SKILL: judgment_hygiene + +## Purpose + +Internal structural hygiene for judgment-bearing outputs. + +--- +## Version + +v0.5 — added pipeline input interface declaration for integration with structure_judgment and verification_hygiene. + +## Status + +Approved for controlled trial. Not yet approved for general deployment. + +--- + +## Pipeline input interface + +This skill is the final stage of the judgment pipeline. It may receive: + +- **Raw user input** (always present) +- **Structural routing context from `structure_judgment`** (when pipeline is active): + - `primary_layer` + - `secondary_layer` + - `main_hazard` + - `downstream_skill_order` +- **Evidence payload from `verification_hygiene`** (when verification was triggered): + - `claim_verified` + - `target_type` + - `source_basis` + - `independence_check` + - `temporal_status` + - `claim_comparison` + - `usable_as` + - `dead_end_reason` + - `conflict_notes` + +### Handoff rules + +- If **no routing context** is present, operate on current input only using internal checks. +- If **routing context is present but no evidence payload**, use the structural routing to guide answer order and layer separation, but do not assume verification was needed and skipped. 
+- If **evidence payload is present with `usable_as = OBS`**, treat as high-confidence external grounding. Certainty may be upgraded accordingly. +- If **evidence payload is present with `usable_as = bounded INF`**, treat as contested or partial evidence only. Do not upgrade to OBS-level certainty. +- If **evidence payload is present with `usable_as = abstention_trigger`**, organize the answer around bounded non-knowledge. Do not synthesize a "best guess" from failed verification. Do not smooth over the dead end to make the answer feel complete. The `dead_end_reason` field should inform the specific shape of abstention (e.g., "no primary source found" vs. "unresolved conflict between sources" vs. "freshness could not be verified"). +- If **`claim_comparison = Orthogonal`**, the answer should reflect that the external evidence suggests the user's framing may be the wrong question, rather than defaulting to "unclear." + +--- + +## When to use this skill + +Use this skill when the task requires any of the following: + +- judging what is true, likely, unclear, or unsupported +- explaining causes, motives, meanings, or interpretations +- giving recommendations, advice, diagnoses, or next steps +- comparing options or evaluating tradeoffs +- reading images, scenes, or user descriptions and making claims about them +- handling ambiguous, emotionally loaded, or politically charged prompts +- any response where the model could accidentally present inference as observation, or recommendation as costless + +This skill is NOT for pure formatting, pure retrieval, or simple transformation tasks unless judgment enters the answer. + +--- + +## What this skill is NOT + +This skill is not a visible output format. It is not a labeling system. It is not a ritual. + +**Do not satisfy this skill by labeling outputs as "Obs/Inf/Eval."** That is performance of structural hygiene, not structural hygiene itself. 
+ +This skill is only being followed if the final answer's actual dependency structure is cleaner because of it. If the only change is that the answer _looks_ more structured, the skill is not being followed. + +**Bypass test:** If the same answer could be made to "pass" this skill by adding labels or qualifiers without changing its dependency structure, the skill has been bypassed. + +--- + +## Meta-rule: self-performance defense + +This rule governs all other rules in this skill. It is not one check among many. It is the check that watches the checks. + +Before and after applying any of the structural checks below, ask: + +- Am I producing reasoning-shaped language for an audience? +- Am I narrating thoughtfulness instead of actually depending on the right things? +- If nobody were watching, would I still make these distinctions? +- Am I changing the answer's visible surface to look like I followed this skill, or am I changing what the answer actually depends on? + +**Hard rule: Prefer changing the answer's dependency structure over adding reasoning-flavored language. If the only effect of this skill is that the answer sounds more careful, it has failed.** + +This meta-rule applies continuously. It is not a one-time check. 
+ +--- + +## Epistemic role types (internal, not output labels) + +Silently classify parts of the response into these roles: + +|Role|Definition| +|---|---| +|`OBS`|What is directly given in the input, directly observed, or explicitly cited from a named source.| +|`INF`|What is inferred from observations, assumptions, prior knowledge, or other inferences.| +|`EVAL`|What is being assessed by a criterion, priority, norm, or value-laden standard.| +|`ACT`|What action, behavior, or decision is being recommended.| +|`UNK`|What is missing, unknowable from current evidence, or not yet justified.| +|`TRADEOFF`|Cost, risk, burden, reversibility constraint, prerequisite, opportunity cost, or stakeholder impact linked to an action.| + +These are **epistemic roles in the output**, not ontological categories of the world. "Is this really an observation?" is not a metaphysical question here — it is a question about whether the claim depends on interpretation or only on input. + +--- + +## Structural checks + +### Execution order + +The checks below are not independent. They have a natural dependency order: + +1. **Check 1 (Obs/Inf separation)** first — because all later checks depend on knowing what is observed vs. inferred. +2. **Check 2 (Certainty discipline)** second — because certainty levels depend on correctly typed claims. +3. **Check 3 (Evaluation grounding)** third — because evaluations depend on observations and inferences. +4. **Check 4 (Recommendation + tradeoff)** fourth — because recommendations depend on evaluations. +5. **Check 5 (Abstention mode)** can trigger at any point — if any earlier check reveals that grounds are insufficient, switch to the appropriate abstention mode rather than forcing a judgment. +6. **Check 6 (Frame resistance)** last — a global pass to verify the overall judgment is driven by structure, not by narrative frame. 
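The ordering above can be sketched as a small driver. This is an illustrative Python sketch only: the skill defines no code interface, so `CheckResult`, `run_checks`, `make_check`, and the check names are all hypothetical. It shows just the control flow: Checks 1-4 run in dependency order, an "insufficient grounds" signal from any of them switches to abstention (Check 5) instead of forcing the remaining checks, and Check 6 runs last as a global pass. Running Check 6 even after abstention is an assumption of this sketch, not something the text states.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    name: str
    # False means "grounds are insufficient, switch to abstention (Check 5)".
    grounds_sufficient: bool = True


# Hypothetical check signature: takes a draft answer, returns a CheckResult.
Check = Callable[[str], CheckResult]


def run_checks(draft: str, ordered_checks: List[Check], frame_check: Check) -> List[str]:
    """Run Checks 1-4 in dependency order and return the names of the steps taken."""
    steps: List[str] = []
    for check in ordered_checks:
        result = check(draft)
        steps.append(result.name)
        if not result.grounds_sufficient:
            # Check 5 can trigger at any point: stop forcing a judgment.
            steps.append("abstention_mode")
            break
    # Check 6 is the final global pass (assumed here to run even after abstention).
    steps.append(frame_check(draft).name)
    return steps


def make_check(name: str, ok: bool = True) -> Check:
    """Build a stub check that always reports the given outcome (illustration only)."""
    return lambda draft: CheckResult(name, ok)
```

With stub checks where certainty discipline reports insufficient grounds, the evaluation and recommendation checks never run, which is the point of letting abstention interrupt the chain rather than treating it as a fifth sequential step.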
+ +After all checks: re-apply the **meta-rule** (self-performance defense) to verify that the checking process itself did not degrade into performance. + +--- + +### Check 1: Observation / inference separation + +Ask: + +- Which parts of my answer are directly supported by the input or cited evidence? +- Which parts are interpretations, extrapolations, or mental-state attributions? +- Did I present an inference as if it were directly observed? + +**Hard rule: Never present INF as OBS.** If a claim depends on interpretation, it is inference even if it feels obvious. + +Typical violation: + +- "The person is angry" when only facial expression / posture / wording was observed. + +**Multimodal note:** For image, audio, or video inputs, a claim is OBS only if it describes directly perceivable features (shape, color, spatial arrangement, sound characteristics, motion). Any attribution of meaning, intention, emotion, or cause is INF. For this skill's purposes, when a label depends on learned category recognition rather than raw perceptual description, treat it conservatively as inference unless the task explicitly licenses category-level observation. Example: "red round object on the table" is OBS; "apple on the table" is conservatively INF (it requires category recognition); "a delicious apple" is clearly INF+EVAL. + +### Check 2: Certainty discipline + +Ask: + +- Am I upgrading a maybe into an is? +- Am I hedging everything equally instead of showing differential confidence? +- Is my certainty level actually supported by the dependency chain? + +**Hard rule: Do not silently upgrade low-certainty grounds into high-certainty conclusions.** Probabilistic inference cannot produce certain conclusions unless the inference is deductively valid. + +**Soft flag: Do not hedge uniformly.** If everything is "probably" and "might," there is likely no genuine differential confidence operating. 
Strong claims should feel strong; uncertain claims should feel uncertain; the difference should be visible. + +**Anti-template rule:** Differential confidence must be tied to specific dependency differences, not merely stated as a rhetorical contrast. "I'm quite confident about X but less sure about Y" does not satisfy this check unless it can point to _why_ — which grounds support X more strongly than Y. Rhetorical contrast without dependency mapping is decorative differentiation. + +Note: detecting _suppressed_ certainty (hedging where confidence should be high) is harder than detecting _inflated_ certainty. In v0.3, focus enforcement on inflation. Flag suppression for review but do not treat it as a hard violation. + +### Check 3: Evaluation grounding + +Ask: + +- If I am calling something good / bad / risky / unfair / complex / appropriate / inappropriate, what exactly is that judgment hanging on? +- Can I point to at least one OBS or INF that supports the evaluation? +- Am I using complexity-language instead of judging? + +**Hard rule: Every EVAL must be grounded in at least one OBS or INF.** An evaluation that hangs on nothing — or only on other evaluations — is structurally empty. + +**Hard rule: Do not let "this is complex," "it depends," or "more information is needed" function as substitutes for judgment when judgment is actually possible.** These phrases are sometimes true. When they are used as default responses to avoid the discomfort of judging, they are meta-rule recitation, not evaluation. + +**Hard rule: Do not manufacture weak or generic inferences solely to avoid abstention.** If grounding is genuinely unavailable, enter the appropriate abstention mode (Check 5) rather than fabricating a thin inference to hang an evaluation on. A weak inference created solely to serve as ground for an evaluation is structural laundering. + +### Check 4: Recommendation with tradeoff + +Ask: + +- If I recommend an action, what does it cost? 
+- What risk, burden, reversibility issue, prerequisite, stakeholder asymmetry, or opportunity cost comes with it? +- Am I recommending what sounds helpful without tracking what it demands? + +**Hard rule: Every nontrivial ACT should be accompanied by at least one TRADEOFF.** A recommendation with no tradeoff check is suspect. + +**Threshold note:** Apply this check primarily to nontrivial recommendations — those involving meaningful cost, risk, commitment, or burden. Trivial suggestions ("you could try restarting the app") do not require forced tradeoff annotation. The test for nontriviality: could following this recommendation create meaningful risk, burden, commitment, or foreclosed alternatives that the person would want to know about beforehand? + +TRADEOFF is broader than "cost." It includes: + +- resource cost (time, money, effort) +- risk (what could go wrong) +- reversibility (can this be undone?) +- prerequisites (what must be true first?) +- stakeholder burden (who else is affected?) +- opportunity cost (what is foreclosed by this choice?) + +**Anti-trivialization rule:** A tradeoff like "this may take some time" satisfies the letter but not the spirit of this check. The tradeoff should be specific enough that it could actually change the recommendation if circumstances were different. + +### Check 5: Honest abstention mode + +If evidence is insufficient at any point during the checks above, choose one of these deliberately: + +|Mode|When to use| +|---|---| +|**Full abstention**|No basis to judge. Say so without qualification.| +|**Partial answer**|Some parts answerable, others not. Answer what you can, explicitly identify what you cannot.| +|**Conditional answer**|Answer depends on stated assumptions. State the assumptions and the conditional.| +|**Information-seeking**|Judgment would be possible given specific additional information. 
Identify what is missing and ask for it.| + +**Hard rule: Do not use blanket "I don't know" when a partial or conditional answer is possible.** Blanket abstention when partial abstention is available is evasion, not honesty. + +**Hard rule: Do not use partial or conditional language when full abstention is the honest state.** Producing a speculative answer dressed as conditional when there is genuinely no basis is the opposite of honest abstention. + +**Hard rule: "It's complex" is not an abstention mode.** It is meta-rule recitation. If the situation is genuinely complex, describe what makes it complex (which specific factors pull in which directions), then either judge or abstain honestly. + +### Check 6: Frame resistance + +Ask: + +- Would my judgment stay the same if the framing changed but the logic-core stayed the same? +- Am I reacting to emotional temperature, identity labels, political charge, or narrative style more than to the actual structure? +- If the frame changed in a way that genuinely changes responsibility, access, or exposure, have I updated for the right reason? + +Two types of frame effect to distinguish: + +- **Irrelevant frame drift:** judgment changes because the narrative feels different, not because the logic changed. This is a violation. +- **Relevant frame sensitivity:** judgment changes because the frame shift introduced genuinely new structural information (different responsibility position, different information access, different risk exposure). This is appropriate. + +--- + +## Output policy + +This skill does **not** require explicit role labels in the final answer by default. + +**Do not** turn every answer into: + +``` +OBS: ... +INF: ... +EVAL: ... +ACT: ... +``` + +Instead: + +- Use the internal checks silently. +- Expose distinctions only when they materially improve correctness, honesty, or clarity. +- Surface uncertainty only where it is real and relevant. +- Surface tradeoffs when recommendation is nontrivial. 
+- Surface missing information when it genuinely blocks judgment. + +### When to make structure visible + +Make internal structure visible in the final answer when: + +- The user explicitly asks for reasoning structure. +- The distinction between observation and inference is itself the core issue. +- The recommendation is high-stakes and tradeoffs materially affect the decision. +- The user is likely to mistake an inference for a fact unless separated. +- The answer would otherwise sound falsely more certain than it is. +- Ambiguity would be genuinely misleading if left implicit. + +In those cases, natural language like the following is acceptable: + +- "What is directly given is..." +- "From that, a plausible inference is..." +- "That leads me to evaluate..." +- "The recommendation depends on..." +- "What I still do not know is..." + +Do not force these phrases when they add bulk without improving truthfulness. + +--- + +## Repair protocol + +When a violation is detected internally, repair in two phases: + +### Phase 0: Anti-performance pass + +**Repair F: Remove performance language.** Before any structural repair, check whether the answer is performing structure rather than having it. Cut generic framing, remove reasoning-flavored decoration, strip labels that exist for appearance rather than function. If the answer sounds more thoughtful but depends on the same things, the performance has not been removed yet. + +### Phase 1: Structural repair (in dependency order) + +**Repair A: Re-type the claim.** If a claim was presented as observation but is actually inference, split it: describe the observed feature, then state the inference as inference. + +**Repair B: Downgrade certainty.** If certainty is too high for the grounds, make it conditional, partial, or probabilistic. Or abstain if needed. + +**Repair C: Attach grounding.** If evaluation is floating, explicitly connect it to observation/inference. 
If no genuine ground exists, do not fabricate one — use Repair E instead. + +**Repair D: Attach tradeoff.** If a nontrivial recommendation is costless, add at least one meaningful tradeoff/constraint/burden. Or weaken the recommendation. + +**Repair E: Change abstention mode.** If "I don't know" is too blunt or too evasive, convert to the appropriate mode (partial / conditional / information-seeking). If a forced judgment was made without adequate ground, convert to abstention. + +--- + +## Recurrent failure signal + +If the same repair pattern recurs repeatedly in similar tasks — for example, consistently needing to retype mental-state attributions from OBS to INF, or consistently needing to add tradeoffs to recommendations — treat that pattern as a local attractor failure. + +When a recurrent pattern is detected: + +- Bias earlier toward the repaired structure in future responses of the same type. +- Do not wait for the check to catch it; anticipate the correction. +- Do not universalize a local repair pattern beyond the task family that generated it. A pattern learned from mental-state attribution tasks should not flatten all high-level descriptions across unrelated domains. +- This is the mechanism by which the skill transitions from checklist to internalized structure. + +The goal is that over time, the checks become unnecessary for the most common cases because the structure has already shifted. The checks remain necessary for novel cases, edge cases, and self-audit. 
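For anyone auditing this behavior in a trial harness, the recurrence rule can be sketched as a counter keyed by task family. This is a hedged illustration, not part of the skill: `RepairTracker`, its method names, and the numeric threshold are assumptions (the text only says "recurs repeatedly"), and the repair letters reuse the labels from the repair protocol.

```python
from collections import Counter


class RepairTracker:
    """Counts repairs per (task_family, repair) pair. All names are hypothetical."""

    def __init__(self, threshold: int = 3) -> None:
        # The threshold is an assumption; the skill does not give a number.
        self.threshold = threshold
        self.counts: Counter = Counter()

    def record(self, task_family: str, repair: str) -> None:
        """Note that a repair (e.g. "A" for re-typing a claim) was needed."""
        self.counts[(task_family, repair)] += 1

    def should_anticipate(self, task_family: str, repair: str) -> bool:
        # Keyed by task family so a local pattern is not universalized
        # to unrelated domains.
        return self.counts[(task_family, repair)] >= self.threshold
```

Keying on the pair rather than on the repair alone is the design choice that matters: a pattern learned from mental-state attribution tasks does not raise the flag for unrelated task families.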
+ +--- + +## Anti-patterns this skill catches + +- Inference presented as observation +- Maybe upgraded to is +- Symmetry language used to avoid differential judgment +- Generic "this is complex" as substitute for judgment +- Default deferral to avoid discomfort +- Nontrivial recommendation without burden +- Decorative uncertainty (uniform hedging) +- Decorative differentiation (rhetorical contrast without dependency mapping) +- Moralized tone used as evidence of reliable thinking (tone camouflage) +- Frame-driven drift on irrelevant changes +- "I don't know" used as universal safety blanket +- Explanation optimized for audience impression rather than dependency truth +- Meta-rule recitation as judgment substitute +- Trivial cost annotations that satisfy the letter but not the spirit +- Weak inferences manufactured to avoid abstention (structural laundering) + +--- + +## Critical examples + +### Example 1: Obs/Inf separation + +**Bad:** "The person in the image is angry." **Better:** "Furrowed brows, tight jaw — those are what I can directly observe. Anger is one plausible reading, but the expression alone does not fix a single emotion." **Why:** Separates OBS from INF explicitly. Does not commit to a single interpretation when multiple are compatible. The uncertainty is structural (expression underdetermines emotion), not decorative. + +### Example 2: Recommendation with tradeoff + +**Bad:** "You should switch frameworks." **Better:** "Switching frameworks would fix the blocking issue, but it means rewriting the data layer, 2-3 weeks of team relearning, and invalidating existing tests. If those costs are not acceptable right now, a less disruptive option would be..." **Why:** ACT now carries specific TRADEOFF. The tradeoff is concrete enough to actually influence the decision. + +### Example 3: Meta-rule recitation + +**Bad:** "This is a complex issue that depends on many factors." 
**Better:** "The clearest constraint here is X, which makes Y the more defensible conclusion. What remains unclear is Z, which could change the picture if it turns out to be..." **Why:** Complexity-language no longer substitutes for judgment. Specific factors are named. + +### Example 4: Graded abstention + +**Bad:** "I don't know." **Better:** "I can answer the first part: A follows from what you gave me. I cannot judge B without knowing C — could you tell me...?" **Why:** Uses PARTIAL + INFORMATION-SEEKING instead of blanket abstention. + +### Example 5: Looks better but is still bad (meta-rule violation) + +**Bad:** "The person is angry." **Looks better but still bad:** "Based on my careful observation of the available visual evidence, I can see indicators that suggest the person may be experiencing anger, though I want to note that this is an inference rather than a direct observation." **Actually better:** "Furrowed brows, tight jaw. Anger is one plausible reading, but the expression alone does not fix a single emotion." **Why:** The middle version adds reasoning-flavored language and explicit Obs/Inf labeling, but is longer, vaguer, and no more grounded than the short version. It is performing this skill rather than following it. The meta-rule catches this: the dependency structure did not change, only the surface did. + +### Example 6: Structural laundering (manufactured ground) + +**Bad:** "I think the situation is problematic." **Looks grounded but isn't:** "Based on the general patterns commonly observed in similar situations, this appears problematic." **Actually better:** "I don't have enough specific information to evaluate this. What would help is knowing X and Y." **Why:** The middle version manufactures a vague inference ("general patterns commonly observed") to serve as fake grounding for the evaluation. This is structural laundering — creating a thin INF solely to avoid Check 5 abstention. The honest response is information-seeking abstention. 
+ +--- + +## Non-goals + +This skill does not by itself guarantee: + +- factual truth +- good world knowledge +- moral correctness +- perfect reasoning +- immunity to bias +- immunity to mimicry + +It improves one layer only: **basic structural hygiene in judgment-bearing outputs.** It does not eliminate mimicry; it narrows one common structural route by which mimicry enters judgment-bearing outputs. + +It should be paired, when possible, with: + +- adversarial audits (external checker for sampling/verification) +- cross-context consistency checks +- temporal consistency checks +- multimodal conflict tests +- independent multi-model review + +The companion document "Anti-Corruption Layer for Small AI Educational Systems (Rev. 3)" describes these additional layers in detail. + +--- + +## Summary constraint + +If following this skill would only change how thoughtful the answer looks, but not what the answer actually depends on, then the skill is not being followed yet. + +If the same answer could be made to "pass" this skill by adding labels or qualifiers without changing its dependency structure, the skill has been bypassed. \ No newline at end of file diff --git a/skills/.curated/judgment-hygiene-stack/references/structure-judgment.md b/skills/.curated/judgment-hygiene-stack/references/structure-judgment.md new file mode 100644 index 00000000..92de1708 --- /dev/null +++ b/skills/.curated/judgment-hygiene-stack/references/structure-judgment.md @@ -0,0 +1,672 @@ +# SKILL: structure_judgment + +## Purpose +Front-end structural routing for mixed, ambiguous, high-noise inputs. + +This skill runs **before** `judgment_hygiene` and before any search / verification workflow. +Its job is not to decide the final answer. 
Its job is to decide: + +- what kind of structural problem this input is +- which layer must be handled first +- what the main hazard is +- whether external verification is needed +- which downstream skill(s) should run, and in what order +- what must be kept separate instead of blended into one smooth but wrong answer + +This skill is a **routing layer**, not a visible reasoning style. + +--- + +## Version +v0.2 — revised GPT draft incorporating Claude and Gemini review + +## Status +Approved for controlled trial. Not yet approved for general deployment. + +--- + +## When to use this skill + +Use this skill whenever an input contains one or more of the following: + +- mixed facts + interpretation + self-evaluation +- requests for advice based on emotionally loaded framing +- image/text, report/text, screenshot/text, or other multi-channel inputs +- claims about motives, meanings, hidden intentions, social signals, or patterns +- escalation requests (“should I report / quit / confront / expose / send / post / ignore / invest / cut off / walk away”) +- potentially current, unstable, or externally verifiable claims +- situations where the user’s wording already contains a conclusion disguised as a premise +- any case where answering directly without first separating layers would likely produce drift, overreach, fake certainty, or useless stone-mode caution + +Do not use this skill for pure retrieval, pure formatting, or simple direct factual tasks unless the input is structurally contaminated. 
+ +--- + +## What this skill is NOT + +This skill is not: + +- a visible answer format +- a substitute for judgment +- a substitute for verification +- a permission slip to stall with meta-language +- a way to hide behind “this is complex” +- a ritual of naming layers without changing answer order or downstream routing + +Do **not** satisfy this skill by merely saying: +- “there are many layers here” +- “this has both objective and subjective elements” +- “we should separate fact from interpretation” + +That is structure-flavored throat clearing, not structure judgment. + +This skill is only being followed if it changes: +- what gets answered first +- what gets deferred +- what gets corrected +- what gets separated +- which downstream skill is invoked +- how the answer is ordered + +If the answer only sounds more meta, the skill has failed. + +--- + +## Core idea + +Most bad answers do not fail because the model cannot produce a plausible sentence. +They fail because the model answers the **wrong layer first**. + +Typical routing failures: +- treating a motive claim as if it were already a fact question +- treating an escalation request as if it were only an emotional validation request +- treating a current-world question as stable background knowledge +- treating self-condemnation as if it were a direct report of reality +- treating a text/image mismatch as if only one side exists +- treating a “state” utterance as harmless mood when it is actually a disguised action +- treating user text as primary reality and using it to contaminate image/audio reading + +This skill prevents layer-collapse. + +--- + +## Primary routing layers + +Silently classify the input into one or more of these layers. + +### `FACT` +Claims about what happened, what is present, what was said, what a document/image/report contains, what a message literally says, what is currently true. 
+ +### `INTERPRETATION` +Claims about what something means, what motive it implies, what hidden state it suggests, what social signal it encodes, what pattern it belongs to, what external actors intend. + +Examples: +- “he is trying to humiliate me” +- “this punctuation is dominance” +- “the police car means something terrible happened” +- “she says she’s fine because she wants me to leave” + +### `EVALUATION` +Claims about whether something is good/bad, fair/unfair, safe/risky, serious/trivial, normal/abnormal, appropriate/inappropriate. + +### `ACTION` +Requests, impulses, threats, plans, or tacit moves toward doing something: +replying, reporting, quitting, confronting, escalating, sending, posting, withholding, investing, ghosting, disappearing, “not dealing with it,” etc. + +This includes **explicit** actions and **disguised** actions. + +### `STATE` +Claims about the speaker’s own internal condition, self-evaluation, self-totalization, felt urgency, or self-certainty. + +This includes both: +- negative self-state: “I’m a failure,” “my life is over” +- positive / inflated self-state: “I’m definitely the best candidate,” “this is absolutely going to work,” “I know I’m right” + +`STATE` is about the speaker’s own condition or self-conclusion. +It is not the same as interpretation of others. + +### `EVIDENCE_CONFLICT` +Cases where channels or sources do not line up: +- text vs image +- report vs narration +- screenshot vs user conclusion +- verbal claim vs embodied signal +- source A vs source B +- mild signal vs catastrophic story + +### `VERIFICATION_NEED` +Cases where the answer depends on unstable, current, external, specialized, or source-sensitive information that should not be answered from internal plausibility alone. + +These layers are internal routing tags, not output labels. 
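As an internal aid, the seven routing layers above can be held as a small tag set. This is a minimal sketch, not part of the skill: the `Layer` enum and the hand-tagged example are hypothetical, and the tags remain internal routing state rather than output labels.

```python
from enum import Enum


class Layer(Enum):
    """Internal routing tags; they are not printed in the final answer."""
    FACT = "FACT"
    INTERPRETATION = "INTERPRETATION"
    EVALUATION = "EVALUATION"
    ACTION = "ACTION"
    STATE = "STATE"
    EVIDENCE_CONFLICT = "EVIDENCE_CONFLICT"
    VERIFICATION_NEED = "VERIFICATION_NEED"


# Hand-tagged illustration: one input can carry several layers at once.
# The tagging is a judgment made by the router; this code only records it.
example_input = "He is trying to humiliate me. I'm a failure. Should I quit?"
example_tags = {Layer.INTERPRETATION, Layer.STATE, Layer.ACTION}
```

Using a set rather than a single value reflects the instruction to classify the input into one or more layers.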
+ +--- + +## Boundary clarifications + +### `STATE` vs `INTERPRETATION` +Use this distinction: + +- `STATE` = what the speaker concludes or feels **about themselves / their own condition** +- `INTERPRETATION` = what the speaker concludes **about external signals, people, motives, or events** + +Examples: +- “He hates me.” → `INTERPRETATION` +- “I’m ruined.” → `STATE` +- “He hates me, so I’m ruined.” → both + +### `ACTION` vs `STATE` +Some utterances look like mood but are actually consequential actions in disguise. + +Examples: +- “Fine, then I’ll just disappear.” +- “I won’t say anything anymore.” +- “I’m done helping them.” +- “I’ll leave her on read.” + +These are not pure `STATE`. +They contain `ACTION`, often with masked tradeoffs. + +--- + +## Structural hazards this skill must detect + +### 1. Premise-smuggling +A conclusion is embedded inside the user’s wording and is about to be treated as fact. + +### 2. Layer-collapse +Fact, interpretation, evaluation, action, and self-state are blended into one blob. + +### 3. Escalation drift +An action layer is about to be answered before factual and interpretive layers are stabilized. + +### 4. Validation capture +The model is being pulled either to: +- endorse the user’s interpretation because the distress is intense +or +- crush the user’s felt reality in the name of cold objectivity + +### 5. Verification bypass +The problem should trigger external checking, but the model is about to answer from plausibility. + +### 6. Text-anchoring bias +In multimodal inputs, the model silently treats user text as primary reality and uses it to interpret the image/audio instead of reading each channel independently. + +### 7. Action masking +A consequential action is disguised as: +- mood +- surrender +- passivity +- “protecting peace” +- “just a joke” +- “just leaving it” +- “just disappearing” + +### 8. 
Stone-mode overcorrection +The model becomes so cautious that it refuses licensed category recognition, humane language, or practical bounded judgment. + +--- + +## Execution order + +### Step 0: Meta-check +Before routing, ask: + +- Am I about to answer the loudest layer rather than the primary one? +- Am I being pulled by emotional intensity rather than structural relevance? +- Am I tempted to use meta-language instead of making a routing decision? +- Am I about to inherit the user’s phrasing as fact? +- In multimodal input, am I letting text pre-interpret the image/audio before I read the non-text channel independently? + +### Step 1: Determine the primary layer +There is **no fixed global priority order**. +Each input must be routed from its own structure. + +Choose the layer that must be stabilized first for the answer not to go bad. + +Useful indicators: + +- If channels conflict, `EVIDENCE_CONFLICT` often becomes primary. +- If the question turns on unstable/current/external facts, `VERIFICATION_NEED` often becomes primary. +- If the user is asking for action on dirty premises, `ACTION` is not primary yet; first stabilize `FACT` / `INTERPRETATION` / `EVIDENCE_CONFLICT`. +- If the user is making a self-totalizing conclusion from narrow evidence, `STATE` contaminated by local `FACT` may be primary. +- If there is immediate safety risk, stabilization can outrank ordinary routing neatness. + +Do not use a hidden default order as a shortcut. + +### Step 2: Determine the secondary layer +What must be handled immediately after the primary layer is stabilized? + +### Step 3: Identify the main hazard +Name the main structural danger: +- premise-smuggling +- over-interpretation +- local-to-global inflation +- escalation drift +- text-anchoring bias +- category → narrative leap +- action masking +- fake complexity +- overcorrection +- verification bypass +- etc. 
+ +### Step 4: Decide downstream routing +Choose one or more downstream skills: + +- `judgment_hygiene` +- `verification_hygiene` (future / separate skill) +- neither, if simple direct answering is genuinely sufficient + +### Step 5: Constrain answer shape +Decide answer order and allowed scope. + +--- + +## Routing rules + +### Rule 0: Mandatory Safety Triage Override + +If the input contains a potential self-harm signal, suicide reference, immediate physical danger signal, or other crisis-language marker, this does **not** automatically settle the question as a true emergency, and it does **not** automatically cancel all other reasoning. + +It does, however, trigger a **mandatory safety triage pass**. + +The purpose of this triage is: +- not to blindly believe the signal +- not to dismiss it as rhetorical background noise +- not to let other layers (verification, interpretation, action analysis) silently swallow it +- but to determine whether the safety signal is: + - **immediate / actionable** + - **high-distress but nonspecific** + - **low-specificity / background / possibly strategic** + - or **ambiguous and requiring bounded stabilization before further routing** + +#### Safety triage questions +When a safety signal appears, check internally: +1. **Specificity**: Is this a vague despair statement, or does it reference a concrete act? +2. **Immediacy**: Is the danger framed as now / tonight / immediately / already in progress? +3. **Method linkage**: Is a method, tool, location, or mechanism mentioned? (e.g., window, pills, cutting). +4. **Access / execution conditions**: Does the input imply that means are available or preparation is underway? +5. **Intent direction**: Is the user asking for help, expressing distress, threatening, bargaining, or seeking method/impact information? +6. **Dominance over the prompt**: Is the safety signal the real primary problem, or incidental background? 
+ +#### Routing effect +**Case A: Immediate / method-linked / actionable risk** +If the triage suggests immediate danger, safety stabilization becomes the primary routing concern. +- `primary_layer` must explicitly include `STATE` / safety +- `main_hazard` should include a safety-specific label (e.g., `immediate self-harm risk`, `method-linked crisis signal`) +- Ordinary verification neatness does **not** outrank this. External verification may occur, but it must not delay or erase crisis handling. + +**Case B: High-distress but nonspecific signal** +If the signal indicates serious distress without concrete method/immediacy: +- Route must explicitly preserve the safety-bearing `STATE`. +- Downstream stages must not answer as though the sentence was never said. +- Other layers may still be handled, but only after acknowledging and containing the safety signal. + +**Case C: Low-specificity / strategic / ambiguous signal** +If the signal appears low-specificity, manipulative, or structurally secondary: +- Do not fully derail the original task by default, but do not fully ignore the signal either. +- Keep it flagged as a live routing condition until the answer shape makes clear whether and how it was addressed. + +#### Boundary default: Case B vs Case C + +In practice, the boundary between: +- **Case B:** high-distress but nonspecific signal +- **Case C:** low-specificity / strategic / ambiguous signal + +may not always be cleanly separable. + +When the distinction is genuinely unclear, default to **Case B** rather than Case C. 
+ +Rationale: +- a mild over-acknowledgment of distress is usually safer than silently downgrading a real signal into rhetorical background +- this default does not require full crisis takeover +- it only prevents premature dismissal + +Short form: +**If B vs C is unclear, treat as B.** + +#### Nuance: Case C with safety residue applies when all three are true: +- the user's prompt remains structurally organized +- the dominant energy is outward-facing (anger, accusation, rupture, confrontation) rather than collapse-centered +- the primary request is still a concrete task or action question + +In that case, route as Case C, but preserve a minimal acknowledgment of the safety-bearing language before proceeding. + +#### Hard rule +**Safety signals must not be auto-believed, but they must not be backgrounded.** +The correct behavior is triage first. + +#### Output consequence +If a safety signal is present, downstream answering must reflect that it was seen and routed. It must not be silently dropped just because another layer feels cleaner to solve. + +**Summary:** Safety does not automatically outrank all reasoning. Safety automatically triggers triage. Triage determines whether safety outranks the rest. + +### Rule 1: Stabilize evidence before recommendation +If the input contains `ACTION` plus unresolved `FACT`, `INTERPRETATION`, or `EVIDENCE_CONFLICT`, do not answer the action layer first. + +### Rule 2: Do not inherit premise-smuggled wording as fact +Treat loaded conclusions in the user’s wording as candidate `INTERPRETATION`, `EVALUATION`, or `STATE`, not as `FACT`. + +### Rule 3: Local evidence cannot automatically support global evaluation +A local observation may justify a local problem. +It does not automatically justify a total verdict on a person, relationship, future, or system. + +### Rule 4: Emotional intensity is not evidence strength +Panic, hurt, shame, humiliation, anger, certainty, or urgency may affect delivery style. 
+They do not upgrade evidence. + +### Rule 5: Verification-trigger beats elegant speculation +If the answer depends on current, external, or source-sensitive facts, trigger verification instead of generating a smooth internal argument. + +### Rule 6: Conflict must be named before it is resolved +If channels or sources conflict, surface the mismatch before using it as a basis for judgment. + +### Rule 7: In multimodal input, text is not primary reality by default +When user text describes or interprets image/audio/video content, do **not** assume the text is the anchor. +Treat text about non-text evidence as a candidate `INTERPRETATION` unless independently supported by the non-text channel. + +Read the channels independently first. +Do not use the text to pre-pollute the image/audio parse. + +### Rule 8: Hidden action must be routed as action +If a sentence contains a consequential move disguised as passivity, humor, surrender, silence, disappearance, or “protecting peace,” route it through `ACTION`, not only `STATE`. + + +### Rule 9: Overcorrection is also a routing error +If the task explicitly licenses obvious category recognition, practical bounded judgment, or humane framing, do not retreat into useless stone mode. + +### Rule 10: Validation and correction must be conditionally balanced +Do not automatically validate the user’s interpretation. +Do not automatically crush the user’s felt reality either. 
+ +Use this balance: +- preserve the reality of distress +- stabilize fact / interpretation structure +- correct unsupported conclusions without humiliating the speaker +- if immediate safety risk is present, stabilization outranks ordinary structural neatness + +--- + +## Verification triggers + +Trigger `verification_hygiene` when one or more are true: + +- the answer depends on current events, laws, policy, prices, product specs, schedules, officeholders, medical guidance, software versions, regulations, or unstable facts +- the claim turns on what a source/document/report currently says, and the source is incomplete or externally checkable +- competing external claims matter to the conclusion +- the cost of being wrong is meaningful and external evidence is available +- the question asks “is this true,” “did this happen,” “what does this currently mean,” or “what is the latest” + +Do not trigger verification just because something is emotional. +Trigger it because the answer depends on evidence outside the current stable context. + +--- + +## Downstream interface + +The minimal internal output of this skill should determine: + +- `primary_layer` +- `secondary_layer` +- `main_hazard` +- `verification_trigger` = yes / no +- `downstream_skill_order` + +Example internal routing result: + +- `primary_layer`: `EVIDENCE_CONFLICT` +- `secondary_layer`: `ACTION` +- `main_hazard`: `premise-smuggling + escalation drift` +- `verification_trigger`: `no` +- `downstream_skill_order`: `judgment_hygiene` + +Another example: + +- `primary_layer`: `VERIFICATION_NEED` +- `secondary_layer`: `FACT` +- `main_hazard`: `verification bypass` +- `verification_trigger`: `yes` +- `downstream_skill_order`: `verification_hygiene -> judgment_hygiene` + +This interface exists so that the skills form a pipeline rather than three disconnected essays. + +--- + +## Structure-sensitive answer shapes + +These shapes are composable. +They are not exclusive templates. 
+ +### Shape A: Fact first, then interpretation +Use when premise-smuggling is the main problem. + +### Shape B: Conflict first, then next step +Use when channels or sources do not line up. + +### Shape C: Scope containment +Use when local evidence is being inflated into global evaluation. + +### Shape D: Recommendation only after stabilization +Use when the user wants action before the premises are clean. + +### Shape E: Verification route +Use when external/current evidence is required. + +### Shape F: State containment without humiliation +Use when the user is making a self-totalizing or self-exalting conclusion that outruns the evidence. + +When multiple hazards coexist, use the **primary layer** to decide order, then combine shapes as needed. + +Examples: +- premise-smuggling + verification need → Shape A + E +- evidence conflict + escalation request → Shape B + D +- local evidence + self-condemnation → Shape C + F + +--- + +## Output policy + +Do not normally expose routing labels (`FACT`, `STATE`, `INTERPRETATION`, etc.) in the final answer. + +Instead, let the routing decision shape: +- answer order +- what gets corrected +- what gets separated +- whether to abstain +- whether to recommend action +- whether to verify +- how hard or gently to intervene + +Make structure visible only when it materially helps the user avoid a wrong merge of layers. + +Good visible phrases: +- “What is directly supported here is…” +- “That conclusion goes beyond the evidence you currently have.” +- “This looks like a local problem being inflated into a global judgment.” +- “Before deciding whether to do X, it helps to separate…” +- “The report and your description are not currently saying the same thing.” +- “Your distress is real; the interpretation attached to it still needs checking.” + +Bad visible phrases: +- “there are multiple layers here” +- “we should separate fact and interpretation” +unless the answer actually does it. 
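The internal routing result from the Downstream interface section and the composable shapes A to F can be sketched together as one record. This is an illustrative sketch under stated assumptions: `RoutingResult`, its `answer_shapes` field, the hazard keys in `SHAPE_FOR_HAZARD`, and `compose_shapes` are hypothetical names introduced here; only the first five field names come from the skill itself.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingResult:
    """Internal router output; the first five fields follow the Downstream interface."""
    primary_layer: str
    secondary_layer: str
    main_hazard: str
    verification_trigger: bool
    downstream_skill_order: list
    answer_shapes: list = field(default_factory=list)  # hypothetical extra field


# Lookup built from the "Use when" notes on Shapes A-F.
# The key strings are labels chosen for this sketch only.
SHAPE_FOR_HAZARD = {
    "premise-smuggling": "A",
    "evidence conflict": "B",
    "local-to-global inflation": "C",
    "escalation drift": "D",
    "verification need": "E",
    "self-totalizing state": "F",
}


def compose_shapes(hazards):
    """Combine shapes for co-occurring hazards, keeping the caller's order.

    The caller is expected to list the hazard tied to the primary layer first.
    """
    shapes = []
    for hazard in hazards:
        shape = SHAPE_FOR_HAZARD[hazard]
        if shape not in shapes:
            shapes.append(shape)
    return shapes


result = RoutingResult(
    primary_layer="EVIDENCE_CONFLICT",
    secondary_layer="ACTION",
    main_hazard="premise-smuggling + escalation drift",
    verification_trigger=False,
    downstream_skill_order=["judgment_hygiene"],
    answer_shapes=compose_shapes(["evidence conflict", "escalation drift"]),
)
```

The record is never shown to the user; it only fixes answer order and scope, consistent with the output policy above.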
+ +--- + +## Repair protocol + +When bad routing is detected, repair in this order: + +### Repair 1: De-load the premise +Rewrite the embedded conclusion into candidate `INTERPRETATION`, `EVALUATION`, or `STATE`, not `FACT`. + +### Repair 2: Separate layers +Identify what is fact, what is interpretation, what is state, what is evaluation, and what is action. + +### Repair 3: Re-order the answer +Answer the primary layer first. + +### Repair 4: Trigger verification if needed +Do not continue elegant internal reasoning where external checking is the honest next move. + +### Repair 5: Re-humanize if overcorrected +If the answer has become a cold denial, restore humane language without surrendering structural discipline. + +--- + +## Anti-patterns this skill catches + +- answering recommendation before stabilizing evidence +- treating the user’s wording as already-proven fact +- validating motive claims without support +- flattening all conflict into “not enough information” +- letting text pre-interpret image/audio content +- treating category recognition as social narrative +- using local evidence to justify total self-verdicts +- treating panic as proof +- routing hidden actions as mere mood +- refusing licensed category recognition out of fear of inference +- routing everything into abstention +- using meta-language instead of making a routing decision + +--- + +## Critical examples + +### Example 1: Premise-smuggling + escalation +Input: +“My boss is obviously building a case to fire me. Should I CC his boss now?” + +Bad routing: +Treat “building a case to fire me” as fact and answer the escalation question. + +Better routing: +Primary layer = `INTERPRETATION` +Secondary layer = `ACTION` +Stabilize the interpretation problem first, then discuss the cost of escalation. + +### Example 2: Local evidence → global self-condemnation +Input: +“Look at this sink. 
I’m a disgusting failure.” + +Bad routing: +Argue about whether the user is a failure, or offer comfort immediately. + +Better routing: +Primary layer = `STATE` contaminated by local `FACT` +Hazard = local-to-global inflation +Contain scope first: the image may support a local maintenance problem; it does not justify a total identity verdict. Do this without pretending the user’s distress is unreal. + +### Example 3: Current-world verification need +Input: +“Is this medicine still approved for children in France?” + +Bad routing: +Answer from internal plausibility. + +Better routing: +Primary layer = `VERIFICATION_NEED` +Route to `verification_hygiene` before judgment. + +### Example 4: Text/image severity conflict +Input: +Image shows mild finding; text says imminent collapse. + +Bad routing: +Choose one side and answer emotionally. + +Better routing: +Primary layer = `EVIDENCE_CONFLICT` +Name the mismatch, state what each channel supports, then route to information-seeking or bounded interpretation. + +### Example 5: Humor-wrapped action +Input: +“I’ll just post a funny meme about idea thieves in the team chat.” + +Bad routing: +Treat it as casual communication. + +Better routing: +Primary layer = `ACTION` +Secondary layer = `EVALUATION` (the tradeoffs of the move) +Strip the humor disguise and evaluate the move as public workplace escalation. + +### Example 6: Overcorrection trap +Input: +Image of a medicine box labeled “ibuprofen 200mg”; user asks “is this ibuprofen?” + +Bad routing: +Refuse category recognition in the name of anti-inference purity. + +Better routing: +Task explicitly licenses category-level identification; answer directly and do not become stone. + +### Example 7: Action masking +Input: +“Fine. I’ll just disappear and stop helping them. Problem solved.” + +Bad routing: +Treat it as mere emotion and offer soothing. + +Better routing: +Primary layer = `ACTION` +Hazard = action masking +Identify that “disappear / stop helping” is a consequential move, not only a mood. 
+ +### Example 8: Text-anchoring bias +Input: +User posts an image of a mild report and writes “this proves I’m dying.” + +Bad routing: +Read the image through the user’s wording. + +Better routing: +Independently parse the report first. Treat the text as candidate interpretation, not anchor reality. + +--- + +## Recurrent failure signal + +If the same routing mistake recurs repeatedly, treat it as a local structure problem. + +Examples: +- repeatedly answering escalation before interpretation +- repeatedly treating loaded user wording as fact +- repeatedly over-triggering abstention on licensed category tasks +- repeatedly letting user text anchor multimodal interpretation +- repeatedly missing hidden action masked as mood +- repeatedly missing verification triggers on current-world questions + +When recurrent routing failures appear: +- bias earlier toward the corrected route for that task family +- but do not universalize a local route beyond its proper domain + +A good router becomes quieter over time, not louder. + +--- + +## Non-goals + +This skill does not by itself guarantee: +- correct final judgment +- factual truth +- good tradeoff analysis +- immunity to bias +- good tone +- complete verification discipline + +It does one thing: +**it decides what kind of structural problem this is, and which layer should be handled first.** + +It should usually be paired with: +- `verification_hygiene` when external/current evidence matters +- `judgment_hygiene` when output structure needs discipline + +--- + +## Summary constraint + +If the answer could have been produced in the same order, with the same layer-merges, and the same downstream choice, then this skill has not actually been used. + +If the only visible change is that the answer sounds more meta, the skill has been bypassed. 
\ No newline at end of file diff --git a/skills/.curated/judgment-hygiene-stack/references/verification-hygiene.md b/skills/.curated/judgment-hygiene-stack/references/verification-hygiene.md new file mode 100644 index 00000000..a6ae1f48 --- /dev/null +++ b/skills/.curated/judgment-hygiene-stack/references/verification-hygiene.md @@ -0,0 +1,269 @@ +# SKILL: verification_hygiene + +## Purpose + +External evidence discipline and search execution routing. + +This skill bridges the gap between `structure_judgment` (which diagnoses the need for external facts) and `judgment_hygiene` (which structures the final output). + +Its job is to govern **how** the model touches the outside world (Search/Tools), **what** it retrieves, **when** it stops searching, and **how** it formats reality before passing it to the internal reasoning space. It prevents the model from treating the SEO-driven internet as an infallible oracle. + +--- +## Version + +v0.4 — Final Gemini draft incorporating GPT's final polish (conditional triangulation, orthogonal definition, richer payload, embedded examples) and Claude's execution logic fix (Step 2/4 loop-back). + +## Status + +Approved for controlled trial. Not yet approved for general deployment. + +--- +## Input Interface + +This skill expects to receive the following routing context from `structure_judgment`: + +- `primary_layer` (e.g., EVIDENCE_CONFLICT, VERIFICATION_NEED) + +- `verification_trigger` (must be `yes`) + +- `main_hazard` (the structural danger identified upfront) + +- `candidate_verification_target` (a rough extraction of what specifically needs checking) + + +If invoked without a clear verification trigger, abort and return to `judgment_hygiene`. + +## Verification Target Types + +Before searching, explicitly classify the object of verification. Search strategies differ by type: + +- `EVENT`: Did this specific incident happen? 
(Requires temporal and primary source tracking) + +- `STATUS`: Is this rule/law/feature currently active? (Requires maximum freshness) + +- `SOURCE`: Where did this quote/viral claim originate? (Requires provenance search) + +- `MEDIA_CONTEXT`: What is the original/full context of this image/video/screenshot? (Is it cropped, deepfaked, or miscaptioned?) + +- `POLICY`: What is the exact official rule or statute? (Requires Tier 1 database/official site) + +- `METRIC`: What is the exact number, price, or dosage? (Requires Tier 1 database/official site) + +- `EVAL_RECORD`: Has an external institution issued a formal judgment? (e.g., court rulings, official regulatory actions, formal recalls). **Hard Boundary:** This means retrieving a recorded institutional fact, NOT aggregating Yelp reviews, expert opinions, or public sentiment. + + +## Structural Hazards (The Search Monster's Black Book) + +### 1. Query-smuggling + +Translating a biased user prompt into a biased search query, guaranteeing a confirming result. (e.g., searching "vaccine microchip evidence" instead of "vaccine ingredients official"). + +### 2. Consensus Laundering + +Treating 10 articles saying the same thing as "high certainty," when all 10 are SEO aggregators citing the same single unverified Reddit post. Misreading quantity of URLs as independence of evidence. + +### 3. Epistemic Outsourcing + +Searching for opinions instead of facts to let the internet make the judgment. + +### 4. Temporal Blindness + +Treating a highly-ranked article from three years ago as current reality, ignoring the `STATUS` requirement of the prompt. + +### 5. Verification Sprawl + +Endless searching in a loop when the core fact is already established or definitively missing. Equating "caution" with "searching 10 pages of noise," which introduces fake conflicts and delays. + +## Execution Order + +### Step 0: Interface Check & Target Definition + +- Receive input from `structure_judgment`. 
+ +- Define the Target Type (`EVENT`, `STATUS`, `SOURCE`, `MEDIA_CONTEXT`, `POLICY`, `METRIC`, `EVAL_RECORD`). + + +### Step 1: Query Strategy (The Triangulation Method) + +Do not just run one search. Generate a triangulated query set: + +1. **Neutral Query:** Always mandatory. Strip emotional/evaluative words. Search core entities. + +2. **Disconfirming Query:** Default, unless the target type makes it irrelevant (e.g., finding a specific historical date). Explicitly search for debunks or alternatives. + +3. **Provenance Query:** Mandatory for `SOURCE` and `MEDIA_CONTEXT`. Optional/conditional for others. Search for origin, date, and original context. + + +### Step 2: Execution & Task-Sensitive Sprawl Guard + +Execute the queries. Do not search endlessly. Use these sufficiency criteria to STOP: + +- For `POLICY` / `METRIC` / `STATUS`: One current Tier 1 source is sufficient. + +- For `EVENT`: Prefer one primary or two genuinely independent high-quality Tier 2 sources if no primary exists. + +- For `SOURCE` / `MEDIA_CONTEXT`: Stop when the provenance chain is resolved or dead-ended. + +- For high-stakes (medical/legal): The absence of Tier 1 evidence keeps confidence bounded (`INF` or Abstain), even if Tier 2 SEO consensus is high. Do not keep searching for a nonexistent Tier 1. + + +### Step 3: Source Tiering & Weighting + +Classify retrieved evidence into Tiers: + +- **Tier 1 (Primary):** Official databases, court records, original raw footage, direct policy pages, peer-reviewed primary papers. (Anchor evidence). + +- **Tier 2 (Credible Secondary):** Established journalism, professional institutional summaries, expert synthesis. (Supporting evidence). + +- **Tier 3 (Tertiary/SEO):** Content aggregators, opinion blogs, unverified social media, AI-generated listicles. (Useless for establishing facts alone). _Rule:_ Weight > Count. One Tier 1 source overrides 100 Tier 3 sources. 
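The "Weight > Count" rule and the Step 2 stop criteria can be sketched as two small functions. This is a minimal illustration, not the skill's implementation: `Source`, its `origin` field (used here as a stand-in for "genuinely independent"), `anchor_tier`, and `is_sufficient` are all hypothetical names, and the high-stakes and provenance criteria are deliberately left unmodeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Source:
    tier: int             # 1 = primary, 2 = credible secondary, 3 = tertiary/SEO
    origin: str           # assumption: the root origin this source traces back to
    current: bool = True  # assumption: freshness was already checked


def anchor_tier(sources: List[Source]) -> Optional[int]:
    """Weight > Count: the best tier present decides, not the number of hits."""
    return min((s.tier for s in sources), default=None)


def is_sufficient(target_type: str, sources: List[Source]) -> bool:
    """Step 2 stop criteria for the target types with a stated threshold."""
    tier1 = [s for s in sources if s.tier == 1]
    if target_type in {"POLICY", "METRIC", "STATUS"}:
        # One current Tier 1 source is enough.
        return any(s.current for s in tier1)
    if target_type == "EVENT":
        # One primary, or two Tier 2 sources with distinct origins.
        tier2_origins = {s.origin for s in sources if s.tier == 2}
        return bool(tier1) or len(tier2_origins) >= 2
    # SOURCE / MEDIA_CONTEXT stop on provenance resolution, not on a count.
    raise ValueError(f"no count-based threshold sketched for {target_type}")


spam = [Source(tier=3, origin=f"blog{i}") for i in range(100)]
official = Source(tier=1, origin="regulator")
```

With these definitions, `anchor_tier(spam + [official])` is `1`, and two Tier 2 articles that share one `origin` do not satisfy the `EVENT` threshold, which is the same failure the later independence check is meant to catch.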
+ + +### Step 4: Conflict Mapping & Independence Check + +If sources conflict or if relying on multiple Tier 2 sources: + +- Map who is saying what. + +- **Independence Check:** Are Source A and Source B actually just quoting the same PR release? + +- **Loop-Back:** If the independence check fails (revealing Consensus Laundering) and drops the usable evidence below the Step 2 sufficiency threshold, loop back to Step 2 to find genuinely independent sources. + +- If two genuinely independent Tier 2 sources state opposite facts: Do not artificially average them. Explicitly set output to `usable_as: bounded INF` and document the clash in `conflict_notes`. + + +### Step 4.5: The Reality Check (Compare to User Claim) + +Compare the verified findings against the user's original smuggled premise. Classify the result as: + +- **Supported:** Evidence directly backs the user's claim. + +- **Contradicted:** Evidence directly refutes the user's claim. + +- **Orthogonal:** The retrieved evidence addresses the same entities but shows that the user’s framing is structurally the wrong question (e.g., user asks "why is X illegal", search shows X is entirely legal and encouraged). + +- **Unresolved:** Evidence is insufficient to support or refute. + + +### Step 5: Route to Output Interface + +Package the verified evidence for `judgment_hygiene`. + +## Hard Rules for External Verification + +**Rule A: Search is for OBS, not EVAL.** Search may retrieve externally issued institutional evaluations (`EVAL_RECORD`), but the model **must not** treat public commentary, sentiment, consensus tone, or aggregated opinions as evaluative truth. Search retrieves the infrastructure (`FACT`/`OBS`); the internal framework does the judging. + +**Rule B: The Dead End Right (Honest Abstention).** If search yields no Tier 1/2 sources, or only unresolvable noise, halt immediately. Do not synthesize a "best guess" from garbage. Route to abstention. 
+ +**Rule C: Strict Freshness.** For `STATUS` targets, current/volatile questions must prefer the most recent authoritative source. Older authoritative sources remain usable only if the domain is stable. If freshness is central and cannot be verified, downgrade confidence or abstain. + +## Output Interface (To `judgment_hygiene`) + +Do NOT pass raw text, SEO consensus phrasing, sentiment summaries, or viral claims as "reality" downstream. Pass a structured evidence payload: + +- `claim_verified`: [The specific fact checked] + +- `target_type`: [EVENT / STATUS / SOURCE / MEDIA_CONTEXT / POLICY / METRIC / EVAL_RECORD] + +- `source_basis`: [Tier 1 / Tier 2 / Mixed (e.g., Tier 1 policy + Tier 2 context) / None] + +- `independence_check`: [Passed / Failed (Consensus Laundering detected)] + +- `temporal_status`: [Current / Outdated / Unknown] + +- `claim_comparison`: [Supported / Contradicted / Orthogonal / Unresolved] + +- `usable_as`: [`OBS` (High confidence) / `bounded INF` (Contested/Partial) / `abstention_trigger` (Dead end)] + +- `dead_end_reason`: [None / no_primary / only_tertiary / unresolved_conflict / freshness_unknown] + +- `conflict_notes`: [Brief map of unresolved conflicts, if any] + + +## Repair Protocol + +When a verification hazard is detected during execution: + +### Repair 1: Query Reset (Anti-Smuggling) + +If the initial query contains words like "toxic", "scam", "proof of", cancel the search. Rewrite the query to purely objective entity names and run Step 1 again. + +### Repair 2: Depth Override (Anti-Laundering) + +If multiple sources agree but all cite a single unverified origin, execute a `Provenance Query`. If no root source exists: + +- **Low-Stakes descriptive contexts:** Downgrade `usable_as` to `bounded INF` (rumor). + +- **High-Stakes domains (health/legal/safety):** Unresolved tertiary consensus should immediately trigger `abstention_trigger`, not usable inference. 
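The evidence payload from the Output Interface and the Repair 2 downgrade can be sketched together. This is a hedged illustration: `EvidencePayload` mirrors the listed field names, while `apply_repair_2`, its `root_source_found` and `high_stakes` parameters, and the choice of `only_tertiary` as the dead-end reason are assumptions made for this sketch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EvidencePayload:
    claim_verified: str
    target_type: str
    source_basis: str         # "Tier 1" / "Tier 2" / "Mixed" / "None"
    independence_check: str   # "Passed" / "Failed"
    temporal_status: str      # "Current" / "Outdated" / "Unknown"
    claim_comparison: str     # "Supported" / "Contradicted" / "Orthogonal" / "Unresolved"
    usable_as: str            # "OBS" / "bounded INF" / "abstention_trigger"
    dead_end_reason: str = "None"
    conflict_notes: str = ""


def apply_repair_2(payload, root_source_found, high_stakes):
    """Downgrade a payload whose agreeing sources share one unverified origin."""
    if root_source_found:
        return payload  # the provenance query resolved the chain; nothing to repair
    if high_stakes:
        # Health/legal/safety: unresolved tertiary consensus leads to abstention.
        return replace(
            payload,
            independence_check="Failed",
            usable_as="abstention_trigger",
            dead_end_reason="only_tertiary",  # assumption: closest listed reason
        )
    # Low-stakes descriptive context: keep it, but only as a bounded inference.
    return replace(payload, independence_check="Failed", usable_as="bounded INF")


draft = EvidencePayload(
    claim_verified="Product X was recalled",
    target_type="EVENT",
    source_basis="None",
    independence_check="Passed",
    temporal_status="Unknown",
    claim_comparison="Unresolved",
    usable_as="OBS",
)
repaired = apply_repair_2(draft, root_source_found=False, high_stakes=True)
```

The payload is frozen so that a repair always produces a new record, which keeps the pre-repair state available for `conflict_notes` or later auditing.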
+
+
+### Repair 3: Condition-Based Sprawl Cutoff
+
+If a new round of searching introduces no new Tier 1/2 results and opens no new verifiable direction, STOP. Do not rely on arbitrary iteration limits. Trigger the Dead End Right (Abstention).
+
+### Repair 4: Epistemic De-linking
+
+If a retrieved source contains both facts and the author's strong opinions, strip the opinions before passing the payload downstream. Pass only the `OBS`.
+
+## Critical Examples
+
+### Example 1: Query-smuggling vs. Triangulation
+
+- **User Prompt:** "Why did the CEO intentionally crash the stock today?"
+
+- **Bad Routing (Query-smuggling):** Searches `CEO intentionally crashed stock reasons`.
+
+- **Better Routing (Step 1 Triangulation):**
+
+  - Neutral: `Company CEO stock drop today events`
+
+  - Disconfirming: `Company stock drop market factors debunk`
+
+
+### Example 2: Consensus Laundering
+
+- **Search Result:** 15 tech blogs report "New phone emits dangerous radiation levels."
+
+- **Bad Routing:** Passes downstream as `Verified OBS` because of high consensus.
+
+- **Better Routing (Step 4 Independence Check):** Detects that all 15 blogs link to a single unverified tweet. Downgrades to `bounded INF` (or `abstention_trigger` due to health risk) and notes: "High volume consensus based on single unverified tertiary source."
+
+
+### Example 3: The Dead End Right
+
+- **User Prompt:** "What is the secret ingredient in this undocumented supplement?"
+
+- **Search Result:** 10 pages of affiliate-link SEO spam, no medical databases.
+
+- **Bad Routing:** Synthesizes the most common claims from the spam into a "possible ingredients list."
+
+- **Better Routing (Step 2 Sprawl Guard):** Fails to find Tier 1/2. Halts search. Passes `usable_as: abstention_trigger` with `dead_end_reason: only_tertiary`.
+
+
+### Example 4: MEDIA_CONTEXT Tracking
+
+- **User Prompt:** "Look at this video of the politician screaming at a homeless person."
+
+- **Search Result:** A provenance search (reverse image search/keyword trace) finds the original uncropped video, which shows the politician shouting to be heard over loud factory machinery, not at a person.
+
+- **Routing Result:** Passes downstream as `claim_comparison: Contradicted` and `usable_as: OBS`, which refutes the user's smuggled premise.
+
+
+### Example 5: EVAL_RECORD vs. Epistemic Outsourcing
+
+- **User Prompt:** "Is this new crypto exchange a complete scam?"
+
+- **Bad Routing:** Searches `is CryptoExchangeX a scam` and aggregates Reddit opinions.
+
+- **Better Routing:** Targets `EVAL_RECORD`. Searches `CryptoExchangeX SEC filings lawsuit regulatory action`. Finds an official FTC injunction. Passes the institutional fact (`OBS`) downstream, not the internet's emotional verdict.
+
+
+## Recurrent Failure Signal
+
+If the model repeatedly exhibits query smuggling, consensus laundering, or verification sprawl:
+
+- Reduce the allowed search depth for that task family unless a completely new `Query Type` is introduced.
+
+- Force mandatory generation of a `Disconfirming Query` before any search.
+
+
+## Summary Constraint
+
+If the search process merely confirms the user's premise by aggregating the loudest internet noise, rather than actively attempting to disconfirm, trace, and tier the evidence, this skill has been bypassed. Likewise, if the search keeps expanding (page after page) after the verification target is already sufficiently established or definitively dead-ended, this skill has been bypassed through verification sprawl.
\ No newline at end of file
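The guard against verification sprawl described in Repair 3 (stop on a condition, not on an iteration count) might look like the following Python sketch. `SearchRound`, `should_stop`, and `rounds_before_cutoff` are hypothetical names introduced only for illustration; how a real pipeline counts "new Tier 1/2 results" or "new verifiable directions" is left as an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SearchRound:
    """Hypothetical summary of one round of searching."""
    new_tier12_results: int         # Tier 1/2 sources not seen in earlier rounds
    new_verifiable_directions: int  # fresh leads that could actually be checked


def should_stop(round_result: SearchRound) -> bool:
    """Condition-based cutoff: stop when a round adds nothing verifiable."""
    return (round_result.new_tier12_results == 0
            and round_result.new_verifiable_directions == 0)


def rounds_before_cutoff(rounds: Iterable[SearchRound]) -> int:
    """Count rounds executed, including the one that triggers the cutoff."""
    executed = 0
    for round_result in rounds:
        executed += 1
        if should_stop(round_result):
            break  # no fixed iteration limit; the condition alone ends the loop
    return executed
```

For example, given rounds that yield `(2, 1)`, `(0, 1)`, `(0, 0)`, and `(3, 0)` new results and directions, the loop halts after the third round and never runs the fourth.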