feat: moderation v2 trust verification pipeline #332

Closed
ArthurzKV wants to merge 1 commit into openclaw:main from ArthurzKV:codex/skill-verification-v2-clawhub

@ArthurzKV ArthurzKV commented Feb 15, 2026

Summary

Implements moderation/verification v2 across schema, engine, publish pipeline, API, CLI, UI, and backfill flow.

What changed

  • Added normalized moderation fields on skills and deterministic staticScan payload on skillVersions.
  • Introduced canonical moderation engine and reason-code contract.
  • Replaced broad keyword-only moderation heuristics with context-aware static analyzers.
  • Unified static + VT + LLM merge behavior into explicit verdict policy.
  • Updated visibility/safety logic to use normalized verdicts and reason codes.
  • Extended API contracts:
    • GET /api/v1/skills/:slug additive moderation fields (verdict, reasonCodes, updatedAt, engineVersion, summary).
    • Added GET /api/v1/skills/:slug/moderation detailed evidence endpoint with owner/staff vs public behavior.
  • Extended CLI:
    • inspect --moderation for detailed structured moderation output.
    • install/update trust UX now surfaces reason/evidence hints.
    • local static verification on downloaded files with mismatch guardrails.
  • Added UI moderation transparency on skill detail pages.
  • Added historical backfill internal action for published, non-hard-deleted skills.
  • Updated security/spec docs with moderation semantics and reason-code behavior.
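The explicit verdict policy that merges static, VT, and LLM signals could look roughly like the sketch below. This is only an illustration under stated assumptions: the verdict values (`clean`, `suspicious`, `malicious`), the "most severe signal wins" rule, and the names `Signal` and `mergeVerdicts` are hypothetical and are not taken from the PR.

```typescript
// Hypothetical verdict levels; the PR description does not list the real values.
type Verdict = "clean" | "suspicious" | "malicious";

interface Signal {
  source: "static" | "vt" | "llm";
  verdict: Verdict;
  reasonCodes: string[];
}

interface MergedVerdict {
  verdict: Verdict;
  reasonCodes: string[];
}

// Numeric ranking used to compare verdicts.
const SEVERITY: Record<Verdict, number> = { clean: 0, suspicious: 1, malicious: 2 };

// Assumed policy: the most severe signal wins. Reason codes from all signals
// are de-duplicated and sorted so the same inputs always give the same output.
// Note that an empty signal list yields "clean" here; a real policy may prefer
// a separate "pending" state instead.
function mergeVerdicts(signals: Signal[]): MergedVerdict {
  let verdict: Verdict = "clean";
  const codes = new Set<string>();
  for (const signal of signals) {
    if (SEVERITY[signal.verdict] > SEVERITY[verdict]) {
      verdict = signal.verdict;
    }
    for (const code of signal.reasonCodes) {
      codes.add(code);
    }
  }
  return { verdict, reasonCodes: Array.from(codes).sort() };
}
```

Keeping the policy in one pure function would make the merge behavior easy to cover with targeted tests, independent of where each signal comes from.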

Compatibility

  • Legacy moderation fields and booleans are preserved and mirrored for compatibility.

Validation

  • Ran lint and targeted test suites for moderation engine, API handlers, CLI commands, and merge behavior.

Greptile Summary

Implements moderation v2 trust verification across the full stack (schema, engine, API, CLI, UI). Adds normalized moderationVerdict, moderationReasonCodes, and moderationEvidence fields to skills, with deterministic staticScan payload on skill versions. Replaces keyword-only heuristics with context-aware static analyzers and explicit verdict policy merging static + VT + LLM signals.

Major changes:

  • Added canonical moderation engine with structured reason codes and evidence findings
  • Extended schema with additive moderation fields while preserving legacy compatibility
  • Implemented GET /api/v1/skills/:slug/moderation endpoint with owner/staff vs public access control
  • Added CLI --moderation flag for inspect command and local static verification on install/update
  • Updated UI to display verdict, reason codes, and moderation summary
  • Implemented backfill action for historical skills
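As an illustration of the owner/staff vs public split on the moderation endpoint, here is a minimal sketch. The finding fields mirror the `addFinding` call quoted later in this review; everything else is an assumption, including the `ViewerRole` type, the `moderationViewFor` helper, and the choice to hide raw `evidence` from public viewers (the PR does not say exactly what the public view omits).

```typescript
// Field names follow the finding object shown in the review excerpt.
interface Finding {
  code: string;
  severity: string;
  file: string;
  line: number;
  message: string;
  evidence: string;
}

// Public viewers get findings without the raw evidence text (assumed behavior).
type PublicFinding = Omit<Finding, "evidence">;

// Hypothetical roles and response shape.
type ViewerRole = "owner" | "staff" | "public";

interface ModerationDetail {
  verdict: string;
  reasonCodes: string[];
  findings: Finding[];
}

interface ModerationView {
  verdict: string;
  reasonCodes: string[];
  findings: Array<Finding | PublicFinding>;
}

function moderationViewFor(role: ViewerRole, detail: ModerationDetail): ModerationView {
  if (role === "owner" || role === "staff") {
    return detail; // privileged viewers see full evidence
  }
  return {
    verdict: detail.verdict,
    reasonCodes: detail.reasonCodes,
    // Copy every field except `evidence`.
    findings: detail.findings.map((f) => ({
      code: f.code,
      severity: f.severity,
      file: f.file,
      line: f.line,
      message: f.message,
    })),
  };
}
```

Building the public view by copying an allowlist of fields, rather than deleting `evidence` from a shared object, means a field added to `Finding` later is not exposed by accident.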

Key observations:

  • Static scanning logic is duplicated between server (convex/lib/moderationEngine.ts) and CLI (packages/clawdhub/src/security/staticScan.ts), requiring manual synchronization for future updates
  • Reason codes correctly match between implementations
  • Legacy moderation fields preserved for backward compatibility
  • Test coverage includes basic moderation engine scenarios

Confidence Score: 4/5

  • This PR is safe to merge with minor considerations around code duplication
  • The moderation v2 implementation is comprehensive and well-tested, with proper separation between server and client-side scanning logic. The schema changes are additive and backward-compatible. However, the static scanning logic is duplicated between server (convex/lib/moderationEngine.ts) and CLI (packages/clawdhub/src/security/staticScan.ts), which creates maintenance risk. The reason codes match correctly, but any future updates must be synchronized manually across both implementations.
  • Pay attention to convex/lib/moderationEngine.ts and packages/clawdhub/src/security/staticScan.ts - these contain duplicated scanning logic that must stay synchronized

Last reviewed commit: 2b9acf5

vercel bot commented Feb 15, 2026

@ArthurzKV is attempting to deploy a commit to the Amantus Machina Team on Vercel.

A member of the Team first needs to authorize it.

@greptile-apps greptile-apps bot left a comment

25 files reviewed, 2 comments

Comment on lines +253 to +262
const identityText = `${input.slug}\n${input.displayName}\n${input.summary ?? ""}`;
if (/keepcold131\/ClawdAuthenticatorTool|ClawdAuthenticatorTool/i.test(identityText)) {
  addFinding(findings, {
    code: REASON_CODES.KNOWN_BLOCKED_SIGNATURE,
    severity: "critical",
    file: "metadata",
    line: 1,
    message: "Matched a known blocked malware signature.",
    evidence: identityText,
  });

Hardcoded malware signature (keepcold131/ClawdAuthenticatorTool) is embedded directly in the scanning logic. Consider moving known signatures to a configuration file or database table to allow updates without code deployment.
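One possible shape for that change, as a rough sketch only: signatures live in a data list and the engine compiles and matches them at scan time. The `BlockedSignature` shape, the `id` value, and the helper names are hypothetical; the inline list stands in for whatever config file or table would actually hold the entries. The pattern string is the one from the excerpt above.

```typescript
// Hypothetical record shape for a signature stored outside the engine code.
interface BlockedSignature {
  id: string;
  pattern: string; // regular-expression source, matched case-insensitively
}

interface CompiledSignature {
  id: string;
  regex: RegExp;
}

// Stand-in for entries loaded from a config file or database table.
const blockedSignatures: BlockedSignature[] = [
  {
    id: "clawd-authenticator-tool",
    pattern: "keepcold131/ClawdAuthenticatorTool|ClawdAuthenticatorTool",
  },
];

// Compile once per load. The "g" flag is deliberately not used, so
// RegExp.prototype.test stays stateless across calls.
function compileSignatures(signatures: BlockedSignature[]): CompiledSignature[] {
  return signatures.map((s) => ({ id: s.id, regex: new RegExp(s.pattern, "i") }));
}

// Returns the id of the first matching signature, or null when nothing matches.
function matchBlockedSignature(
  identityText: string,
  compiled: CompiledSignature[],
): string | null {
  for (const signature of compiled) {
    if (signature.regex.test(identityText)) {
      return signature.id;
    }
  }
  return null;
}
```

The engine would then call `matchBlockedSignature(identityText, compiled)` and add a `KNOWN_BLOCKED_SIGNATURE` finding when the result is not `null`, so new signatures can ship as data rather than as a code change.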

Path: convex/lib/moderationEngine.ts
Line: 253:262

greptile-apps bot commented Feb 15, 2026

Additional Comments (1)

packages/clawdhub/src/security/staticScan.ts
Static scanning logic is duplicated between this CLI implementation and convex/lib/moderationEngine.ts (server). The reason codes match correctly, but future changes to detection patterns require manual synchronization across both files. Consider extracting shared scanning logic to a shared package or generating one implementation from the other to reduce maintenance burden.
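A minimal sketch of the shared-package option: one rule table and one scan function that both the server engine and the CLI would import. The two rules, their reason codes, and the names `SHARED_RULES` and `scanFile` are hypothetical placeholders, not the PR's actual detection patterns, and the line-by-line regex matching is much simpler than the context-aware analyzers described in the PR.

```typescript
// Same finding fields as the server-side excerpt in this review.
interface Finding {
  code: string;
  severity: string;
  file: string;
  line: number;
  message: string;
  evidence: string;
}

interface StaticRule {
  code: string; // reason code emitted when the rule matches
  severity: "low" | "medium" | "high" | "critical";
  pattern: RegExp; // no "g" flag, so test() is stateless
  message: string;
}

// Hypothetical rules; a shared package would hold the real ones.
const SHARED_RULES: StaticRule[] = [
  {
    code: "HYPOTHETICAL_DYNAMIC_EVAL",
    severity: "high",
    pattern: /\beval\s*\(/,
    message: "Dynamic code evaluation.",
  },
  {
    code: "HYPOTHETICAL_PIPE_TO_SHELL",
    severity: "critical",
    pattern: /curl[^|\n]*\|\s*(?:ba)?sh\b/,
    message: "Remote script piped into a shell.",
  },
];

// Scans one file line by line and reports 1-based line numbers.
function scanFile(file: string, content: string, rules: StaticRule[] = SHARED_RULES): Finding[] {
  const findings: Finding[] = [];
  content.split(/\r?\n/).forEach((text, index) => {
    for (const rule of rules) {
      if (rule.pattern.test(text)) {
        findings.push({
          code: rule.code,
          severity: rule.severity,
          file,
          line: index + 1,
          message: rule.message,
          evidence: text.trim(),
        });
      }
    }
  });
  return findings;
}
```

With a single source of rules, the server and CLI cannot drift apart on reason codes or patterns, and the CLI's mismatch guardrail compares results that were produced by the same logic.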

Path: packages/clawdhub/src/security/staticScan.ts
Line: 1:228

@ArthurzKV (Author)

Superseded by split PRs for reviewability: #333 (core backend), #334 (API/schema), #335 (CLI/UI/docs).

@ArthurzKV ArthurzKV closed this Feb 15, 2026