Phase 7: Connector framework + EDGAR (Tier 1 reference)#8

Merged
Archibald312 merged 1 commit into main from phase-7-connector-framework-edgar on May 20, 2026

Conversation

@Archibald312
Owner

Summary

  • Thin connector framework (types + registry + shared ingest helper); each connector owns its own client, auth, and fetch primitives.
  • EDGAR Tier-1 reference connector: ticker lookup, filings list, ingest primary doc (HTML→PDF), optional exhibits, optional XBRL.
  • edgar_facts table populated by a pure XBRL instance parser — Phase 8 will consume these facts directly to cross-check prose-extracted numbers.
  • /connectors/edgar/{lookup,filings,ingest} routes; backend-only by design (no UI this phase). Each ingest emits a connector_fetch audit row.

See decisions.md (2026-05-20) for full scope rationale.
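
For readers without the diff open, here is a minimal sketch of what a "types + registry" framework of this shape could look like. The names `Connector`, `IngestResult`, and `ConnectorRegistry` are hypothetical and not taken from the PR; the real types may differ.

```typescript
// Hypothetical shapes for illustration only; the PR's actual types are not shown here.
interface IngestResult {
  documentIds: string[];
  deduped: boolean;
}

// Each connector owns its client, auth, and fetch logic behind `ingest`.
interface Connector<Req> {
  id: string; // e.g. "edgar"
  ingest(req: Req): Promise<IngestResult>;
}

class ConnectorRegistry {
  private readonly connectors = new Map<string, Connector<unknown>>();

  // Reject duplicate ids so a route can resolve exactly one connector per id.
  register<Req>(connector: Connector<Req>): void {
    if (this.connectors.has(connector.id)) {
      throw new Error(`connector already registered: ${connector.id}`);
    }
    this.connectors.set(connector.id, connector as Connector<unknown>);
  }

  get(id: string): Connector<unknown> {
    const found = this.connectors.get(id);
    if (!found) throw new Error(`unknown connector: ${id}`);
    return found;
  }

  list(): string[] {
    return Array.from(this.connectors.keys());
  }
}
```

The registry only resolves connectors by id; everything source-specific stays inside the connector, which matches the "thin framework" intent described above.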

Test plan

  • backend tsc --noEmit clean
  • backend vitest 84/84 passing (11 new — registry, xbrl, client)
  • Apply backend/migrations/connectors_phase7.sql to staging Supabase
  • Set EDGAR_USER_AGENT env var (must contain a contact email per SEC fair-use)
  • curl POST /connectors/edgar/lookup with {"ticker":"AAPL"} returns the right CIK
  • curl POST /connectors/edgar/ingest with a known accession + extract_xbrl: true lands documents + populates edgar_facts
  • Re-ingesting the same accession is deduped (no new rows, returns deduped: true)
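
Two of the checks above (the `EDGAR_USER_AGENT` requirement and the ticker-to-CIK lookup) can be illustrated with a small sketch. `requireEdgarUserAgent` and `padCik` are hypothetical helper names, not functions from this PR, and the email check is deliberately loose.

```typescript
// Hypothetical helpers; the PR's EDGAR client may validate differently.

// Returns the configured User-Agent, or throws if it lacks something email-like.
function requireEdgarUserAgent(env: Record<string, string | undefined>): string {
  const ua = (env.EDGAR_USER_AGENT || "").trim();
  // Loose check: any token of the form local@domain.tld.
  if (!/[^\s@]+@[^\s@]+\.[^\s@]+/.test(ua)) {
    throw new Error("EDGAR_USER_AGENT must include a contact email");
  }
  return ua;
}

// SEC JSON endpoints address companies by a 10-digit, zero-padded CIK.
function padCik(cik: number | string): string {
  const digits = String(cik).replace(/\D/g, "");
  if (digits.length === 0 || digits.length > 10) {
    throw new Error(`invalid CIK: ${cik}`);
  }
  return ("0000000000" + digits).slice(-10);
}
```

For example, `padCik(320193)` (Apple's CIK) yields `"0000320193"`, the form used in SEC submission URLs.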

🤖 Generated with Claude Code

- Thin framework: types, registry, shared ingest helper that writes
  documents/document_versions. Each connector owns its client + auth +
  fetch primitives; framework only standardizes the ingest tail.
- documents.source_connector + source_ref jsonb with a unique partial
  index on (connector, accession_number, document_role) for dedupe.
- EDGAR client: ticker lookup, recent-filings list, filing index +
  document fetch. Requires EDGAR_USER_AGENT (SEC fair-use policy).
- EDGAR connector: ingest primary doc (HTML transcoded to PDF for
  citation parity), optional exhibits, optional XBRL.
- edgar_facts table + pure XBRL instance parser. Phase 8 consumes
  these facts directly — no LLM in the producer path.
- /connectors/edgar/{lookup,filings,ingest} routes; backend-only by
  design (no UI this phase). connector_fetch audit row per ingest.
- htmlToPdf in convert.ts (LibreOffice).

See decisions.md (2026-05-20) for scope decisions.

Verification: backend tsc --noEmit clean; vitest 84/84 (11 new).
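
The dedupe behavior described above relies on a unique index over (connector, accession_number, document_role). As a rough in-memory illustration of those semantics (not the actual SQL or ingest helper), assuming hypothetical names `SourceRef`, `dedupeKey`, and `InMemoryDocuments` and made-up role values:

```typescript
// Illustration only: mimics the unique-index semantics in memory.
interface SourceRef {
  accession_number: string;
  document_role: string; // e.g. "primary" or "exhibit" (assumed values)
}

// Builds the same composite key the unique index is described as covering.
function dedupeKey(connector: string, ref: SourceRef): string {
  return [connector, ref.accession_number, ref.document_role].join("|");
}

class InMemoryDocuments {
  private readonly seen = new Set<string>();

  // Returns deduped: true when the key already exists, and adds no new row.
  insert(connector: string, ref: SourceRef): { deduped: boolean } {
    const key = dedupeKey(connector, ref);
    if (this.seen.has(key)) return { deduped: true };
    this.seen.add(key);
    return { deduped: false };
  }

  get rowCount(): number {
    return this.seen.size;
  }
}
```

Because the role is part of the key, a filing's primary document and its exhibits coexist under one accession number, while a second ingest of the same accession adds nothing.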
@Archibald312 Archibald312 merged commit 6093c23 into main May 20, 2026
4 checks passed
@Archibald312 Archibald312 deleted the phase-7-connector-framework-edgar branch May 20, 2026 20:38