
feat: add provenance metadata for pipeline entries#894

Closed
luochen211 wants to merge 2 commits into
santifer:mainfrom
luochen211:codex/pipeline-provenance-metadata

@luochen211 luochen211 commented Jun 10, 2026

Contributor

Summary

  • Add backward-compatible provenance metadata to new scan-written pipeline rows.
  • Include source, provider, and verified_at, using verified_at=unverified when scans run without --verify.
  • Document the metadata format and add a regression test proving the leading URL remains parseable by existing readers.
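A minimal sketch of what such a row could look like. The ` | key=value` separator, the helper name `formatEntrySketch`, and the sample source and provider values are assumptions for illustration; the syntax actually defined by this PR lives in modes/pipeline.md and is not reproduced here.

```javascript
// Hypothetical sketch: the " | key=value" layout and this helper name are
// assumptions, not the format or function shipped in scan.mjs.
function formatEntrySketch(url, { source, provider, verifiedAt = 'unverified' } = {}) {
  const fields = [
    ['source', source],
    ['provider', provider],
    ['verified_at', verifiedAt],
  ]
    // Skip fields that were not supplied so the row stays compact.
    .filter(([, value]) => value !== undefined && value !== null && value !== '')
    .map(([key, value]) => `${key}=${value}`);

  // The checkbox and URL come first so URL-only readers keep working.
  return [`- [ ] ${url}`, ...fields].join(' | ');
}

// Scan without --verify: verified_at falls back to "unverified".
const unverifiedRow = formatEntrySketch('https://example.com/jobs/123', {
  source: 'example-board',
  provider: 'example-provider',
});

// Scan with --verify: the caller supplies a date string.
const verifiedRow = formatEntrySketch('https://example.com/jobs/123', {
  source: 'example-board',
  provider: 'example-provider',
  verifiedAt: '2026-06-10',
});

console.log(unverifiedRow);
console.log(verifiedRow);
```

Keeping the URL as the first token after the checkbox is what lets the extra fields stay optional for readers that only care about the link.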

Fixes #878

Tests

  • node --check scan.mjs && node --check test-all.mjs
  • node test-all.mjs --quick — 166 passed, 0 failed, 7 existing README.ua personal-data warnings

Summary by CodeRabbit

  • Documentation

    • Updated scan script and pipeline documentation with new metadata field specifications.
  • New Features

    • Pipeline entries now include provenance metadata (source, provider) and verification timestamps.
    • Runs without verification flag record verified_at=unverified for transparency.
    • Trailing metadata is backward-compatible; existing systems safely ignore unknown fields.

coderabbitai Bot commented Jun 10, 2026

Warning

Review limit reached

@luochen211, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 1 minute and 39 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d864be09-561d-435f-b204-7a1d42f4cd26

📥 Commits

Reviewing files that changed from the base of the PR and between 9576631 and 299e8a6.

📒 Files selected for processing (1)
  • scan.mjs
Walkthrough

The PR adds lightweight provenance metadata to pipeline entries stored in data/pipeline.md. It introduces new helper functions to normalize source values and format pipeline rows with trailing metadata fields (source, provider, verified_at), updates the pipeline insertion logic to use the formatter, and includes tests validating the metadata format while maintaining backward compatibility.

Changes

Pipeline Provenance Metadata

| Layer / File(s) | Summary |
| --- | --- |
| Pipeline metadata format and documentation (modes/pipeline.md, docs/SCRIPTS.md) | Defines backward-compatible pipeline metadata syntax with examples showing trailing metadata fields (source, provider, verified_at) appended after URLs, and documents that existing readers should ignore unknown trailing fields. |
| Pipeline formatting helpers (scan.mjs) | Exports pipelineSourceFor to normalize an offer's source into pipeline source values, and formatPipelineEntry to generate checkbox rows with appended metadata fields (source, provider, verified_at). |
| Pipeline insertion with metadata (scan.mjs) | Updates appendToPipeline to accept a verifiedAt option and use formatPipelineEntry to format pending section entries; passes verifiedAt based on --verify flag at call site. |
| Metadata format validation (test-all.mjs) | New test imports formatPipelineEntry and validates that generated entries include backward-compatible provenance metadata fields and correct URL/checkbox formatting with a fixed verifiedAt date. |
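The backward-compatibility claim in the table can be illustrated with a small check. This is a sketch under stated assumptions: `readPipelineUrl` is a hypothetical stand-in for an existing reader, and the ` | key=value` trailing layout in `metadataRow` is illustrative rather than the syntax documented in modes/pipeline.md.

```javascript
// Hypothetical reader: takes the first whitespace-delimited token after the
// checkbox as the URL and ignores whatever follows on the line.
function readPipelineUrl(line) {
  const match = /^- \[[ xX]\] (\S+)/.exec(line);
  return match ? match[1] : null;
}

// A row as written before the change, and one with assumed trailing metadata.
const legacyRow = '- [ ] https://example.com/jobs/123';
const metadataRow =
  '- [ ] https://example.com/jobs/123 | source=example-board | provider=example-provider | verified_at=2026-06-10';

const legacyUrl = readPipelineUrl(legacyRow);
const metadataUrl = readPipelineUrl(metadataRow);

// Both rows should yield the same URL if trailing fields are safely ignored.
console.log(legacyUrl === metadataUrl); // true
```

A reader that instead treats the whole remainder of the line as the URL would not pass this check, which is the kind of regression such a test is meant to catch.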

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • santifer/career-ops#487: That PR's --verify workflow changes how scan.mjs classifies or drops offers before they are appended to pipeline.md, which overlaps with this PR's changes to pipeline row formatting (including verified_at=... and provenance metadata) at append time.
  • santifer/career-ops#602: Both PRs touch how scan.mjs labels an offer's source and provider identity. That PR changes how source is derived from provider.id, while this PR formats and persists that source into pipeline.md alongside the provider= and verified_at= metadata.

Suggested labels

🔧 scripts, 📄 docs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and concisely describes the main change: adding provenance metadata to pipeline entries, which directly aligns with the primary objective of the changeset. |
| Linked Issues check | ✅ Passed | All implementation requirements from issue #878 are met: metadata syntax defined and documented, scan.mjs updated to append source/verified metadata, docs updated, and regression tests added ensuring backward compatibility. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to issue #878 objectives: documentation updates, metadata formatting logic, and regression tests for backward compatibility with no extraneous modifications. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scan.mjs`:
- Line 239: The assignment const verified = verifiedAt || 'unverified' is
redundant given the default parameter verifiedAt = 'unverified'; replace it with
a nullish-coalescing fallback to preserve safety for null/undefined but allow
empty strings: change to const verified = verifiedAt ?? 'unverified' (or remove
the local const and use verifiedAt directly) so the code is clearer and not
masking valid empty-string values.
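The distinction behind this finding can be seen side by side. The three functions below are illustrative stand-ins, not the code in scan.mjs; they only compare how a destructured default, `||`, and `??` treat the same inputs.

```javascript
// Destructured default only: applies when the property is undefined.
function defaultOnly({ verifiedAt = 'unverified' } = {}) {
  return verifiedAt;
}

// Logical OR: also replaces every falsy value, including '' and null.
function logicalOr({ verifiedAt = 'unverified' } = {}) {
  return verifiedAt || 'unverified';
}

// Nullish coalescing: replaces null and undefined, keeps ''.
function nullish({ verifiedAt = 'unverified' } = {}) {
  return verifiedAt ?? 'unverified';
}

const samples = [undefined, null, '', '2026-06-10'];
const comparison = samples.map((verifiedAt) => ({
  input: verifiedAt,
  defaultOnly: defaultOnly({ verifiedAt }),
  logicalOr: logicalOr({ verifiedAt }),
  nullish: nullish({ verifiedAt }),
}));

console.table(comparison);
```

The three variants agree for `undefined` and for a real date string. They differ for `null`, which only the default-only version passes through unchanged, and for `''`, which only the `||` version replaces.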

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f8459be8-9de8-4b25-a842-efc62066ae31

📥 Commits

Reviewing files that changed from the base of the PR and between 214f5f8 and 9576631.

📒 Files selected for processing (4)
  • docs/SCRIPTS.md
  • modes/pipeline.md
  • scan.mjs
  • test-all.mjs

Comment thread scan.mjs Outdated
@luochen211

Contributor Author

Addressed the CodeRabbit nitpick on provenance metadata formatting. formatPipelineEntry() now relies on the destructured verifiedAt = "unverified" default directly, removing the redundant local fallback while preserving the existing output contract.

Validation: node --check scan.mjs && node test-all.mjs --quick (166 passed, 0 failed, existing README.ua warnings only).

@santifer

Owner

Closing under the acceptance criterion explained in full on #890: core takes what the candidate uses; project-artifact tooling lives outside the core. For this one specifically: provenance fields have no consumer today — when something needs them, the schema should be designed against that consumer, not ahead of it. The companion-repo door from #890 applies to this one too. 🙏

@santifer santifer closed this Jun 11, 2026