Skip to content

feat: add deterministic evaluation report flow#35

Open
bilalbayram wants to merge 6 commits intomainfrom
feat/eval-page
Open

feat: add deterministic evaluation report flow#35
bilalbayram wants to merge 6 commits intomainfrom
feat/eval-page

Conversation

@bilalbayram
Copy link
Copy Markdown
Owner

Summary

  • add a deterministic report_evaluation tool and artifact writer for evaluation.md and evaluation-snapshot.json
  • require those evaluation artifacts in the report PDF flow and render a dedicated page 2 with metric groups and quarterly charts
  • update report prompts/skill/install-script sync, add focused tests, and bump opencode-finance to 1.2.9-beta.0

Verification

  • TMPDIR=/tmp bun run typecheck
  • TMPDIR=/tmp bun run build
  • bun test src/command/finance.test.ts src/tool/report-pdf.test.ts test/unit/finance-plugin-report-hints.test.ts test/unit/report-skill-contract.test.ts test/unit/report-evaluation-artifacts.test.ts test/unit/openclaw-tool-adapter.test.ts test/unit/install-script-skill-sync.test.ts

Notes

  • A full bun test run in the temp worktree still shows unrelated environment/pre-existing failures around storage permissions, Node/OpenClaw ESM import, and npm pack timeout.

@vercel
Copy link
Copy Markdown

vercel bot commented Mar 28, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
opencode-finance-web-wulr Ignored Ignored Preview Mar 30, 2026 1:38pm

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bc490bc19d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

const snapshotPath = path.join(input.outputRoot, "evaluation-snapshot.json")
const markdown = renderReportEvaluationMarkdown(input.snapshot)
const snapshotText = JSON.stringify(ReportEvaluationSnapshotSchema.parse(input.snapshot), null, 2)
await Promise.all([Bun.write(evaluationPath, markdown), Bun.write(snapshotPath, snapshotText)])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Replace Bun.write with runtime-neutral file writes

writeReportEvaluationArtifacts uses Bun.write, but this package is also executed in Node-based contexts (for example OpenClaw’s CJS plugin runtime), where Bun is undefined; calling report_evaluation there will throw ReferenceError: Bun is not defined before either artifact is written. This should use the existing runtime-neutral file IO path (fs.writeFile/writeText) so report generation works outside Bun.

Useful? React with 👍 / 👎.

@@ -102,6 +102,6 @@ export const FINANCE_SLASH_COMMANDS: FinanceSlashCommand[] = [
description: "Generate a comprehensive public-company financial report",
hints: ["$1", "$2"],
template:
'Build a very comprehensive public-company financial report for "$1".\n\nFirst call the skill tool with name "finance-comprehensive-report".\n\nThen produce:\n1) a concise in-chat executive summary including `directional_conviction_score` (0-100), score band (`bearish`, `neutral`, `bullish`), and top drivers\n2) markdown artifacts in `reports/$1/<YYYY-MM-DD>/`:\n - `report.md`\n - `dashboard.md`\n - `assumptions.json`\n - `adjustment-log.md`\n\nInside `report.md`, include dedicated sections for technical analysis, fundamental analysis, risk assessment, portfolio fit, market intelligence, scenario valuation, and monitoring triggers.\n\nInside `assumptions.json`, include `scenario_assumptions`, `score_inputs`, `factor_weights`, and `uncertainty_flags`.\n\nUse `financial_search` with `coverage: \"comprehensive\"` for numeric financial claims. Use Exa (`websearch`) only for qualitative news/catalyst context, never for numeric financial metrics.\n\nUse IEEE-style numbered citations: place [N] in-text for each factual claim and collect all references in a final ## Sources section at the end of report.md. Each entry: [N] Publisher/Domain, "title," URL. Retrieved YYYY-MM-DD. Never use generic source labels like `websearch`, `exa`, `search`, or `internet`.\n\nIn `dashboard.md`, include separate KPI rows for Stock price, Previous close, Daily change, Daily change percent, 52W high/low or 52W range, YTD return, Market cap, Analyst consensus, Revenue, Net income, Free cash flow, and Debt-to-equity.\n\nFor insider analysis, call `financial_search` with intent `insider` and `coverage: \"comprehensive\"`, and use returned `ownershipChange` when available.\n\nKeep KPI source labels and source URLs coherent (for example, Yahoo chart label must use Yahoo chart URL).\n\nIn `report.md`, include metadata lines: `Sector: <value>`, `Headquarters: <value>`, `Website: <value>`, and `Icon URL: <value>`.\n\nUse period-aware fundamental labels based on `metricPeriods` (`TTM`, `FY`, `Q`); do not force `(TTM)` when the period differs.\n\nWhen portfolio holdings/benchmark/mandate are missing, render portfolio fit as `Not evaluated: missing portfolio holdings, benchmark, and mandate`.\n\nNever emit the token `N/A` in report artifacts. Use observed values or explicit `unknown`/reason text.\n\nAfter all markdown artifacts are written, ask exactly one user question with the `question` tool:\n- header: `PDF Export`\n- question: `Generate a polished PDF report now?`\n- options:\n 1) `Yes (Recommended)` - Generate a polished PDF in the report directory.\n 2) `No` - Skip PDF generation.\n- custom: `false`\n\nIf user selects `Yes (Recommended)`, call `report_pdf` with:\n- `subcommand`: `report`\n- `outputRoot`: `reports/$1/<YYYY-MM-DD>/`\n- `filename`: `$1-<YYYY-MM-DD>.pdf`\n\nIf `question` is unavailable in this client context, skip PDF export and complete the report normally.\n\nPDF requirements:\n- first page: ticker summary\n- remaining pages: report information from `report.md`, `dashboard.md`, and `assumptions.json` (exclude `adjustment-log.md`)\n- footer on every page must display `opencode.finance` and link to `https://opencode.finance`\n\nInclude source and timestamp for factual claims, label missing values as "unknown", and keep observed data separate from model assumptions and inference. Do not provide investment advice.',
'Build a very comprehensive public-company financial report for "$1".\n\nFirst call the skill tool with name "finance-comprehensive-report".\n\nThen produce:\n1) a concise in-chat executive summary including `directional_conviction_score` (0-100), score band (`bearish`, `neutral`, `bullish`), and top drivers\n2) markdown artifacts in `reports/$1/<YYYY-MM-DD>/`:\n - `report.md`\n - `dashboard.md`\n - `assumptions.json`\n - `adjustment-log.md`\n - `evaluation.md`\n - `evaluation-snapshot.json`\n\nInside `report.md`, include dedicated sections for technical analysis, fundamental analysis, risk assessment, portfolio fit, market intelligence, scenario valuation, and monitoring triggers.\n\nInside `assumptions.json`, include `scenario_assumptions`, `score_inputs`, `factor_weights`, and `uncertainty_flags`.\n\nUse `financial_search` with `coverage: \"comprehensive\"` for numeric financial claims. Use Exa (`websearch`) only for qualitative news/catalyst context, never for numeric financial metrics.\n\nUse IEEE-style numbered citations: place [N] in-text for each factual claim and collect all references in a final ## Sources section at the end of report.md. Each entry: [N] Publisher/Domain, "title," URL. Retrieved YYYY-MM-DD. Never use generic source labels like `websearch`, `exa`, `search`, or `internet`.\n\nIn `dashboard.md`, include separate KPI rows for Stock price, Previous close, Daily change, Daily change percent, 52W high/low or 52W range, YTD return, Market cap, Analyst consensus, Revenue, Net income, Free cash flow, and Debt-to-equity.\n\nFor insider analysis, call `financial_search` with intent `insider` and `coverage: \"comprehensive\"`, and use returned `ownershipChange` when available.\n\nKeep KPI source labels and source URLs coherent (for example, Yahoo chart label must use Yahoo chart URL).\n\nIn `report.md`, include metadata lines: `Sector: <value>`, `Headquarters: <value>`, `Website: <value>`, and `Icon URL: <value>`.\n\nUse period-aware fundamental labels based on `metricPeriods` (`TTM`, `FY`, `Q`); do not force `(TTM)` when the period differs.\n\nWhen portfolio holdings/benchmark/mandate are missing, render portfolio fit as `Not evaluated: missing portfolio holdings, benchmark, and mandate`.\n\nNever emit the token `N/A` in report artifacts. Use observed values or explicit `unknown`/reason text.\n\nAfter `report.md`, `dashboard.md`, `assumptions.json`, and `adjustment-log.md` are written, call `report_evaluation` with:\n- `ticker`: `$1`\n- `outputRoot`: `reports/$1/<YYYY-MM-DD>/`\n\nAfter `report_evaluation` writes `evaluation.md` and `evaluation-snapshot.json`, call `report_pdf` with:\n- `subcommand`: `report`\n- `outputRoot`: `reports/$1/<YYYY-MM-DD>/`\n- `filename`: `$1-<YYYY-MM-DD>.pdf`\n\nDo not ask a PDF question and do not skip PDF generation.\n\nPDF requirements:\n- first page: ticker summary\n- second page: deterministic evaluation snapshot from `evaluation-snapshot.json`\n- remaining pages: report information from `report.md`, `dashboard.md`, and `assumptions.json` (exclude `adjustment-log.md`)\n- footer on every page must display `opencode.finance` and link to `https://opencode.finance`\n\nInclude source and timestamp for factual claims, label missing values as "unknown", and keep observed data separate from model assumptions and inference. Do not provide investment advice.',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Don’t require report_evaluation where the tool is unavailable

The /report template now unconditionally requires calling report_evaluation and forbids skipping PDF generation, but OpenClaw v1 currently exposes only financial_search and report_pdf tools, so this flow cannot complete in that environment and report_pdf will block on missing evaluation.md/evaluation-snapshot.json. Add a capability fallback in this template (or expose report_evaluation in OpenClaw) to avoid hard-failing /report there.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9131dab02e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

function pickMetricValue(metric: Record<string, unknown>, keys: string[]) {
for (const key of keys) {
const value = metric[key]
const percent = /yield|ratio|margin|growth|roe|roa|roic|return/i.test(key) ? toPercent(value) : toNumber(value)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Stop scaling debt-to-equity as a percent

pickMetricValue treats any key containing ratio as a percentage and runs toPercent, which multiplies values <= 1 by 100. That mis-scales Finnhub’s totalDebtToEquity* fields when they are returned as ratios (for example 0.45 becomes 45), and the downstream Debt to Equity metric is then rendered as 45x instead of 0.45x in evaluation-snapshot.json/PDF outputs. This introduces a material valuation-quality error for low-leverage companies and should exclude debt-to-equity from percent normalization.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6588d52fac

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +716 to +717
const currentRoa = input.currentNetIncome / averageAssets
const priorRoa = input.priorNetIncome / input.priorTotalAssets
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Compute Piotroski ROA with period-consistent denominators

In calculatePiotroskiFScore, the current-period ROA is divided by averageAssets while the prior-period ROA is divided by priorTotalAssets, so the currentRoa > priorRoa signal compares two differently scaled ratios. When assets change materially year-over-year, this can flip the profitability-improvement point and produce an incorrect F-score in evaluation-snapshot.json/PDF outputs.

Useful? React with 👍 / 👎.

Comment on lines +720 to +721
const currentRatio = input.currentCurrentAssets / input.currentCurrentLiabilities
const priorRatio = input.priorCurrentAssets / input.priorCurrentLiabilities
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard Piotroski component ratios from divide-by-zero artifacts

Several Piotroski inputs are only null-checked before division, so zero-valued denominators (for example currentCurrentLiabilities or priorCurrentLiabilities) produce Infinity/NaN and can incorrectly award or suppress score points instead of marking the metric unknown. This is especially problematic when provider payloads contain zeros for incomplete statement fields, because it silently corrupts the deterministic score.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 446d56c6ea

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +535 to +536
evaluationMarkdown: await readRequired(path.join(root, "evaluation.md")),
evaluationSnapshotJson: await readRequired(path.join(root, "evaluation-snapshot.json")),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep report_pdf compatible with the packaged report skill

The report profile now hard-requires evaluation.md and evaluation-snapshot.json, but the packaged OpenClaw report skill (packages/opencode-finance/skills/report/SKILL.md) still instructs generating only report.md, dashboard.md, assumptions.json, and adjustment-log.md before calling report_pdf. In that shipped workflow, report_pdf will now fail with missing-artifact errors, so /report cannot complete unless users manually add an extra tool step. Please either update the packaged skill to invoke report_evaluation first or add a compatibility path for legacy report artifacts.

Useful? React with 👍 / 👎.

Comment on lines +11 to +12
if (!existsSync(distIndex) || !existsSync(distOpenClaw)) {
const build = spawnSync("bun", ["run", "build"], {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Rebuild dist unconditionally during prepack

This prepack script now skips bun run build whenever dist/index.js and dist/openclaw/index.cjs already exist, which allows stale compiled output and copied assets to be published after source changes. A maintainer running npm pack/npm publish locally with an existing dist/ can silently ship old code even though the source has changed; prepack should rebuild (or at least verify freshness) every time.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant