feat: add deterministic evaluation report flow#35
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub. 1 Skipped Deployment
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bc490bc19d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const snapshotPath = path.join(input.outputRoot, "evaluation-snapshot.json") | ||
| const markdown = renderReportEvaluationMarkdown(input.snapshot) | ||
| const snapshotText = JSON.stringify(ReportEvaluationSnapshotSchema.parse(input.snapshot), null, 2) | ||
| await Promise.all([Bun.write(evaluationPath, markdown), Bun.write(snapshotPath, snapshotText)]) |
There was a problem hiding this comment.
Replace Bun.write with runtime-neutral file writes
writeReportEvaluationArtifacts uses Bun.write, but this package is also executed in Node-based contexts (for example OpenClaw’s CJS plugin runtime), where Bun is undefined; calling report_evaluation there will throw ReferenceError: Bun is not defined before either artifact is written. This should use the existing runtime-neutral file IO path (fs.writeFile/writeText) so report generation works outside Bun.
Useful? React with 👍 / 👎.
| @@ -102,6 +102,6 @@ export const FINANCE_SLASH_COMMANDS: FinanceSlashCommand[] = [ | |||
| description: "Generate a comprehensive public-company financial report", | |||
| hints: ["$1", "$2"], | |||
| template: | |||
| 'Build a very comprehensive public-company financial report for "$1".\n\nFirst call the skill tool with name "finance-comprehensive-report".\n\nThen produce:\n1) a concise in-chat executive summary including `directional_conviction_score` (0-100), score band (`bearish`, `neutral`, `bullish`), and top drivers\n2) markdown artifacts in `reports/$1/<YYYY-MM-DD>/`:\n - `report.md`\n - `dashboard.md`\n - `assumptions.json`\n - `adjustment-log.md`\n\nInside `report.md`, include dedicated sections for technical analysis, fundamental analysis, risk assessment, portfolio fit, market intelligence, scenario valuation, and monitoring triggers.\n\nInside `assumptions.json`, include `scenario_assumptions`, `score_inputs`, `factor_weights`, and `uncertainty_flags`.\n\nUse `financial_search` with `coverage: \"comprehensive\"` for numeric financial claims. Use Exa (`websearch`) only for qualitative news/catalyst context, never for numeric financial metrics.\n\nUse IEEE-style numbered citations: place [N] in-text for each factual claim and collect all references in a final ## Sources section at the end of report.md. Each entry: [N] Publisher/Domain, "title," URL. Retrieved YYYY-MM-DD. Never use generic source labels like `websearch`, `exa`, `search`, or `internet`.\n\nIn `dashboard.md`, include separate KPI rows for Stock price, Previous close, Daily change, Daily change percent, 52W high/low or 52W range, YTD return, Market cap, Analyst consensus, Revenue, Net income, Free cash flow, and Debt-to-equity.\n\nFor insider analysis, call `financial_search` with intent `insider` and `coverage: \"comprehensive\"`, and use returned `ownershipChange` when available.\n\nKeep KPI source labels and source URLs coherent (for example, Yahoo chart label must use Yahoo chart URL).\n\nIn `report.md`, include metadata lines: `Sector: <value>`, `Headquarters: <value>`, `Website: <value>`, and `Icon URL: <value>`.\n\nUse period-aware fundamental labels based on `metricPeriods` (`TTM`, `FY`, `Q`); do not force `(TTM)` when the period differs.\n\nWhen portfolio holdings/benchmark/mandate are missing, render portfolio fit as `Not evaluated: missing portfolio holdings, benchmark, and mandate`.\n\nNever emit the token `N/A` in report artifacts. Use observed values or explicit `unknown`/reason text.\n\nAfter all markdown artifacts are written, ask exactly one user question with the `question` tool:\n- header: `PDF Export`\n- question: `Generate a polished PDF report now?`\n- options:\n 1) `Yes (Recommended)` - Generate a polished PDF in the report directory.\n 2) `No` - Skip PDF generation.\n- custom: `false`\n\nIf user selects `Yes (Recommended)`, call `report_pdf` with:\n- `subcommand`: `report`\n- `outputRoot`: `reports/$1/<YYYY-MM-DD>/`\n- `filename`: `$1-<YYYY-MM-DD>.pdf`\n\nIf `question` is unavailable in this client context, skip PDF export and complete the report normally.\n\nPDF requirements:\n- first page: ticker summary\n- remaining pages: report information from `report.md`, `dashboard.md`, and `assumptions.json` (exclude `adjustment-log.md`)\n- footer on every page must display `opencode.finance` and link to `https://opencode.finance`\n\nInclude source and timestamp for factual claims, label missing values as "unknown", and keep observed data separate from model assumptions and inference. Do not provide investment advice.', | |||
| 'Build a very comprehensive public-company financial report for "$1".\n\nFirst call the skill tool with name "finance-comprehensive-report".\n\nThen produce:\n1) a concise in-chat executive summary including `directional_conviction_score` (0-100), score band (`bearish`, `neutral`, `bullish`), and top drivers\n2) markdown artifacts in `reports/$1/<YYYY-MM-DD>/`:\n - `report.md`\n - `dashboard.md`\n - `assumptions.json`\n - `adjustment-log.md`\n - `evaluation.md`\n - `evaluation-snapshot.json`\n\nInside `report.md`, include dedicated sections for technical analysis, fundamental analysis, risk assessment, portfolio fit, market intelligence, scenario valuation, and monitoring triggers.\n\nInside `assumptions.json`, include `scenario_assumptions`, `score_inputs`, `factor_weights`, and `uncertainty_flags`.\n\nUse `financial_search` with `coverage: \"comprehensive\"` for numeric financial claims. Use Exa (`websearch`) only for qualitative news/catalyst context, never for numeric financial metrics.\n\nUse IEEE-style numbered citations: place [N] in-text for each factual claim and collect all references in a final ## Sources section at the end of report.md. Each entry: [N] Publisher/Domain, "title," URL. Retrieved YYYY-MM-DD. Never use generic source labels like `websearch`, `exa`, `search`, or `internet`.\n\nIn `dashboard.md`, include separate KPI rows for Stock price, Previous close, Daily change, Daily change percent, 52W high/low or 52W range, YTD return, Market cap, Analyst consensus, Revenue, Net income, Free cash flow, and Debt-to-equity.\n\nFor insider analysis, call `financial_search` with intent `insider` and `coverage: \"comprehensive\"`, and use returned `ownershipChange` when available.\n\nKeep KPI source labels and source URLs coherent (for example, Yahoo chart label must use Yahoo chart URL).\n\nIn `report.md`, include metadata lines: `Sector: <value>`, `Headquarters: <value>`, `Website: <value>`, and `Icon URL: <value>`.\n\nUse period-aware fundamental labels based on `metricPeriods` (`TTM`, `FY`, `Q`); do not force `(TTM)` when the period differs.\n\nWhen portfolio holdings/benchmark/mandate are missing, render portfolio fit as `Not evaluated: missing portfolio holdings, benchmark, and mandate`.\n\nNever emit the token `N/A` in report artifacts. Use observed values or explicit `unknown`/reason text.\n\nAfter `report.md`, `dashboard.md`, `assumptions.json`, and `adjustment-log.md` are written, call `report_evaluation` with:\n- `ticker`: `$1`\n- `outputRoot`: `reports/$1/<YYYY-MM-DD>/`\n\nAfter `report_evaluation` writes `evaluation.md` and `evaluation-snapshot.json`, call `report_pdf` with:\n- `subcommand`: `report`\n- `outputRoot`: `reports/$1/<YYYY-MM-DD>/`\n- `filename`: `$1-<YYYY-MM-DD>.pdf`\n\nDo not ask a PDF question and do not skip PDF generation.\n\nPDF requirements:\n- first page: ticker summary\n- second page: deterministic evaluation snapshot from `evaluation-snapshot.json`\n- remaining pages: report information from `report.md`, `dashboard.md`, and `assumptions.json` (exclude `adjustment-log.md`)\n- footer on every page must display `opencode.finance` and link to `https://opencode.finance`\n\nInclude source and timestamp for factual claims, label missing values as "unknown", and keep observed data separate from model assumptions and inference. Do not provide investment advice.', | |||
There was a problem hiding this comment.
Don’t require
report_evaluation where the tool is unavailable
The /report template now unconditionally requires calling report_evaluation and forbids skipping PDF generation, but OpenClaw v1 currently exposes only financial_search and report_pdf tools, so this flow cannot complete in that environment and report_pdf will block on missing evaluation.md/evaluation-snapshot.json. Add a capability fallback in this template (or expose report_evaluation in OpenClaw) to avoid hard-failing /report there.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9131dab02e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| function pickMetricValue(metric: Record<string, unknown>, keys: string[]) { | ||
| for (const key of keys) { | ||
| const value = metric[key] | ||
| const percent = /yield|ratio|margin|growth|roe|roa|roic|return/i.test(key) ? toPercent(value) : toNumber(value) |
There was a problem hiding this comment.
Stop scaling debt-to-equity as a percent
pickMetricValue treats any key containing ratio as a percentage and runs toPercent, which multiplies values <= 1 by 100. That mis-scales Finnhub’s totalDebtToEquity* fields when they are returned as ratios (for example 0.45 becomes 45), and the downstream Debt to Equity metric is then rendered as 45x instead of 0.45x in evaluation-snapshot.json/PDF outputs. This introduces a material valuation-quality error for low-leverage companies and should exclude debt-to-equity from percent normalization.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6588d52fac
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const currentRoa = input.currentNetIncome / averageAssets | ||
| const priorRoa = input.priorNetIncome / input.priorTotalAssets |
There was a problem hiding this comment.
Compute Piotroski ROA with period-consistent denominators
In calculatePiotroskiFScore, the current-period ROA is divided by averageAssets while the prior-period ROA is divided by priorTotalAssets, so the currentRoa > priorRoa signal compares two differently scaled ratios. When assets change materially year-over-year, this can flip the profitability-improvement point and produce an incorrect F-score in evaluation-snapshot.json/PDF outputs.
Useful? React with 👍 / 👎.
| const currentRatio = input.currentCurrentAssets / input.currentCurrentLiabilities | ||
| const priorRatio = input.priorCurrentAssets / input.priorCurrentLiabilities |
There was a problem hiding this comment.
Guard Piotroski component ratios from divide-by-zero artifacts
Several Piotroski inputs are only null-checked before division, so zero-valued denominators (for example currentCurrentLiabilities or priorCurrentLiabilities) produce Infinity/NaN and can incorrectly award or suppress score points instead of marking the metric unknown. This is especially problematic when provider payloads contain zeros for incomplete statement fields, because it silently corrupts the deterministic score.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 446d56c6ea
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| evaluationMarkdown: await readRequired(path.join(root, "evaluation.md")), | ||
| evaluationSnapshotJson: await readRequired(path.join(root, "evaluation-snapshot.json")), |
There was a problem hiding this comment.
Keep report_pdf compatible with the packaged report skill
The report profile now hard-requires evaluation.md and evaluation-snapshot.json, but the packaged OpenClaw report skill (packages/opencode-finance/skills/report/SKILL.md) still instructs generating only report.md, dashboard.md, assumptions.json, and adjustment-log.md before calling report_pdf. In that shipped workflow, report_pdf will now fail with missing-artifact errors, so /report cannot complete unless users manually add an extra tool step. Please either update the packaged skill to invoke report_evaluation first or add a compatibility path for legacy report artifacts.
Useful? React with 👍 / 👎.
| if (!existsSync(distIndex) || !existsSync(distOpenClaw)) { | ||
| const build = spawnSync("bun", ["run", "build"], { |
There was a problem hiding this comment.
Rebuild dist unconditionally during prepack
This prepack script now skips bun run build whenever dist/index.js and dist/openclaw/index.cjs already exist, which allows stale compiled output and copied assets to be published after source changes. A maintainer running npm pack/npm publish locally with an existing dist/ can silently ship old code even though the source has changed; prepack should rebuild (or at least verify freshness) every time.
Useful? React with 👍 / 👎.
Summary
report_evaluationtool and artifact writer forevaluation.mdandevaluation-snapshot.jsonopencode-financeto1.2.9-beta.0Verification
TMPDIR=/tmp bun run typecheckTMPDIR=/tmp bun run buildbun test src/command/finance.test.ts src/tool/report-pdf.test.ts test/unit/finance-plugin-report-hints.test.ts test/unit/report-skill-contract.test.ts test/unit/report-evaluation-artifacts.test.ts test/unit/openclaw-tool-adapter.test.ts test/unit/install-script-skill-sync.test.tsNotes
bun testrun in the temp worktree still shows unrelated environment/pre-existing failures around storage permissions, Node/OpenClaw ESM import, and npm pack timeout.