feat(ci): add nightly onboard trace timing summaries#5245
Conversation
Signed-off-by: Angel Mata <amata@nvidia.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds recursive trace sanitization, always-on artifact inputs/uploads for E2E runs, a trace-timing analyzer that compares current vs prior nightly traces, scorecard integration to include traceTimingLine/traceSummaryLines, Slack block updates to render trace summaries, and tests/docs validating wiring and output. ChangesNightly Trace Timing and Slack Summaries
Sequence Diagram(s)sequenceDiagram
participant ComponentA
participant ComponentB
ComponentA->>ComponentB: observable interaction
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: None Full advisor summaryE2E Recommendation AdvisorFailed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-advisor-raw-output.txt |
Vitest E2E Scenario RecommendationRequired Vitest E2E scenarios: None Full Vitest E2E advisor summaryVitest E2E Scenario AdvisorFailed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-scenario-advisor-raw-output.txt |
PR Review AdvisorFindings: 0 needs attention, 1 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Consider writing more tests for
This is an automated advisory review. A human maintainer must make the final merge decision. |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/ci/sanitize-trace-artifacts.js (1)
1-104:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winCritical: File must use TypeScript extension per CI guardrails.
The pipeline failure indicates this file violates the codebase growth guardrails policy. The project requires using
.tsinstead of.js/.cjs/.mjsfor new Node.js code. Rename this file tosanitize-trace-artifacts.tsand add appropriate TypeScript type annotations.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/ci/sanitize-trace-artifacts.js` around lines 1 - 104, Rename the file to sanitize-trace-artifacts.ts and convert it from CommonJS to TypeScript: replace require(...) with TypeScript imports (e.g. import * as fs from "node:fs"; import * as path from "node:path"), add explicit type annotations for functions and variables (e.g. redactString(value: string): string, sanitize(value: unknown, key = ""): unknown, listJsonFiles(directory: string): string[], sanitizeTraceArtifacts(sourceDirectory: string, outputDirectory: string): { files: number; outputDirectory: string }), cast JSON.parse results to unknown/any before sanitizing, and update exports to ESNamed exports (export { REDACTED, sanitize, sanitizeTraceArtifacts }); keep the CLI block but ensure TypeScript accepts require.main by either using if (require.main === module) with a top-level declare const require: any; or convert to an import.meta.url check, and adjust process.argv typing as string[] so the script compiles under tsconfig.Source: Pipeline failures
🧹 Nitpick comments (6)
scripts/ci/sanitize-trace-artifacts.js (3)
73-73: ⚡ Quick winAdd error handling for malformed JSON.
The
JSON.parsecall will throw if a.jsonfile contains invalid JSON. Consider wrapping this in a try-catch to provide a more informative error message that includes the filename.🛡️ Proposed fix to add error handling
- const parsed = JSON.parse(fs.readFileSync(file, "utf8")); + let parsed; + try { + parsed = JSON.parse(fs.readFileSync(file, "utf8")); + } catch (error) { + throw new Error(`Failed to parse JSON from ${file}: ${error.message}`); + } const sanitized = sanitize(parsed);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/ci/sanitize-trace-artifacts.js` at line 73, The JSON.parse call that creates the local variable parsed should be wrapped in a try-catch to handle malformed JSON; locate the line using JSON.parse(fs.readFileSync(file, "utf8")) and surround it with a try block, catch the thrown error, and produce an informative message that includes the filename (file) and the parser error (e.g., using processLogger.error or console.error), then either rethrow or skip/continue depending on sanitizeTraceArtifacts' desired behavior; ensure the catch preserves stack/error details for debugging.
69-71: 💤 Low valuePath validation is correct but document the security contract.
The path safety check correctly rejects
..(directory traversal), absolute paths, and paths that would escape the source root. This prevents writing sanitized output to arbitrary filesystem locations.Consider adding a comment explaining the security rationale:
// Security: Reject paths that escape sourceRoot via "..", absolute paths, // or would resolve outside the intended output directory.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/ci/sanitize-trace-artifacts.js` around lines 69 - 71, Add a short security comment above the path-safety check that documents the contract: explain that the check using relativePath.startsWith("..") and path.isAbsolute(relativePath) (and the thrown Error referencing file) intentionally rejects directory traversal and absolute paths to prevent sanitized traces from being written outside the intended output/source root; keep it concise and mention the expected guarantee that resolved paths will remain inside the intended output directory.
10-18: ⚖️ Poor tradeoffVerify completeness of sensitive value patterns.
The current patterns cover common token formats (Bearer, Slack, NVIDIA API, GitHub), but consider whether additional patterns are needed:
- AWS credentials (
AKIA..., secret keys)- Azure tokens
- Generic JWT patterns
- Other cloud provider tokens that might appear in traces
Would you like me to search the codebase for additional credential patterns that should be included?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/ci/sanitize-trace-artifacts.js` around lines 10 - 18, SENSITIVE_VALUE_RES currently misses several common credential/token formats; update the array used alongside SENSITIVE_KEY_RE to add regexes for AWS access keys (AKIA/ASIA followed by 16 chars) and AWS secret access key-like base64 strings, generic JWT-looking tokens (three dot-separated Base64URL segments), Azure/AD tokens (eyJ0eXAi... plus long Base64URL), and other common cloud prefixes (e.g., GCP service account keys, long hex API keys); modify the SENSITIVE_VALUE_RES constant to include these additional regex patterns and ensure they are case-insensitive or global where appropriate so the sanitizer will match and redact these tokens during trace processing (refer to SENSITIVE_VALUE_RES and SENSITIVE_KEY_RE to locate and update the patterns)..github/actions/run-e2e-script/action.yaml (1)
87-87: Document CI execution/build strategy forsanitize-trace-artifacts(future TS migration)
- Right now
.github/actions/run-e2e-script/action.yamlrunsscripts/ci/sanitize-trace-artifacts.jsvianode(and the e2e workflow/test reference the same.js), so a TypeScript rename would require updating the action/workflow/tests to either use the repo’s TS runner (e.g.,tsx) or rely on a compiled.jsoutput.- Add a short note describing the intended CI run/compile strategy to prevent breakage during a future
.js→.tsmigration.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/actions/run-e2e-script/action.yaml at line 87, Add a short documentation note in the repository explaining how CI will run the sanitize-trace-artifacts script so a future .js→.ts rename won't break the workflow: reference the action invocation that currently runs node "$GITHUB_ACTION_PATH/../../../scripts/ci/sanitize-trace-artifacts.js" and state the chosen strategy (either keep calling a compiled .js artifact produced by CI builds or switch the action/workflow/tests to use a TS runner such as tsx in place of node), and include instructions for updating .github/actions/run-e2e-script/action.yaml and any workflows/tests that call sanitize-trace-artifacts.js when the migration happens..github/workflows/nightly-e2e.yaml (2)
633-2925: 🏗️ Heavy liftConsider extracting the trace timing logic to a separate script.
The 2,292-line inline JavaScript block makes this workflow file difficult to maintain and test. The trace analysis logic (semver parsing, artifact download, phase extraction, delta computation) is complex enough to warrant extraction to a dedicated script file (e.g.,
scripts/scorecard/analyze-trace-timing.jsorscripts/ci/trace-timing-report.js).Benefits:
- Easier to unit test the logic directly (similar to
test/sanitize-trace-artifacts.test.ts)- Better code organization and reusability
- Workflow file remains focused on orchestration
- Easier to review and maintain the analysis logic
The current inline approach works but creates a maintenance burden as trace analysis requirements evolve.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/nightly-e2e.yaml around lines 633 - 2925, The scorecard job embeds a very large inline JS trace analysis block (buildTraceTimingResult, readTraceSummaryFromRun, resolvePriorReleaseTag, selectOnboardTrace, extractPhaseDurations, formatTraceDelta, etc.), which makes the workflow hard to maintain; move that logic into a dedicated script (e.g., scripts/scorecard/analyze-trace-timing.js) and have the workflow call the script from the Generate nightly scorecard step (invoke node or run the script via actions/github-script by loading its exported functions), preserving the existing function names/behaviour and error handling so the scorecard step still returns the same traceTimingLine and traceSummaryLines outputs.
2641-2652: 💤 Low valuePhase order knowledge is duplicated.
The
ONBOARD_PHASE_ORDERarray hardcodes the sequence of onboarding phases. If this ordering logic exists elsewhere in the product code (e.g., in the trace generation code), consider documenting the canonical source or extracting this to a shared constant file that both the tracer and the CI script can reference.This prevents drift if phase names or ordering change.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/nightly-e2e.yaml around lines 2641 - 2652, Replace the hardcoded ONBOARD_PHASE_ORDER array with a single source of truth: import the canonical phase ordering constant (e.g., export named constant like onboardPhaseOrder or ONBOARD_PHASE_ORDER) from the product code that generates onboarding traces; if that constant doesn't exist yet, add an exported constant in the product module used for trace generation and update both the tracer code and this CI workflow to import it, or at minimum add an inline comment pointing to the canonical definition (trace generation module/function) so the ordering isn't duplicated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/e2e-script.yaml:
- Around line 145-147: The sparse-checkout currently includes the sanitizer
script entry 'scripts/ci/sanitize-trace-artifacts.js' and the action directory
'.github/actions/run-e2e-script'; when the sanitizer is converted to TypeScript,
update this sparse-checkout entry to point to the correct executable artifact
(for example the compiled JS output under scripts/ci/dist or the TypeScript
source and adjust the execution step to use ts-node) so the sanitize step can
find and run the sanitizer; ensure the workflow step that invokes the sanitizer
(the sanitize step) is also updated to run the chosen target.
In `@test/sanitize-trace-artifacts.test.ts`:
- Around line 18-20: Update the test import to match the sanitizer's
TypeScript/ESM conversion: replace the createRequire(...) + require(...) usage
with a direct ESM import like `import { REDACTED, sanitizeTraceArtifacts } from
'../scripts/ci/sanitize-trace-artifacts.js'` (keep the .js extension even though
the source is .ts); if you must retain CommonJS, ensure the require path uses
the .js extension and adjust module interop accordingly for createRequire,
referencing the symbols REDACTED and sanitizeTraceArtifacts.
---
Outside diff comments:
In `@scripts/ci/sanitize-trace-artifacts.js`:
- Around line 1-104: Rename the file to sanitize-trace-artifacts.ts and convert
it from CommonJS to TypeScript: replace require(...) with TypeScript imports
(e.g. import * as fs from "node:fs"; import * as path from "node:path"), add
explicit type annotations for functions and variables (e.g. redactString(value:
string): string, sanitize(value: unknown, key = ""): unknown,
listJsonFiles(directory: string): string[],
sanitizeTraceArtifacts(sourceDirectory: string, outputDirectory: string): {
files: number; outputDirectory: string }), cast JSON.parse results to
unknown/any before sanitizing, and update exports to ESNamed exports (export {
REDACTED, sanitize, sanitizeTraceArtifacts }); keep the CLI block but ensure
TypeScript accepts require.main by either using if (require.main === module)
with a top-level declare const require: any; or convert to an import.meta.url
check, and adjust process.argv typing as string[] so the script compiles under
tsconfig.
---
Nitpick comments:
In @.github/actions/run-e2e-script/action.yaml:
- Line 87: Add a short documentation note in the repository explaining how CI
will run the sanitize-trace-artifacts script so a future .js→.ts rename won't
break the workflow: reference the action invocation that currently runs node
"$GITHUB_ACTION_PATH/../../../scripts/ci/sanitize-trace-artifacts.js" and state
the chosen strategy (either keep calling a compiled .js artifact produced by CI
builds or switch the action/workflow/tests to use a TS runner such as tsx in
place of node), and include instructions for updating
.github/actions/run-e2e-script/action.yaml and any workflows/tests that call
sanitize-trace-artifacts.js when the migration happens.
In @.github/workflows/nightly-e2e.yaml:
- Around line 633-2925: The scorecard job embeds a very large inline JS trace
analysis block (buildTraceTimingResult, readTraceSummaryFromRun,
resolvePriorReleaseTag, selectOnboardTrace, extractPhaseDurations,
formatTraceDelta, etc.), which makes the workflow hard to maintain; move that
logic into a dedicated script (e.g., scripts/scorecard/analyze-trace-timing.js)
and have the workflow call the script from the Generate nightly scorecard step
(invoke node or run the script via actions/github-script by loading its exported
functions), preserving the existing function names/behaviour and error handling
so the scorecard step still returns the same traceTimingLine and
traceSummaryLines outputs.
- Around line 2641-2652: Replace the hardcoded ONBOARD_PHASE_ORDER array with a
single source of truth: import the canonical phase ordering constant (e.g.,
export named constant like onboardPhaseOrder or ONBOARD_PHASE_ORDER) from the
product code that generates onboarding traces; if that constant doesn't exist
yet, add an exported constant in the product module used for trace generation
and update both the tracer code and this CI workflow to import it, or at minimum
add an inline comment pointing to the canonical definition (trace generation
module/function) so the ordering isn't duplicated.
In `@scripts/ci/sanitize-trace-artifacts.js`:
- Line 73: The JSON.parse call that creates the local variable parsed should be
wrapped in a try-catch to handle malformed JSON; locate the line using
JSON.parse(fs.readFileSync(file, "utf8")) and surround it with a try block,
catch the thrown error, and produce an informative message that includes the
filename (file) and the parser error (e.g., using processLogger.error or
console.error), then either rethrow or skip/continue depending on
sanitizeTraceArtifacts' desired behavior; ensure the catch preserves stack/error
details for debugging.
- Around line 69-71: Add a short security comment above the path-safety check
that documents the contract: explain that the check using
relativePath.startsWith("..") and path.isAbsolute(relativePath) (and the thrown
Error referencing file) intentionally rejects directory traversal and absolute
paths to prevent sanitized traces from being written outside the intended
output/source root; keep it concise and mention the expected guarantee that
resolved paths will remain inside the intended output directory.
- Around line 10-18: SENSITIVE_VALUE_RES currently misses several common
credential/token formats; update the array used alongside SENSITIVE_KEY_RE to
add regexes for AWS access keys (AKIA/ASIA followed by 16 chars) and AWS secret
access key-like base64 strings, generic JWT-looking tokens (three dot-separated
Base64URL segments), Azure/AD tokens (eyJ0eXAi... plus long Base64URL), and
other common cloud prefixes (e.g., GCP service account keys, long hex API keys);
modify the SENSITIVE_VALUE_RES constant to include these additional regex
patterns and ensure they are case-insensitive or global where appropriate so the
sanitizer will match and redact these tokens during trace processing (refer to
SENSITIVE_VALUE_RES and SENSITIVE_KEY_RE to locate and update the patterns).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: bc202847-24b8-4ce2-bb33-5e23d0b0466b
📒 Files selected for processing (10)
.github/actions/run-e2e-script/action.yaml.github/workflows/e2e-script.yaml.github/workflows/nightly-e2e.yamlscripts/ci/sanitize-trace-artifacts.jsscripts/scorecard/build-slack-blocks.tstest/e2e-script-workflow.test.tstest/e2e/README.mdtest/fixtures/sensitive-trace-artifact.jsontest/sanitize-trace-artifacts.test.tstest/scorecard-blocks.test.ts
| sparse-checkout: | | ||
| .github/actions/run-e2e-script | ||
| scripts/ci/sanitize-trace-artifacts.js |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Sparse checkout includes sanitizer script as required.
The sparse-checkout list correctly includes both the action directory and the sanitizer script path, ensuring the sanitizer is available when the sanitize step runs. This matches the coding guideline contract.
When the sanitizer is converted to TypeScript (per the pipeline failure), update this path to reference the compiled output or the TypeScript source with appropriate execution strategy.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/e2e-script.yaml around lines 145 - 147, The
sparse-checkout currently includes the sanitizer script entry
'scripts/ci/sanitize-trace-artifacts.js' and the action directory
'.github/actions/run-e2e-script'; when the sanitizer is converted to TypeScript,
update this sparse-checkout entry to point to the correct executable artifact
(for example the compiled JS output under scripts/ci/dist or the TypeScript
source and adjust the execution step to use ts-node) so the sanitize step can
find and run the sanitizer; ensure the workflow step that invokes the sanitizer
(the sanitize step) is also updated to run the chosen target.
Sources: Coding guidelines, Pipeline failures
|
@amata-human mind addressing the PR feedback comments and CI failures, please? |
Signed-off-by: Angel Mata <amata@nvidia.com>
…le-nemoclaw-e2e-trace-timing-slack-summaries-in-gitlab-ci
Signed-off-by: Angel Mata <amata@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/nightly-e2e.yaml (1)
2772-2823:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftRemove the legacy prior-day
Trend:block from the scorecard.This workflow still computes and emits the older prior-day trend comparison, so the scorecard now carries two different baselines: the old day-over-day
Trend:line and the new prior-release trace timing analysis. That makes the summary internally inconsistent and keeps pushing stale data intotrendLineconsumers.As per coding guidelines, "Remove/avoid older “Trend” comparison logic inside
scorecardand rely on the trace-timing analyzer output instead."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/nightly-e2e.yaml around lines 2772 - 2823, Remove the entire legacy prior-day "Trend:" computation: delete the else branch that contains the try/catch and loop (the block which populates priorRuns and sets trendLine based on priorRun.conclusion) and keep only the selective-dispatch short-circuit; instead initialize trendLine to an empty string (or leave it to the trace-timing analyzer) and do not reference WORKFLOW_FILE/priorRuns or e.message anywhere—i.e., remove the code around the symbols trendLine, isSelectiveDispatch, priorRuns, WORKFLOW_FILE and the try/catch so the scorecard relies solely on the new trace-timing analyzer output.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/nightly-e2e.yaml:
- Around line 2520-2527: The ternary that sets status incorrectly labels runs
with both passed and cancelled jobs as "✅ All requested jobs passed"; update the
conditional logic around the status assignment (the block that computes status
using failed, missingRequested, cancelled, passed, skipped) to detect mixed
outcomes—specifically ensure a branch like "cancelled.length > 0 &&
passed.length > 0" (or a combined condition checking cancelled.length > 0 &&
passed.length > 0 && failed.length === 0) returns a different headline (e.g.,
"⚠️ Some jobs cancelled — partial pass") before the final "All requested jobs
passed" case so mixed pass/cancel runs are not reported as all passed.
---
Outside diff comments:
In @.github/workflows/nightly-e2e.yaml:
- Around line 2772-2823: Remove the entire legacy prior-day "Trend:"
computation: delete the else branch that contains the try/catch and loop (the
block which populates priorRuns and sets trendLine based on priorRun.conclusion)
and keep only the selective-dispatch short-circuit; instead initialize trendLine
to an empty string (or leave it to the trace-timing analyzer) and do not
reference WORKFLOW_FILE/priorRuns or e.message anywhere—i.e., remove the code
around the symbols trendLine, isSelectiveDispatch, priorRuns, WORKFLOW_FILE and
the try/catch so the scorecard relies solely on the new trace-timing analyzer
output.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e021c607-f78e-41e7-9aaa-52096d6adfb3
📒 Files selected for processing (3)
.github/workflows/nightly-e2e.yamlscripts/scorecard/analyze-trace-timing.tstest/e2e-script-workflow.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- test/e2e-script-workflow.test.ts
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/nightly-e2e.yaml (1)
2772-2823:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftRemove the legacy prior-day
Trend:block from the scorecard.This workflow still computes and emits the older prior-day trend comparison, so the scorecard now carries two different baselines: the old day-over-day
Trend:line and the new prior-release trace timing analysis. That makes the summary internally inconsistent and keeps pushing stale data intotrendLineconsumers.As per coding guidelines, "Remove/avoid older “Trend” comparison logic inside
scorecardand rely on the trace-timing analyzer output instead."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/nightly-e2e.yaml around lines 2772 - 2823, Remove the entire legacy prior-day "Trend:" computation: delete the else branch that contains the try/catch and loop (the block which populates priorRuns and sets trendLine based on priorRun.conclusion) and keep only the selective-dispatch short-circuit; instead initialize trendLine to an empty string (or leave it to the trace-timing analyzer) and do not reference WORKFLOW_FILE/priorRuns or e.message anywhere—i.e., remove the code around the symbols trendLine, isSelectiveDispatch, priorRuns, WORKFLOW_FILE and the try/catch so the scorecard relies solely on the new trace-timing analyzer output.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/nightly-e2e.yaml:
- Around line 2520-2527: The ternary that sets status incorrectly labels runs
with both passed and cancelled jobs as "✅ All requested jobs passed"; update the
conditional logic around the status assignment (the block that computes status
using failed, missingRequested, cancelled, passed, skipped) to detect mixed
outcomes—specifically ensure a branch like "cancelled.length > 0 &&
passed.length > 0" (or a combined condition checking cancelled.length > 0 &&
passed.length > 0 && failed.length === 0) returns a different headline (e.g.,
"⚠️ Some jobs cancelled — partial pass") before the final "All requested jobs
passed" case so mixed pass/cancel runs are not reported as all passed.
---
Outside diff comments:
In @.github/workflows/nightly-e2e.yaml:
- Around line 2772-2823: Remove the entire legacy prior-day "Trend:"
computation: delete the else branch that contains the try/catch and loop (the
block which populates priorRuns and sets trendLine based on priorRun.conclusion)
and keep only the selective-dispatch short-circuit; instead initialize trendLine
to an empty string (or leave it to the trace-timing analyzer) and do not
reference WORKFLOW_FILE/priorRuns or e.message anywhere—i.e., remove the code
around the symbols trendLine, isSelectiveDispatch, priorRuns, WORKFLOW_FILE and
the try/catch so the scorecard relies solely on the new trace-timing analyzer
output.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e021c607-f78e-41e7-9aaa-52096d6adfb3
📒 Files selected for processing (3)
.github/workflows/nightly-e2e.yamlscripts/scorecard/analyze-trace-timing.tstest/e2e-script-workflow.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- test/e2e-script-workflow.test.ts
🛑 Comments failed to post (1)
.github/workflows/nightly-e2e.yaml (1)
2520-2527:
⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDon't label mixed pass/cancel runs as “all passed.”
If one requested job passes and another is cancelled, this branch still emits
✅ All requested jobs passed. That headline contradicts the summary you added on Line 2536 and hides the fact that the run produced incomplete signal.Suggested adjustment
const status = failed.length > 0 || missingRequested.length > 0 ? '❌ Some jobs failed' - : cancelled.length > 0 && passed.length === 0 - ? '⚠️ Run cancelled — no signal' + : cancelled.length > 0 + ? passed.length === 0 + ? '⚠️ Run cancelled — no signal' + : '⚠️ Some jobs were cancelled' : skipped.length > 0 && passed.length === 0 ? '⚠️ No requested jobs ran' : '✅ All requested jobs passed';Also applies to: 2536-2536
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/nightly-e2e.yaml around lines 2520 - 2527, The ternary that sets status incorrectly labels runs with both passed and cancelled jobs as "✅ All requested jobs passed"; update the conditional logic around the status assignment (the block that computes status using failed, missingRequested, cancelled, passed, skipped) to detect mixed outcomes—specifically ensure a branch like "cancelled.length > 0 && passed.length > 0" (or a combined condition checking cancelled.length > 0 && passed.length > 0 && failed.length === 0) returns a different headline (e.g., "⚠️ Some jobs cancelled — partial pass") before the final "All requested jobs passed" case so mixed pass/cancel runs are not reported as all passed.
Signed-off-by: Angel Mata <amata@nvidia.com>
This comment was marked as outdated.
This comment was marked as outdated.
Selective E2E Results — ✅ All requested jobs passedRun: 27374885261
|
…iming-slack-summaries-in-gitlab-ci
Signed-off-by: Angel Mata <amata@nvidia.com>
…iming-slack-summaries-in-gitlab-ci
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
Selective E2E Results — ✅ All requested jobs passedRun: 27377814019
|
…iming-slack-summaries-in-gitlab-ci
This comment was marked as outdated.
This comment was marked as outdated.
…iming-slack-summaries-in-gitlab-ci
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
…iming-slack-summaries-in-gitlab-ci
|
✨ |
…iming-slack-summaries-in-gitlab-ci
…iming-slack-summaries-in-gitlab-ci
…iming-slack-summaries-in-gitlab-ci
Only upload trace artifacts after trusted sanitization succeeds. Guard against target-created output symlinks. Signed-off-by: Angel Mata <amata@nvidia.com>
Limit trace timing uploads and parsing to trusted summaries. Add regression coverage for trace redaction behavior. Signed-off-by: Angel Mata <amata@nvidia.com>
Upload only sanitizer-created trace summary files. Cover trace timing fallback and symlink cases. Signed-off-by: Angel Mata <amata@nvidia.com>
Terminate target-ref background processes before trusted trace upload. Signed-off-by: Angel Mata <amata@nvidia.com>
…ort, function or class' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…iming-slack-summaries-in-gitlab-ci
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Summary
Adds nightly
cloud-onboard-e2etrace collection, sanitized artifact upload, and Slack/GitHub scorecard timing summaries so maintainers can spot onboard timing regressions without exposing raw trace secrets.Related Issue
Closes #5090
Changes
NEMOCLAW_TRACE_DIRfor nightlycloud-onboard-e2eand uploads sanitized trace artifacts ascloud-onboard-traces.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesnpm run docsbuilds without warnings (doc changes only)Signed-off-by: Angel Mata amata@nvidia.com
Summary by CodeRabbit
New Features
Documentation
Security
Tests