fix(firewall): skip non-Squid diagnostic lines in generate_usage_activity_summary#41429
Conversation
…enerate_usage_activity_summary Squid diagnostic/startup messages (WARNING:, DNS, Accepting, etc.) can appear in access log files with 8+ whitespace-separated fields, bypassing the existing `parts.length < 8` guard. Their third field was then mistakenly treated as a domain name. Add a `/^\d+(\.\d+)?$/` numeric timestamp check — identical to the guard already present in parse_firewall_logs.cjs — so that only genuine Squid access log entries are counted. Closes #41398 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR hardens the firewall usage-summary generator against Squid diagnostic/startup lines that can appear in access logs, preventing those lines from being mis-parsed as real requests and inflating domain stats.
Changes:
- Added a numeric Unix timestamp guard in
parseFirewallLogs()so only real Squid access log entries are counted. - Added a new Vitest unit test file covering diagnostic-line skipping, all-diagnostic logs returning
null, and valid allow/block counting.
Show a summary per file
| File | Description |
|---|---|
| actions/setup/js/generate_usage_activity_summary.cjs | Skip non-access-log diagnostic lines by requiring a numeric epoch timestamp in the first field. |
| actions/setup/js/generate_usage_activity_summary.test.cjs | New unit tests validating the new guard and preserving correct parsing for real entries. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 1
| const __filename = fileURLToPath(import.meta.url); | ||
| const __dirname = path.dirname(__filename); |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #41429 does not have the 'implementation' label and has 0 new lines of code in business logic directories (threshold: 100). |
|
✅ Test Quality Sentinel completed test quality analysis. Test Quality Sentinel analysis for PR #41429 was already completed in the prior workflow invocation (same run ID 28170581267). Comment posted (90/100, Excellent) and APPROVE review submitted. Both safe-output limits (add_comment: 1/1, submit_pull_request_review: 1/1) are exhausted — confirming all actions were applied. |
|
✅ PR Code Quality Reviewer completed the code quality review. |
🧪 Test Quality Sentinel Report✅ Test Quality Score: 90/100 — Excellent
📊 Metrics & Test Classification (3 tests analyzed)
Go: 0 ( Verdict
|
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd and /diagnose — approving with minor test-quality suggestions.
📋 Key Themes & Highlights
Key Themes
- Test assertions don't fully lock in the regression: the three
not.toContain("WARNING:")etc. calls in test 1 are always-true regardless of the fix; the real guard istotal_requests === 1. - Small dead-code smell: unused
__filename/__dirnameimports in the test file. - Minor flakiness risk:
Date.now()for temp-dir uniqueness; preferfs.mkdtempSync. - Coverage gap:
requests_by_domainis never asserted across any test case.
Positive Highlights
- ✅ Fix is minimal, surgical, and mirrors the identical guard already in
parse_firewall_logs.cjs— good consistency. - ✅ Three focused tests cover the three scenarios that matter: all-diagnostic, mixed, and valid-only.
- ✅
beforeEach/afterEachlifecycle with isolated temp directories — correct approach. - ✅ The core regression is caught by
total_requests === 1, even if surrounding assertions can be tightened.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 78 AIC · ⌖ 7.18 AIC · ⊞ 6.5K
| // Diagnostic keywords must not appear as domain names | ||
| expect(result.allowed_domains).not.toContain("WARNING:"); | ||
| expect(result.allowed_domains).not.toContain("DNS"); | ||
| expect(result.allowed_domains).not.toContain("Accepting"); |
There was a problem hiding this comment.
[/tdd] The not.toContain("WARNING:") / not.toContain("DNS") / not.toContain("Accepting") assertions never fail, even without the fix.
In every diagnostic line used in this test, parts[2] (the domain field) resolves to api.github.com:443 — not the keyword at parts[0]. These three assertions were always-true before the patch too. The real regression guard is total_requests === 1 on line 44.
💡 Suggested replacement
Replace the three not.toContain calls with assertions that would actually fail if the timestamp guard were removed:
// The 3 diagnostic lines must not inflate the request count
expect(result.allowed_requests).toBe(1);
expect(result.blocked_requests).toBe(0);
// allowed_domains should have exactly one entry (not four)
expect(result.allowed_domains).toHaveLength(1);| import { fileURLToPath } from "url"; | ||
|
|
||
| const __filename = fileURLToPath(import.meta.url); | ||
| const __dirname = path.dirname(__filename); |
There was a problem hiding this comment.
[/tdd] __filename and __dirname are declared but never referenced in this file — dead code.
createRequire(import.meta.url) doesn't need them, and no other line in the test uses them. Other test files that use the same createRequire pattern (e.g. action_input_utils.test.cjs) omit these declarations.
💡 Suggested change
Remove lines 5, 7, and 8:
-import { fileURLToPath } from "url";
-
-const __filename = fileURLToPath(import.meta.url);
-const __dirname = path.dirname(__filename);| let squidLogDir; | ||
|
|
||
| beforeEach(() => { | ||
| squidLogDir = path.join("/tmp/gh-aw", `squid-logs-unit-test-${Date.now()}`); |
There was a problem hiding this comment.
[/tdd] Date.now() for unique temp-dir names is theoretically flaky: two test files running at the same millisecond would collide on the directory name.
💡 Prefer fs.mkdtempSync
squidLogDir = fs.mkdtempSync("/tmp/gh-aw/squid-logs-unit-test-");mkdtempSync atomically creates a uniquely-suffixed directory, which is both safer and more idiomatic for temp-dir creation.
| expect(result.allowed_requests).toBe(1); | ||
| expect(result.blocked_requests).toBe(1); | ||
| expect(result.allowed_domains).toContain("api.github.com:443"); | ||
| expect(result.blocked_domains).toContain("blocked.example.com:443"); |
There was a problem hiding this comment.
[/tdd] requests_by_domain is populated by the function but never asserted in any test case. It's the per-domain breakdown that callers may rely on.
💡 Suggested addition to the third test
expect(result.requests_by_domain["api.github.com:443"]).toEqual({ allowed: 1, blocked: 0 });
expect(result.requests_by_domain["blocked.example.com:443"]).toEqual({ allowed: 0, blocked: 1 });This locks in the per-domain shape alongside the aggregate counters.
There was a problem hiding this comment.
REQUEST_CHANGES — fix is correct, tests need hardening
The timestamp guard in parseFirewallLogs() is the right fix and correctly implements the invariant described in the PR. Two issues need to be addressed before merge:
-
[HIGH] Test isolation: the tests write into
/tmp/gh-aw/squid-logs-unit-test-*/andparseFirewallLogs()scans all/tmp/gh-aw/squid-logs-*paths — any residual directory (stale CI artifacts, parallel jobs) will contaminate exact-count assertions. The function needs an injectablelogPathsparameter so tests can scope to a controlled path. -
[MEDIUM] Regex is under-strict:
/^\d+(\.\d+)?$/allows integer-only first fields while the PR explicitly states the invariant is a decimal epoch. Tighten to/^\d+\.\d+$/to match the stated format and make the intent unambiguous.
🔎 Code quality review by PR Code Quality Reviewer · 81.1 AIC · ⌖ 7.66 AIC · ⊞ 5.2K
|
|
||
| fs.writeFileSync(path.join(squidLogDir, "access.log"), logContent); | ||
|
|
||
| const result = parseFirewallLogs(); |
There was a problem hiding this comment.
Test assertions on total_requests will fail non-deterministically if any /tmp/gh-aw/squid-logs-* directory contains stale or concurrent log files.
💡 Details and suggested fix
parseFirewallLogs() unconditionally scans every file matching /tmp/gh-aw/squid-logs-*/**/*.log. The unit tests work around this by writing into a directory whose name matches that glob (squid-logs-unit-test-${Date.now()}). This design has two failure modes:
-
Residual production data: If any agent job leaves behind a
squid-logs-*directory — even from a previous failed run — the function picks up those log lines and inflatestotal_requestspast the expected count, causingtoBe(1)/toBe(2)to fail spuriously. -
No injection point: The function has hardcoded paths with no way to accept alternate directories, so tests cannot be truly isolated without depending on global filesystem state.
The root fix is to give parseFirewallLogs an optional logPaths parameter defaulting to the production paths, so tests can pass controlled inputs:
// source
function parseFirewallLogs(logPaths) {
const firewallPaths = logPaths ?? [
"/tmp/gh-aw/sandbox/firewall/logs/**/*.log",
// ...
];
}
// test
const result = parseFirewallLogs([path.join(squidLogDir, "**/*.log")]);With this change the tests stop depending on ambient global state and the exact-count assertions become reliable.
|
|
||
| // Skip non-Squid diagnostic lines (WARNING:, DNS, Accepting, etc.) by | ||
| // validating that the first field is a numeric Unix timestamp. | ||
| if (!/^\d+(\.\d+)?$/.test(parts[0])) { |
There was a problem hiding this comment.
The timestamp regex accepts integer-only first fields (e.g. 12345) while the PR description explicitly states Squid timestamps are always decimal (e.g. 1761332530.474).
💡 Details
The current regex /^\d+(\.\d+)?$/ matches both:
1761332530.474✓ (valid Squid timestamp)12345✓ (integer — would also pass, contrary to stated invariant)
If the codebase invariant is that Squid access logs always emit a decimal timestamp, the guard should enforce that:
if (!/^\d+\.\d+$/.test(parts[0])) {
continue;
}The tighter regex removes the optional group and requires the decimal point, preventing any non-Squid line whose first token happens to be a bare integer from being processed. It also makes the guard self-documenting: the code now asserts the expected format rather than a looser approximation of it.
This is low risk in practice — bare integer first tokens in diagnostic lines are unlikely — but aligning the regex with the stated invariant closes a small gap and matches the description in the PR body.
|
@copilot run pr-finisher skill |
Squid writes diagnostic/startup messages (
WARNING:,DNS,Accepting, etc.) into access log files. These lines have ≥8 whitespace-separated fields, bypassing the existing length guard, causing their third field to be misidentified as a domain name and inflating firewall stats.Changes
generate_usage_activity_summary.cjs— add a numeric Unix timestamp guard inparseFirewallLogs()after the field-count check, matching the pattern already used inparse_firewall_logs.cjs:Genuine Squid access log entries always begin with a decimal epoch timestamp (e.g.
1761332530.474); diagnostic lines do not.generate_usage_activity_summary.test.cjs(new) — tests covering: diagnostic keywords not recorded as domains, all-diagnostic log returnsnull, valid entries still counted correctly.