Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions actions/setup/js/generate_usage_activity_summary.cjs
Original file line number Diff line number Diff line change
Expand Up @@ -111,6 +111,12 @@ function parseFirewallLogs() {
continue;
}

// Skip non-Squid diagnostic lines (WARNING:, DNS, Accepting, etc.) by
// validating that the first field is a numeric Unix timestamp.
if (!/^\d+(\.\d+)?$/.test(parts[0])) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timestamp regex accepts integer-only first fields (e.g. 12345) while the PR description explicitly states Squid timestamps are always decimal (e.g. 1761332530.474).

💡 Details

The current regex /^\d+(\.\d+)?$/ matches both:

  • 1761332530.474 ✓ (valid Squid timestamp)
  • 12345 ✓ (integer — would also pass, contrary to stated invariant)

If the codebase invariant is that Squid access logs always emit a decimal timestamp, the guard should enforce that:

if (!/^\d+\.\d+$/.test(parts[0])) {
  continue;
}

The tighter regex removes the optional group and requires the decimal point, preventing any non-Squid line whose first token happens to be a bare integer from being processed. It also makes the guard self-documenting: the code now asserts the expected format rather than a looser approximation of it.

This is low risk in practice — bare integer first tokens in diagnostic lines are unlikely — but aligning the regex with the stated invariant closes a small gap and matches the description in the PR body.

continue;
}

const domain = parts[SQUID_DOMAIN_INDEX];
const dest = parts[SQUID_DEST_INDEX];
const client = parts[SQUID_CLIENT_INDEX] || "";
Expand Down
84 changes: 84 additions & 0 deletions actions/setup/js/generate_usage_activity_summary.test.cjs
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
import { afterEach, beforeEach, describe, expect, it } from "vitest";
import fs from "fs";
import path from "path";
import { createRequire } from "module";
import { fileURLToPath } from "url";

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
Comment on lines +7 to +8

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] __filename and __dirname are declared but never referenced in this file — dead code.

createRequire(import.meta.url) doesn't need them, and no other line in the test uses them. Other test files that use the same createRequire pattern (e.g. action_input_utils.test.cjs) omit these declarations.

💡 Suggested change

Remove lines 5, 7, and 8:

-import { fileURLToPath } from "url";
-
-const __filename = fileURLToPath(import.meta.url);
-const __dirname = path.dirname(__filename);


const req = createRequire(import.meta.url);
const { parseFirewallLogs } = req("./generate_usage_activity_summary.cjs");

describe("generate_usage_activity_summary.cjs", () => {
/** Unique directory for each test to avoid cross-test interference */
let squidLogDir;

beforeEach(() => {
squidLogDir = path.join("/tmp/gh-aw", `squid-logs-unit-test-${Date.now()}`);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Date.now() for unique temp-dir names is theoretically flaky: two test files running at the same millisecond would collide on the directory name.

💡 Prefer fs.mkdtempSync
squidLogDir = fs.mkdtempSync("/tmp/gh-aw/squid-logs-unit-test-");

mkdtempSync atomically creates a uniquely-suffixed directory, which is both safer and more idiomatic for temp-dir creation.

fs.mkdirSync(squidLogDir, { recursive: true });
});

afterEach(() => {
if (fs.existsSync(squidLogDir)) {
fs.rmSync(squidLogDir, { recursive: true, force: true });
}
});

describe("parseFirewallLogs", () => {
it("skips Squid diagnostic lines (WARNING:, DNS, Accepting) and does not treat them as domain names", () => {
const logContent = [
// Squid startup/diagnostic messages that should be skipped
'WARNING: 172.30.0.20:35288 api.github.com:443 140.82.112.22:443 1.1 CONNECT 200 TCP_TUNNEL:HIER_DIRECT api.github.com:443 "-"',
'DNS 172.30.0.20:35288 api.github.com:443 140.82.112.22:443 1.1 CONNECT 200 TCP_TUNNEL:HIER_DIRECT api.github.com:443 "-"',
'Accepting 172.30.0.20:35288 api.github.com:443 140.82.112.22:443 1.1 CONNECT 200 TCP_TUNNEL:HIER_DIRECT api.github.com:443 "-"',
// A valid access log entry that should be counted
'1761332530.474 172.30.0.20:35288 api.github.com:443 140.82.112.22:443 1.1 CONNECT 200 TCP_TUNNEL:HIER_DIRECT api.github.com:443 "-"',
].join("\n");

fs.writeFileSync(path.join(squidLogDir, "access.log"), logContent);

const result = parseFirewallLogs();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test assertions on total_requests will fail non-deterministically if any /tmp/gh-aw/squid-logs-* directory contains stale or concurrent log files.

💡 Details and suggested fix

parseFirewallLogs() unconditionally scans every file matching /tmp/gh-aw/squid-logs-*/**/*.log. The unit tests work around this by writing into a directory whose name matches that glob (squid-logs-unit-test-${Date.now()}). This design has two failure modes:

  1. Residual production data: If any agent job leaves behind a squid-logs-* directory — even from a previous failed run — the function picks up those log lines and inflates total_requests past the expected count, causing toBe(1) / toBe(2) to fail spuriously.

  2. No injection point: The function has hardcoded paths with no way to accept alternate directories, so tests cannot be truly isolated without depending on global filesystem state.

The root fix is to give parseFirewallLogs an optional logPaths parameter defaulting to the production paths, so tests can pass controlled inputs:

// source
function parseFirewallLogs(logPaths) {
  const firewallPaths = logPaths ?? [
    "/tmp/gh-aw/sandbox/firewall/logs/**/*.log",
    // ...
  ];
}

// test
const result = parseFirewallLogs([path.join(squidLogDir, "**/*.log")]);

With this change the tests stop depending on ambient global state and the exact-count assertions become reliable.


expect(result).not.toBeNull();
expect(result.total_requests).toBe(1);
expect(result.allowed_domains).toContain("api.github.com:443");
// Diagnostic keywords must not appear as domain names
expect(result.allowed_domains).not.toContain("WARNING:");
expect(result.allowed_domains).not.toContain("DNS");
expect(result.allowed_domains).not.toContain("Accepting");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The not.toContain("WARNING:") / not.toContain("DNS") / not.toContain("Accepting") assertions never fail, even without the fix.

In every diagnostic line used in this test, parts[2] (the domain field) resolves to api.github.com:443 — not the keyword at parts[0]. These three assertions were always-true before the patch too. The real regression guard is total_requests === 1 on line 44.

💡 Suggested replacement

Replace the three not.toContain calls with assertions that would actually fail if the timestamp guard were removed:

// The 3 diagnostic lines must not inflate the request count
expect(result.allowed_requests).toBe(1);
expect(result.blocked_requests).toBe(0);
// allowed_domains should have exactly one entry (not four)
expect(result.allowed_domains).toHaveLength(1);

});

it("returns null when only non-Squid diagnostic lines are present", () => {
const logContent = [
'WARNING: 172.30.0.20:35288 api.github.com:443 140.82.112.22:443 1.1 CONNECT 200 TCP_TUNNEL:HIER_DIRECT api.github.com:443 "-"',
"DNS resolver ready - some extra fields here to pass length check x y z",
"Accepting connections on port 3128 x y z",
].join("\n");

fs.writeFileSync(path.join(squidLogDir, "access.log"), logContent);

const result = parseFirewallLogs();

expect(result).toBeNull();
});

it("counts valid Squid access log entries correctly", () => {
const logContent = [
'1761332530.474 172.30.0.20:35288 api.github.com:443 140.82.112.22:443 1.1 CONNECT 200 TCP_TUNNEL:HIER_DIRECT api.github.com:443 "-"',
'1761332531.000 172.30.0.20:35289 blocked.example.com:443 1.2.3.4:443 1.1 CONNECT 403 NONE_NONE:HIER_NONE blocked.example.com:443 "-"',
].join("\n");

fs.writeFileSync(path.join(squidLogDir, "access.log"), logContent);

const result = parseFirewallLogs();

expect(result).not.toBeNull();
expect(result.total_requests).toBe(2);
expect(result.allowed_requests).toBe(1);
expect(result.blocked_requests).toBe(1);
expect(result.allowed_domains).toContain("api.github.com:443");
expect(result.blocked_domains).toContain("blocked.example.com:443");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] requests_by_domain is populated by the function but never asserted in any test case. It's the per-domain breakdown that callers may rely on.

💡 Suggested addition to the third test
expect(result.requests_by_domain["api.github.com:443"]).toEqual({ allowed: 1, blocked: 0 });
expect(result.requests_by_domain["blocked.example.com:443"]).toEqual({ allowed: 0, blocked: 1 });

This locks in the per-domain shape alongside the aggregate counters.

});
});
});
Loading