[code-scanning-fix] Fix js/http-to-file-access: validate Content-Type and size for LFS PDF download by github-actions[bot] · Pull Request #41635 · github/gh-aw

github-actions · 2026-06-26T08:16:06Z

Generated by PR Description Updater for #41635 · 33.9 AIC · ⌖ 7.62 AIC · ⊞ 4.6K · ◷

Add response Content-Type validation and a 50 MB size limit to the network-to-file path in ensure-docs-slide-pdf.js, addressing the js/http-to-file-access CodeQL alert (#636). Changes: - Reject responses whose Content-Type is not application/pdf or application/octet-stream before reading the body. - Enforce a 50 MB ceiling via both the Content-Length header (early rejection) and the actual downloaded byte count (double-check). - The existing PDF magic-bytes check (isPdf) is preserved as a third layer of validation. - All new checks throw errors that are caught by the surrounding try/catch block, so the script still falls back to the placeholder PDF rather than crashing in sandboxed environments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-26T10:09:34Z

Nice defence-in-depth fix for the LFS slide deck download path 🔒 — layering Content-Type validation, a pre-download Content-Length cap, and a post-read byte-count check on top of the existing magic-bytes guard is exactly the right approach for this kind of network-to-disk flow.

One thing that would round this out before merge:

Add tests — the three new guard branches (content-type rejection, content-length pre-rejection, post-read size check) aren’t exercised by any test file yet. Covering each failure path and the happy path would give reviewers confidence and protect against regressions.

If you’d like to generate that coverage, assign this prompt to your coding agent:

Add unit tests for the new validation logic in scripts/ensure-docs-slide-pdf.js.

The readPdfBytes function now has three guard conditions to cover:
1. Content-Type rejection — mock fetch returning `Content-Type: text/html`; should throw an error containing "Unexpected content-type" and must NOT write to disk.
2. Content-Length pre-rejection — mock response with `Content-Length` header set to a value > 52428800 (50 MB); should throw before the body is consumed.
3. Post-download size check — mock response whose arrayBuffer() resolves to a buffer larger than 50 MB; should throw after the body is read.
4. Happy path — valid `Content-Type: application/pdf`, Content-Length within limit, body that passes isPdf() magic-bytes check; should write the file normally.

Use the project’s existing test framework (Jest or equivalent) and mock the global fetch. Place tests in scripts/ensure-docs-slide-pdf.test.js.

Generated by ✅ Contribution Check · 217.8 AIC · ⌖ 19 AIC · ⊞ 6K · ◷

github-actions · 2026-06-26T11:33:12Z

✅ Test Quality Sentinel completed test quality analysis.

No test files were added or modified in this PR. The only changed file is scripts/ensure-docs-slide-pdf.js (a production script). Test Quality Sentinel skipped.

github-actions · 2026-06-26T11:33:14Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions · 2026-06-26T11:33:20Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR does not have the 'implementation' label and has ≤100 new lines of code in business logic directories (default_business_additions=0).

github-actions · 2026-06-26T11:33:32Z

✅ PR Code Quality Reviewer completed the code quality review.

Copilot

Pull request overview

This PR addresses CodeQL alert js/http-to-file-access in scripts/ensure-docs-slide-pdf.js by adding additional validation gates before writing a network-fetched slide-deck PDF to docs/public/….

Changes:

Added Content-Type validation for the LFS download response.
Added a max-size guard using Content-Length plus a post-download size check.
Preserved the existing PDF magic-bytes check (%PDF-) and placeholder fallback behavior.

Show a summary per file

File	Description
`scripts/ensure-docs-slide-pdf.js`	Adds header and size validations intended to reduce the risk of writing unexpected network data to disk when resolving the slide deck PDF.

Review details

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 1/1 changed files
Comments generated: 2
Review effort level: Low

+    // Validate the Content-Type header before consuming the body to ensure we
+    // are actually receiving a PDF and not arbitrary data.
+    const contentType = response.headers.get("content-type") ?? "";
+    if (!contentType.startsWith("application/pdf") && !contentType.startsWith("application/octet-stream")) {
+      throw new Error(`Unexpected content-type for slide deck: ${contentType}`);
+    }


+    // Guard against unexpectedly large downloads.
+    const MAX_BYTES = 50 * 1024 * 1024; // 50 MB
+    const contentLength = response.headers.get("content-length");
+    if (contentLength !== null && Number(contentLength) > MAX_BYTES) {
+      throw new Error(`Slide deck download size ${contentLength} exceeds limit of ${MAX_BYTES} bytes`);
+    }
+
    const downloadedBytes = Buffer.from(await response.arrayBuffer());
+    if (downloadedBytes.length > MAX_BYTES) {
+      throw new Error(`Downloaded slide deck size ${downloadedBytes.length} exceeds limit of ${MAX_BYTES} bytes`);
+    }


github-actions

Skills-Based Review 🧠

Applied /diagnose and /tdd — commenting with improvements, no blocking issues.

📋 Key Themes & Highlights

Key Themes

Test coverage gap: Three new validation branches have no unit tests, despite other scripts in this repo following a .test.js pattern
NaN edge case: Number(contentLength) > MAX_BYTES silently bypasses the Content-Length guard when the header is non-numeric (mitigated by the post-download check)
Loose MIME matching: startsWith("application/pdf") is slightly imprecise; parsing split(";")[0].trim() is the idiomatic approach

Positive Highlights

✅ Excellent defense-in-depth: three independent validation layers (header, Content-Length, post-download byte count, magic bytes)
✅ All new error paths are caught by the existing try/catch, so failures still produce the placeholder PDF rather than a crash
✅ Early Content-Length rejection avoids downloading the body before failing — good resource hygiene
✅ PR description is thorough and self-documenting; the security rationale is clear

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 46.2 AIC · ⌖ 8.27 AIC · ⊞ 6.5K

github-actions · 2026-06-26T11:40:32Z


+    // Validate the Content-Type header before consuming the body to ensure we
+    // are actually receiving a PDF and not arbitrary data.
+    const contentType = response.headers.get("content-type") ?? "";


[/tdd] The three new validation paths (Content-Type mismatch, oversized Content-Length, oversized post-download body) each throw distinct errors, but no regression tests cover them. The project has a .test.js pattern for scripts (changeset.test.js, generate-schema-docs.test.js) that a companion ensure-docs-slide-pdf.test.js could follow.

💡 Sketch of missing test cases

The core function would need to export readPdfBytes (or a testable inner function), then:

// Test: unexpected Content-Type falls back to placeholder, not crash globalThis.fetch = async () => ({ ok: true, headers: { get: (k) => k === 'content-type' ? 'text/html' : null }, arrayBuffer: async () => new ArrayBuffer(0), }); // assert: returns placeholder PDF bytes, console.warn called // Test: Content-Length > 50 MB falls back to placeholder globalThis.fetch = async () => ({ ok: true, headers: { get: (k) => k === 'content-type' ? 'application/pdf' : k === 'content-length' ? String(100 * 1024 * 1024) : null }, arrayBuffer: async () => new ArrayBuffer(0), }); // assert: returns placeholder PDF bytes

These tests confirm that each guard correctly triggers the fallback rather than writing invalid data.

github-actions · 2026-06-26T11:40:32Z

+    // Guard against unexpectedly large downloads.
+    const MAX_BYTES = 50 * 1024 * 1024; // 50 MB
+    const contentLength = response.headers.get("content-length");
+    if (contentLength !== null && Number(contentLength) > MAX_BYTES) {


[/diagnose] Number(contentLength) returns NaN when the header value is non-numeric, and NaN > MAX_BYTES evaluates to false, silently bypassing this guard. A malformed or crafted Content-Length header would slip past without triggering an error or warning. The post-download byte-count check on line 130 still catches oversized payloads, so the risk is mitigated — but the Content-Length early-rejection is weakened.

💡 Suggested fix

const parsedLength = contentLength !== null ? Number(contentLength) : NaN; if (Number.isFinite(parsedLength) && parsedLength > MAX_BYTES) { throw new Error(`Slide deck download size ${parsedLength} exceeds limit of ${MAX_BYTES} bytes`); }

Using Number.isFinite() ensures non-numeric headers are treated as "unknown size" (safe fallback to the post-download check) rather than silently ignored as if the value were 0.

github-actions · 2026-06-26T11:40:32Z

+    // Validate the Content-Type header before consuming the body to ensure we
+    // are actually receiving a PDF and not arbitrary data.
+    const contentType = response.headers.get("content-type") ?? "";
+    if (!contentType.startsWith("application/pdf") && !contentType.startsWith("application/octet-stream")) {


[/diagnose] startsWith("application/pdf") will also match hypothetical types like application/pdf-exploit or application/pdfx. In practice this isn't exploitable since a real server wouldn't return such types, but parsing the MIME type properly (stripping parameters) is the more defensive and idiomatic approach.

💡 Suggested fix

const mimeType = (response.headers.get("content-type") ?? "").split(";")[0].trim(); if (mimeType !== "application/pdf" && mimeType !== "application/octet-stream") { throw new Error(`Unexpected content-type for slide deck: ${mimeType}`); }

This correctly handles application/pdf; charset=binary (strips params) and prevents any unintended prefix matches.

github-actions

REQUEST_CHANGES — two correctness bugs in the new validation code

The security-fix direction is sound, but two of the three new guards have implementation defects that undermine the protection they are meant to provide.

🔍 Blocking issues

NaN bypass in Content-Length pre-check (line 125) — Number("abc") is NaN; NaN > MAX_BYTES is false, so any malformed Content-Length silently skips the early rejection. The post-download check still exists, but only after the full body is buffered.
Full response body buffered before size check (line 129) — response.arrayBuffer() reads the entire response into memory before the check on line 130 fires. A server omitting or lying about Content-Length can exhaust heap before the guard throws, potentially OOM-crashing the CI runner. The fix requires streaming the response with a running byte count.

i️ Non-blocking observations

application/octet-stream acceptance (line 118) reduces the Content-Type gate to filtering non-binary MIME types only; the magic-bytes check on line 134 is already doing the real validation work. If LFS media always returns application/octet-stream, document that so the layering isn't misleading.

🔎 Code quality review by PR Code Quality Reviewer · 70.7 AIC · ⌖ 6.89 AIC · ⊞ 5.2K

github-actions · 2026-06-26T11:43:01Z

+    // Guard against unexpectedly large downloads.
+    const MAX_BYTES = 50 * 1024 * 1024; // 50 MB
+    const contentLength = response.headers.get("content-length");
+    if (contentLength !== null && Number(contentLength) > MAX_BYTES) {


Content-Length guard silently skipped for malformed header values: Number("abc") returns NaN, and NaN > MAX_BYTES is false, so any non-numeric Content-Length bypasses the pre-download size check without throwing.

💡 Suggested fix

Replace with an explicit finite-number check:

const contentLengthNum = contentLength !== null ? Number(contentLength) : null; if (contentLengthNum !== null && (!Number.isFinite(contentLengthNum) || contentLengthNum > MAX_BYTES)) { throw new Error(`Slide deck download size ${contentLength} exceeds limit of ${MAX_BYTES} bytes`); }

This rejects both non-numeric headers and values exceeding the cap. Without this, a server returning Content-Length: chunked or any non-numeric token silently skips the guard — the post-download check on line 130 still fires, but only after the full body is already buffered in memory.

github-actions · 2026-06-26T11:43:01Z

+      throw new Error(`Slide deck download size ${contentLength} exceeds limit of ${MAX_BYTES} bytes`);
+    }
+
    const downloadedBytes = Buffer.from(await response.arrayBuffer());


50 MB guard fires only after the full body is already in memory: response.arrayBuffer() buffers the entire response synchronously before the size check on line 130 runs. A server that omits Content-Length (bypassing the header pre-check) or provides a malformed value can stream an unbounded body and exhaust the process's heap before the guard throws — potentially OOM-crashing the CI runner.

💡 Suggested fix

Stream the response and count bytes as chunks arrive:

const chunks = []; let totalBytes = 0; const reader = response.body.getReader(); while (true) { const { done, value } = await reader.read(); if (done) break; totalBytes += value.length; if (totalBytes > MAX_BYTES) { await reader.cancel(); throw new Error(`Downloaded slide deck size exceeds limit of ${MAX_BYTES} bytes`); } chunks.push(value); } const downloadedBytes = Buffer.concat(chunks.map(c => Buffer.from(c)));

This caps memory usage at MAX_BYTES regardless of whether Content-Length is absent, lying, or malformed, which is the actual intent of the guard comment on line 122.

github-actions · 2026-06-26T11:43:01Z

+    // Validate the Content-Type header before consuming the body to ensure we
+    // are actually receiving a PDF and not arbitrary data.
+    const contentType = response.headers.get("content-type") ?? "";
+    if (!contentType.startsWith("application/pdf") && !contentType.startsWith("application/octet-stream")) {


application/octet-stream allowance largely neutralises the Content-Type gate: any arbitrary binary file — executables, archives, scripts — arrives with application/octet-stream, so the check filters only responses with a non-binary MIME type (e.g., text/html). The isPdf() magic-bytes check on line 134 is therefore the only meaningful content-type guard. The comment above this block states the intent is to ensure we are receiving a PDF, which application/octet-stream does not satisfy.

💡 Notes

This may be intentional if media.githubusercontent.com consistently serves LFS objects as application/octet-stream. If so, document that explicitly with a comment; otherwise the combination of header validation + isPdf() check creates a false sense of defence-in-depth since the header gate adds no additional content-type assurance over what the magic-bytes check already provides.

github-actions Bot added agentic-campaign automated-fix security z_campaign_security-alert-burndown labels Jun 26, 2026

github-actions Bot mentioned this pull request Jun 26, 2026

[Contribution Check Report] Contribution Check — 2026-06-26 #41648

Open

pelikhan marked this pull request as ready for review June 26, 2026 11:32

Copilot AI review requested due to automatic review settings June 26, 2026 11:32

pelikhan merged commit 2d9eb0e into main Jun 26, 2026
28 checks passed

pelikhan deleted the fix/alert-636-http-to-file-access-40d60f037fef5932 branch June 26, 2026 11:32

Copilot AI reviewed Jun 26, 2026

View reviewed changes

github-actions Bot commented Jun 26, 2026

View reviewed changes

Uh oh!

Conversation

github-actions Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 26, 2026

Uh oh!

Uh oh!

github-actions Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Review details

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Skills-Based Review 🧠

Key Themes

Positive Highlights

Uh oh!

github-actions Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

REQUEST_CHANGES — two correctness bugs in the new validation code

Uh oh!

github-actions Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented Jun 26, 2026 •

edited

Loading

github-actions Bot commented Jun 26, 2026 •

edited

Loading

github-actions Bot commented Jun 26, 2026 •

edited

Loading

github-actions Bot commented Jun 26, 2026 •

edited

Loading

github-actions Bot commented Jun 26, 2026 •

edited

Loading