Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/nix.yml
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ jobs:
uses: cachix/install-nix-action@8aa03977d8d733052d78f4e008a241fd1dbf36b3 # v31.10.6
with:
extra_nix_config: |
extra-trusted-substituters = https://nix-community.cachix.org
extra-substituters = https://nix-community.cachix.org
extra-trusted-public-keys = nix-community.cachix.org-1:mB9FSh9qf2dCimDSUo8Zy7bkq5CX+/rkCWyvRCYg3Fs=

- name: Verify Nix dependency lockfile
Expand Down
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ All notable user-visible changes to Hunk are documented in this file.

### Fixed

- Preserved Git log ANSI colors when `hunk pager` falls back to a plain-text terminal pager for non-diff output.
- Capped inline context expansion source reads so huge files cannot freeze or exhaust memory when expanding unchanged lines.
- Hardened plain-text pager startup so `PAGER` and `HUNK_TEXT_PAGER` shell metacharacters are passed as arguments instead of being evaluated implicitly.
- Hardened terminal rendering against control-sequence injection from diffs, file paths, notes, expanded context, copied selections, and pager fallback output.
Expand Down
2 changes: 1 addition & 1 deletion flake.nix
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
bun2nix.inputs.nixpkgs.follows = "nixpkgs";
};
nixConfig = {
extra-trusted-substituters = [
extra-substituters = [
"https://nix-community.cachix.org"
];
extra-trusted-public-keys = [
Expand Down
26 changes: 26 additions & 0 deletions src/core/pager.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -233,6 +233,32 @@ describe("plain text pager fallback", () => {
expectNoUnsafeTerminalControls(written);
});

test("preserves Git log SGR colors when sending plain text to a terminal pager", async () => {
const pager = new EventEmitter() as EventEmitter & { stdin: PassThrough };
pager.stdin = new PassThrough();
let written = "";
pager.stdin.on("data", (chunk) => {
written += String(chunk);
});

await pagePlainText(
`* \x1b[1;34mabc1234\x1b[m - \x1b[1;32m(HEAD -> main)\x1b[m${CSI_CLEAR_SCREEN}`,
{ PAGER: "less -R" },
createPagerDeps({
spawnImpl() {
queueMicrotask(() => {
pager.emit("close", 0);
});
return pager as never;
},
}),
);

expect(written).toContain("\x1b[1;34mabc1234\x1b[m");
expect(written).toContain("\x1b[1;32m(HEAD -> main)\x1b[m");
expect(written).not.toContain(CSI_CLEAR_SCREEN);
});

test("passes shell metacharacters as pager arguments instead of evaluating them", async () => {
await pagePlainText(
"plain text",
Expand Down
6 changes: 3 additions & 3 deletions src/core/pager.ts
Original file line number Diff line number Diff line change
Expand Up @@ -150,13 +150,13 @@ export async function pagePlainText(
spawnImpl: spawn,
},
) {
const safeText = sanitizeTerminalText(text);

if (!deps.stdout.isTTY) {
deps.stdout.write(safeText);
deps.stdout.write(sanitizeTerminalText(text));
return;
}

const safeText = sanitizeTerminalText(text, { preserveAnsiStyle: true });

const pagerSpec = resolveTextPagerSpec(env);
const pagerCommand = pagerSpec.displayCommand;

Expand Down
25 changes: 25 additions & 0 deletions src/lib/terminalText.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,31 @@ describe("sanitizeTerminalText", () => {
expect(sanitizeTerminalText("alpha\n\tbeta")).toBe("alpha\n\tbeta");
});

test("can preserve ANSI SGR styling while removing unsafe controls", () => {
const output = sanitizeTerminalText(
`plain\x1b[1;34mblue\x1b[m${OSC52_CLIPBOARD}${CSI_CLEAR_SCREEN}\x1b[2Kdone`,
{ preserveAnsiStyle: true },
);

expect(output).toBe("plain\x1b[1;34mblue\x1b[mdone");
});

test("does not preserve non-SGR CSI sequences as ANSI styling", () => {
const output = sanitizeTerminalText("safe\x1b[2J\x1b[H\x1b[?25ltext", {
preserveAnsiStyle: true,
});

expect(output).toBe("safetext");
});

test("removes crafted style placeholder delimiters before restoring ANSI styling", () => {
const output = sanitizeTerminalText("safe\u{f0000}0\u{f0001}\x1b[31mred\x1b[m", {
preserveAnsiStyle: true,
});

expect(output).toBe("safe0\x1b[31mred\x1b[m");
});

test("sanitizes span text while preserving styling metadata", () => {
const spans = [
{ text: `before${OSC52_CLIPBOARD}`, fg: "#fff" },
Expand Down
33 changes: 30 additions & 3 deletions src/lib/terminalText.ts
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,8 @@ export interface SanitizeTerminalTextOptions {
preserveNewlines?: boolean;
/** Preserve horizontal tabs for text fields that intentionally support them. Defaults to true. */
preserveTabs?: boolean;
/** Preserve ANSI SGR style sequences such as Git color output. Defaults to false. */
preserveAnsiStyle?: boolean;
}

const controlCodeRegex = /[\x00-\x1f\x7f-\x9f]/;
Expand All @@ -12,11 +14,16 @@ const sevenBitControlStrings =
/\x1b(?:\][\s\S]*?(?:\x07|\x1b\\|\x9c)|[PX^_][\s\S]*?(?:\x1b\\|\x9c)|\[[0-?]*[ -/]*[@-~])/g;
const c1ControlStrings = /[\x90\x98\x9d\x9e\x9f][\s\S]*?(?:\x07|\x1b\\|\x9c)/g;
const c1Csi = /\x9b[0-?]*[ -/]*[@-~]/g;
const preservedStyleTokenDelimiters = /[\u{f0000}\u{f0001}]/gu;

/** Normalize untrusted terminal-bound text before rendering it in Hunk UI surfaces. */
export function sanitizeTerminalText(
text: string,
{ preserveNewlines = true, preserveTabs = true }: SanitizeTerminalTextOptions = {},
{
preserveNewlines = true,
preserveTabs = true,
preserveAnsiStyle = false,
}: SanitizeTerminalTextOptions = {},
) {
if (!controlCodeRegex.test(text)) {
return text;
Expand All @@ -29,12 +36,32 @@ export function sanitizeTerminalText(
: preserveTabs
? /[\x00-\x08\x0a-\x1f\x7f-\x9f]/g
: /[\x00-\x1f\x7f-\x9f]/g;
const preservedStyles: string[] = [];
const preserveStyle = (sequence: string) => {
if (!preserveAnsiStyle || !/^\x1b\[[0-9;:]*m$/.test(sequence)) {
return "";
}

const token = `\u{f0000}${preservedStyles.length}\u{f0001}`;
preservedStyles.push(sequence);
return token;
};

return text
.replace(sevenBitControlStrings, "")
// Strip placeholder delimiters from untrusted input so authored text cannot spoof
// an internal token that later restores an ANSI sequence at the wrong location.
const tokenSafeText = preserveAnsiStyle ? text.replace(preservedStyleTokenDelimiters, "") : text;

let sanitized = tokenSafeText
.replace(sevenBitControlStrings, preserveStyle)
Comment on lines +39 to +55
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 The token delimiters \u{f0000} and \u{f0001} are Unicode Supplementary Private Use Area characters and are not covered by any of the sanitiser's control-character regexes (\x00-\x1f, \x7f-\x9f). If the untrusted input already contains these characters — for example, in a git commit message whose author embedded U+F0000 intentionally — the restoration loop will replace them with the ANSI SGR sequences collected from legitimate colour codes elsewhere in the same string, effectively injecting escape sequences that were not at that position in the original input. Since the function is explicitly documented as handling "untrusted" text, stripping the token-delimiter characters before tokenisation closes this gap without any visible difference for real-world input.

Suggested change
const preservedStyles: string[] = [];
const preserveStyle = (sequence: string) => {
if (!preserveAnsiStyle || !/^\x1b\[[0-9;:]*m$/.test(sequence)) {
return "";
}
const token = `\u{f0000}${preservedStyles.length}\u{f0001}`;
preservedStyles.push(sequence);
return token;
};
return text
.replace(sevenBitControlStrings, "")
let sanitized = text
.replace(sevenBitControlStrings, preserveStyle)
const preservedStyles: string[] = [];
const preserveStyle = (sequence: string) => {
if (!preserveAnsiStyle || !/^\x1b\[[0-9;:]*m$/.test(sequence)) {
return "";
}
const token = `\u{f0000}${preservedStyles.length}\u{f0001}`;
preservedStyles.push(sequence);
return token;
};
// Strip the token-delimiter characters before tokenising so that crafted input
// cannot produce token strings that the restoration loop would later replace
// with ANSI escape sequences from elsewhere in the same string.
const tokenSafeText = preserveAnsiStyle
? text.replace(/[\u{f0000}\u{f0001}]/gu, "")
: text;
let sanitized = tokenSafeText
.replace(sevenBitControlStrings, preserveStyle)
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/terminalText.ts
Line: 38-50

Comment:
The token delimiters `\u{f0000}` and `\u{f0001}` are Unicode Supplementary Private Use Area characters and are not covered by any of the sanitiser's control-character regexes (`\x00-\x1f`, `\x7f-\x9f`). If the untrusted input already contains these characters — for example, in a git commit message whose author embedded U+F0000 intentionally — the restoration loop will replace them with the ANSI SGR sequences collected from legitimate colour codes elsewhere in the same string, effectively injecting escape sequences that were not at that position in the original input. Since the function is explicitly documented as handling "untrusted" text, stripping the token-delimiter characters before tokenisation closes this gap without any visible difference for real-world input.

```suggestion
  const preservedStyles: string[] = [];
  const preserveStyle = (sequence: string) => {
    if (!preserveAnsiStyle || !/^\x1b\[[0-9;:]*m$/.test(sequence)) {
      return "";
    }

    const token = `\u{f0000}${preservedStyles.length}\u{f0001}`;
    preservedStyles.push(sequence);
    return token;
  };

  // Strip the token-delimiter characters before tokenising so that crafted input
  // cannot produce token strings that the restoration loop would later replace
  // with ANSI escape sequences from elsewhere in the same string.
  const tokenSafeText = preserveAnsiStyle
    ? text.replace(/[\u{f0000}\u{f0001}]/gu, "")
    : text;
  let sanitized = tokenSafeText
    .replace(sevenBitControlStrings, preserveStyle)
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed by stripping the internal placeholder delimiters from untrusted input before style-token restoration, and added a regression test covering the crafted-token case.

Also updated the Nix workflow/flake cache config so CI actually uses the nix-community Cachix substituter instead of trying to build bun2nix from crates.io.

This comment was generated by Pi using GPT-5

.replace(c1ControlStrings, "")
.replace(c1Csi, "")
.replace(controlCharacters, "");

for (const [index, sequence] of preservedStyles.entries()) {
sanitized = sanitized.replaceAll(`\u{f0000}${index}\u{f0001}`, sequence);
}

return sanitized;
}

/** Sanitize a single terminal row or cell where newlines must never be preserved. */
Expand Down
Loading