fix(pager): preserve git log colors #381
Greptile Summary
This PR fixes a regression where
Confidence Score: 3/5
Safe to merge for the vast majority of real-world git output, but the token scheme has a gap for crafted input in the security-sensitive sanitiser.

The core logic and pager integration are correct. The one concern is in terminalText.ts: the PUA characters chosen as token delimiters (U+F0000, U+F0001) are not removed before tokenisation, so a git commit message containing those characters could cause the restoration step to inject SGR sequences derived from other parts of the same output. This is a real defect in code that is explicitly designed to handle untrusted terminal content (the function's own docstring says 'Normalize untrusted terminal-bound text'), even though the practical impact is limited to cosmetic colour injection. A one-line pre-filter closes the gap completely.

src/lib/terminalText.ts: specifically the token placeholder scheme and restoration loop in sanitizeTerminalText.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[pagePlainText input] --> B{stdout.isTTY?}
B -- No --> C[sanitizeTerminalText\nstrip ALL escapes]
C --> D[write to stdout]
B -- Yes --> E[sanitizeTerminalText\npreserveAnsiStyle=true]
E --> F{contains control codes?}
F -- No --> G[return unchanged]
F -- Yes --> H[sevenBitControlStrings replace]
H --> I{SGR sequence?}
I -- Yes --> J[store + emit token]
I -- No --> K[emit empty string]
J --> L[strip c1 controls\nand control chars]
K --> L
L --> M[restore tokens to\noriginal SGR sequences]
M --> N[safeText with SGR colors]
N --> O[spawn pager]
O --> P[write safeText to pager stdin]
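The top of the flowchart (the `stdout.isTTY` branch) can be sketched as a small pure function. This is only an illustration under stated assumptions: `planPagerOutput`, `PagerPlan`, and `toySanitize` are hypothetical names, the options-object signature is a guess at how `preserveAnsiStyle` is passed, and the toy sanitiser handles CSI sequences only rather than everything the real `sanitizeTerminalText` covers.

```typescript
// Hypothetical signature: the real sanitizeTerminalText may take its flag differently.
type Sanitize = (text: string, options: { preserveAnsiStyle: boolean }) => string;

interface PagerPlan {
  safeText: string;
  destination: "stdout" | "pager";
}

// Mirrors the flowchart: non-TTY output is stripped of all escapes and written
// straight to stdout; TTY output keeps SGR colours and goes to the pager.
function planPagerOutput(text: string, isTTY: boolean, sanitize: Sanitize): PagerPlan {
  if (!isTTY) {
    return { safeText: sanitize(text, { preserveAnsiStyle: false }), destination: "stdout" };
  }
  return { safeText: sanitize(text, { preserveAnsiStyle: true }), destination: "pager" };
}

// Toy stand-in for the sanitiser (assumption): drops every CSI sequence except
// SGR (ending in "m") when style preservation is requested.
const toySanitize: Sanitize = (text, { preserveAnsiStyle }) =>
  text.replace(/\x1b\[[0-?]*[ -\/]*[@-~]/g, (sequence) =>
    preserveAnsiStyle && /^\x1b\[[0-9;:]*m$/.test(sequence) ? sequence : "",
  );

// Coloured text followed by a clear-screen sequence (ESC [ 2 J).
const sample = "\x1b[33mcommit abc\x1b[0m\x1b[2J";
const ttyPlan = planPagerOutput(sample, true, toySanitize);
const pipedPlan = planPagerOutput(sample, false, toySanitize);

console.log(ttyPlan.destination, JSON.stringify(ttyPlan.safeText));
console.log(pipedPlan.destination, JSON.stringify(pipedPlan.safeText));
```

Keeping the decision separate from the actual `spawn` call makes the branch easy to test without a terminal.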
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
src/lib/terminalText.ts:38-50
The token delimiters `\u{f0000}` and `\u{f0001}` are Unicode Supplementary Private Use Area characters and are not covered by any of the sanitiser's control-character regexes (`\x00-\x1f`, `\x7f-\x9f`). If the untrusted input already contains these characters — for example, in a git commit message whose author embedded U+F0000 intentionally — the restoration loop will replace them with the ANSI SGR sequences collected from legitimate colour codes elsewhere in the same string, effectively injecting escape sequences that were not at that position in the original input. Since the function is explicitly documented as handling "untrusted" text, stripping the token-delimiter characters before tokenisation closes this gap without any visible difference for real-world input.
```suggestion
  const preservedStyles: string[] = [];
  const preserveStyle = (sequence: string) => {
    if (!preserveAnsiStyle || !/^\x1b\[[0-9;:]*m$/.test(sequence)) {
      return "";
    }

    const token = `\u{f0000}${preservedStyles.length}\u{f0001}`;
    preservedStyles.push(sequence);
    return token;
  };

  // Strip the token-delimiter characters before tokenising so that crafted input
  // cannot produce token strings that the restoration loop would later replace
  // with ANSI escape sequences from elsewhere in the same string.
  const tokenSafeText = preserveAnsiStyle
    ? text.replace(/[\u{f0000}\u{f0001}]/gu, "")
    : text;

  let sanitized = tokenSafeText
    .replace(sevenBitControlStrings, preserveStyle)
```
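To make the reported gap concrete, here is a self-contained sketch that reproduces the forged-token behaviour and shows the effect of the pre-filter. It is not the code from `src/lib/terminalText.ts`: `sanitizeSketch`, its `stripDelimiters` switch, and the `csiSequences` and `controlChars` regexes are simplified, hypothetical stand-ins, and the restoration step is written as a single regex replace rather than whatever loop the real function uses.

```typescript
// Delimiters taken from the review; everything else below is an assumption.
const TOKEN_OPEN = "\u{f0000}";
const TOKEN_CLOSE = "\u{f0001}";
// Simplified stand-in for sevenBitControlStrings: CSI sequences only.
const csiSequences = /\x1b\[[0-?]*[ -\/]*[@-~]/g;
const sgrOnly = /^\x1b\[[0-9;:]*m$/;
// C0/C1 control characters, keeping tab and newline.
const controlChars = /[\x00-\x08\x0b-\x1f\x7f-\x9f]/g;

function sanitizeSketch(text: string, preserveAnsiStyle: boolean, stripDelimiters: boolean): string {
  const preservedStyles: string[] = [];
  const preserveStyle = (sequence: string): string => {
    if (!preserveAnsiStyle || !sgrOnly.test(sequence)) {
      return "";
    }
    const token = `${TOKEN_OPEN}${preservedStyles.length}${TOKEN_CLOSE}`;
    preservedStyles.push(sequence);
    return token;
  };

  // The pre-filter from the suggestion: remove delimiter characters that are
  // already present in the untrusted input before any tokens are created.
  const tokenSafeText = preserveAnsiStyle && stripDelimiters
    ? text.replace(/[\u{f0000}\u{f0001}]/gu, "")
    : text;

  const stripped = tokenSafeText
    .replace(csiSequences, preserveStyle)
    .replace(controlChars, "");

  // Restore each token to the SGR sequence stored at its index.
  return stripped.replace(
    /\u{f0000}(\d+)\u{f0001}/gu,
    (_match: string, index: string) => preservedStyles[Number(index)] ?? "",
  );
}

const red = "\x1b[31m";
const reset = "\x1b[0m";
// A forged token that points at the first preserved style.
const forged = `${TOKEN_OPEN}0${TOKEN_CLOSE}`;
const crafted = `${red}commit${reset} message with ${forged} inside`;

const unfiltered = sanitizeSketch(crafted, true, false);
const filtered = sanitizeSketch(crafted, true, true);

// Without the pre-filter the forged token is restored as a second red sequence.
console.log(JSON.stringify(unfiltered));
// With the pre-filter only the delimiters are dropped; the digit stays as plain text.
console.log(JSON.stringify(filtered));
```

In this sketch the forged token can only replay a sequence that already passed the SGR check, which matches the review's assessment that the impact is cosmetic colour injection.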
Reviews (1): Last reviewed commit: "fix(pager): preserve git log colors"
  const preservedStyles: string[] = [];
  const preserveStyle = (sequence: string) => {
    if (!preserveAnsiStyle || !/^\x1b\[[0-9;:]*m$/.test(sequence)) {
      return "";
    }
    const token = `\u{f0000}${preservedStyles.length}\u{f0001}`;
    preservedStyles.push(sequence);
    return token;
  };
  return text
    .replace(sevenBitControlStrings, "")
  let sanitized = text
    .replace(sevenBitControlStrings, preserveStyle)
Good catch — fixed by stripping the internal placeholder delimiters from untrusted input before style-token restoration, and added a regression test covering the crafted-token case.
Also updated the Nix workflow/flake cache config so CI actually uses the nix-community Cachix substituter instead of trying to build bun2nix from crates.io.
This comment was generated by Pi using GPT-5
Summary
- `hunk pager` falls back to a terminal text pager for non-diff content

Testing
- `bun test`
- `bun run typecheck`
- "`hunk pager` as git `core.pager` strips colors from other git commands" #379 in tmux and confirmed `git -c core.pager='bun run /home/bentlegen/Projects/hunk-a/src/main.tsx -- pager' log ...` now preserves colors

Fixes #379

This PR description was generated by Pi using GPT-5