fix(downloads): Use random generated name for files (backport #540) #541

Merged
ssiyad merged 5 commits into master from mergify/bp/master/pr-540
Jun 10, 2026

Conversation

@mergify mergify Bot commented Jun 9, 2026

Contributor

This is an automatic backport of pull request #540 done by [Mergify](https://mergify.com).

@mergify mergify Bot assigned ssiyad Jun 9, 2026

greptile-apps Bot commented Jun 9, 2026

Greptile Summary

This backport replaces URL-derived filenames in download_file with random names from secrets.token_urlsafe, adds a pure-Python _safe_extract_tar that rejects symlinks, hardlinks, absolute paths, and traversal sequences, and deprecates string-form commands in run_subprocess, running list commands with shell=False.

  • utils.py: the filename is now a random token plus an extension taken from a hardcoded list of known extensions, closing path traversal via crafted download URLs.
  • site.py: shell tar calls replaced by _safe_extract_tar; symlinks, hardlinks, absolute paths, and .. components are all rejected before extraction.
  • base.py: run_subprocess emits a DeprecationWarning for string commands and raises TypeError if executable is passed alongside a list command — a narrow breaking change for callers that previously combined those two arguments.
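
The filename scheme described for utils.py can be sketched roughly as follows. This is a minimal illustration, not the code from agent/utils.py: the function name `random_download_name` and the contents of `KNOWN_EXTENSIONS` are assumptions made for the example.

```python
import os
import secrets
from urllib.parse import urlparse

# Hypothetical allowlist; the actual list in agent/utils.py is not shown in this PR page.
KNOWN_EXTENSIONS = (".sql.gz", ".tar.gz", ".tgz", ".tar", ".sql")


def random_download_name(url: str) -> str:
    """Return an unpredictable filename, keeping only an allowlisted extension."""
    # Only the basename of the URL path is inspected, and only to pick an extension.
    basename = os.path.basename(urlparse(url).path).lower()
    # Check longer suffixes first so ".tar.gz" is preferred over a shorter match.
    for ext in sorted(KNOWN_EXTENSIONS, key=len, reverse=True):
        if basename.endswith(ext):
            return secrets.token_urlsafe(16) + ext
    # Unknown extension: fall back to the bare token.
    return secrets.token_urlsafe(16)
```

Because nothing from the URL other than an allowlisted suffix reaches the filename, a crafted path such as `/backups/../../etc/site.tar.gz` cannot influence where the file is saved.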

Confidence Score: 4/5

Safe to merge; the core security fixes are sound and the only finding is a minor inefficiency inside _safe_extract_tar.

The three changed files all address real security concerns (path traversal via filenames, unsafe tar extraction, shell injection via string commands). The logic is correct: symlinks, absolute paths, and .. traversal are all rejected before extraction, and filenames are now unpredictable. The one issue is os.path.realpath(dest) being re-evaluated on every loop iteration rather than once before the loop — wasteful but not unsafe.

agent/site.py — the _safe_extract_tar implementation warrants a second look for the dest_real placement; agent/base.py — verify no existing callers pass both a list command and a non-None executable, as that now raises TypeError.
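
To make the base.py behaviour change concrete, here is a rough sketch of the contract described above. It is a hypothetical stand-in (`run_subprocess_sketch`), not the function from agent/base.py, whose real signature and warning text are not shown on this page.

```python
import subprocess
import sys
import warnings


def run_subprocess_sketch(command, executable=None, **kwargs):
    """Hypothetical stand-in illustrating the described run_subprocess contract."""
    if isinstance(command, str):
        # String commands still run through the shell, but callers are warned.
        warnings.warn(
            "String commands are deprecated; pass a list of arguments instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return subprocess.run(command, shell=True, executable=executable, **kwargs)
    if executable is not None:
        # With shell=False, `executable` would replace the program being run,
        # so the combination is rejected outright.
        raise TypeError("executable cannot be combined with a list command")
    return subprocess.run(list(command), shell=False, **kwargs)
```

Under this sketch, a caller that previously passed a list together with `executable` fails immediately with TypeError, which is the combination the review asks maintainers to check for.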

Important Files Changed

| Filename | Overview |
| --- | --- |
| agent/utils.py | Replaces the URL-derived filename with secrets.token_urlsafe(16) plus an extension from a hardcoded list of known extensions, eliminating path traversal via crafted URLs. |
| agent/site.py | Adds _safe_extract_tar to replace shell tar invocations; rejects symlinks, hardlinks, absolute paths, and parent traversal. Minor: os.path.realpath(dest) is re-evaluated inside the loop. |
| agent/base.py | Deprecates string-form commands in run_subprocess (switches shell=True→False for list commands); raises TypeError when executable is passed with a list, which is a minor breaking change for any existing callers relying on that combination. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[download_file URL] --> B[Parse URL path basename]
    B --> C{Known extension?}
    C -- yes --> D[ext = matched suffix]
    C -- no --> E[ext = empty string]
    D --> F[filename = token_urlsafe + ext]
    E --> F
    F --> G[Save to prefix/filename]

    G --> H[restore_files]
    H --> I[_safe_extract_tar]
    I --> J[tarfile.open path]
    J --> K[For each member]
    K --> L{issym or islnk?}
    L -- yes --> M[raise ExtractError]
    L -- no --> N{isabs name?}
    N -- yes --> M
    N -- no --> O{'..' in parts?}
    O -- yes --> M
    O -- no --> P{strip > 0?}
    P -- yes --> Q[Strip N leading components]
    Q --> R{empty after strip?}
    R -- yes --> S[skip member]
    R -- no --> T[Check target within dest_real]
    P -- no --> T
    T --> U{outside dest?}
    U -- yes --> M
    U -- no --> V[Add to valid list]
    V --> W[tar.extractall valid members]
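
The validation path in the flowchart can be exercised against a crafted archive. The sketch below is adapted from the loop quoted in this review, with the absolute-path and traversal checks merged into one branch for brevity; `check_members` and `make_tar` are hypothetical helpers written for this example and are not part of agent/site.py.

```python
import io
import os
import tarfile
import tempfile


def check_members(tar: tarfile.TarFile, dest: str, strip: int = 0) -> list:
    """Return the members that pass validation; raise on any unsafe member."""
    dest_real = os.path.realpath(dest)  # resolved once, before the loop
    valid = []
    for member in tar.getmembers():
        if member.issym() or member.islnk():
            raise tarfile.ExtractError(f"Refusing to extract link: {member.name}")
        parts = member.name.split("/")
        if os.path.isabs(member.name) or ".." in parts:
            raise tarfile.ExtractError(f"Refusing unsafe path: {member.name}")
        if strip:
            stripped = "/".join(parts[strip:])
            if not stripped:
                continue  # member consisted only of stripped components
            member.name = stripped
        target = os.path.realpath(os.path.join(dest, member.name))
        # The trailing separator keeps a sibling such as "<dest>-evil" from matching.
        if not target.startswith(dest_real + os.sep):
            raise tarfile.ExtractError(f"Refusing path outside destination: {member.name}")
        valid.append(member)
    return valid


def make_tar(names):
    """Build an in-memory archive holding one small regular file per name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            info = tarfile.TarInfo(name)
            data = b"x"
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dest:
        with tarfile.open(fileobj=make_tar(["site/public/a.txt"])) as tar:
            print([m.name for m in check_members(tar, dest, strip=1)])
        with tarfile.open(fileobj=make_tar(["site/../../etc/passwd"])) as tar:
            try:
                check_members(tar, dest)
            except tarfile.ExtractError as exc:
                print("rejected:", exc)
```

Validating every member before calling extractall means a single bad entry aborts the restore without writing any file from the archive.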
Reviews (1): Last reviewed commit: "fix(downloads): Only extract valid paths"

Comment thread agent/site.py
Comment on lines +147 to +168
        with tarfile.open(path) as tar:
            members = tar.getmembers()
            valid = []
            for member in members:
                if member.issym() or member.islnk():
                    raise tarfile.ExtractError(f"Refusing to extract link: {member.name}")
                if os.path.isabs(member.name):
                    raise tarfile.ExtractError(f"Refusing absolute path: {member.name}")
                parts = member.name.split("/")
                if ".." in parts:
                    raise tarfile.ExtractError(f"Refusing parent traversal: {member.name}")
                if strip:
                    stripped = "/".join(parts[strip:])
                    if not stripped:
                        continue
                    member.name = stripped
                dest_real = os.path.realpath(dest)
                target = os.path.realpath(os.path.join(dest, member.name))
                if not target.startswith(dest_real + os.sep):
                    raise tarfile.ExtractError(f"Refusing path outside destination: {member.name}")
                valid.append(member)
            tar.extractall(path=dest, members=valid)

P2 dest_real is recomputed via os.path.realpath on every loop iteration. Since dest never changes within the loop, this should be moved above the loop to avoid repeated syscalls proportional to archive member count.

Suggested change
        with tarfile.open(path) as tar:
            members = tar.getmembers()
            valid = []
            dest_real = os.path.realpath(dest)
            for member in members:
                if member.issym() or member.islnk():
                    raise tarfile.ExtractError(f"Refusing to extract link: {member.name}")
                if os.path.isabs(member.name):
                    raise tarfile.ExtractError(f"Refusing absolute path: {member.name}")
                parts = member.name.split("/")
                if ".." in parts:
                    raise tarfile.ExtractError(f"Refusing parent traversal: {member.name}")
                if strip:
                    stripped = "/".join(parts[strip:])
                    if not stripped:
                        continue
                    member.name = stripped
                target = os.path.realpath(os.path.join(dest, member.name))
                if not target.startswith(dest_real + os.sep):
                    raise tarfile.ExtractError(f"Refusing path outside destination: {member.name}")
                valid.append(member)
            tar.extractall(path=dest, members=valid)
@ssiyad ssiyad merged commit e6f18cb into master Jun 10, 2026
4 of 5 checks passed
@ssiyad ssiyad deleted the mergify/bp/master/pr-540 branch June 10, 2026 07:37