π‘οΈ Sentinel: [CRITICAL] Fix RCE vulnerability in PDF compilation#291
π‘οΈ Sentinel: [CRITICAL] Fix RCE vulnerability in PDF compilation#291
Conversation
Added `-no-shell-escape` flags to `pdflatex` and `pandoc` compilation commands in `cli/pdf/converter.py` and `cli/generators/cover_letter_generator.py` to prevent arbitrary command execution via maliciously crafted or hallucinated LaTeX content. Also implemented bounded subprocess execution using `timeout=30` and proper process termination `kill()` to prevent potential DoS attacks. Co-authored-by: anchapin <[email protected]>
|
π Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a π emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Reviewer's GuideAdds hardened PDF compilation by disabling LaTeX shell escapes and enforcing time-bounded subprocess execution for both pdflatex and pandoc, plus documents the incident in Sentinel notes. Sequence diagram for hardened LaTeX PDF compilation with timeout and fallbacksequenceDiagram
participant Caller
participant CoverLetterGenerator
participant Subprocess
participant Pdflatex
participant Pandoc
Caller->>CoverLetterGenerator: _compile_pdf(output_path, tex_content)
CoverLetterGenerator->>CoverLetterGenerator: write tex_content to tex_path
Note over CoverLetterGenerator: Primary compilation via pdflatex
CoverLetterGenerator->>Subprocess: Popen(pdflatex -interaction=nonstopmode -no-shell-escape)
Subprocess->>Pdflatex: execute tex_path
alt completes within 30s
CoverLetterGenerator->>Subprocess: communicate(timeout=30)
Subprocess-->>CoverLetterGenerator: stdout, stderr
alt returncode == 0 or pdf exists
CoverLetterGenerator-->>Caller: pdf_created = True
else returncode != 0 and no pdf
CoverLetterGenerator->>CoverLetterGenerator: pdf_created remains False
end
else times out after 30s
CoverLetterGenerator->>Subprocess: process.kill()
CoverLetterGenerator->>Subprocess: communicate()
CoverLetterGenerator->>CoverLetterGenerator: raise RuntimeError
CoverLetterGenerator->>CoverLetterGenerator: enter exception handler
end
alt pdf_created is False and pdf does not exist
Note over CoverLetterGenerator: Fallback compilation via pandoc
CoverLetterGenerator->>Subprocess: Popen(pandoc --pdf-engine=xelatex --pdf-engine-opt=-no-shell-escape)
Subprocess->>Pandoc: execute tex_path
alt completes within 30s
CoverLetterGenerator->>Subprocess: communicate(timeout=30)
Subprocess-->>CoverLetterGenerator: stdout, stderr
alt returncode == 0 or pdf exists
CoverLetterGenerator-->>Caller: pdf_created = True
else returncode != 0 and no pdf
CoverLetterGenerator->>CoverLetterGenerator: pdf_created remains False
end
else times out after 30s
CoverLetterGenerator->>Subprocess: process.kill()
CoverLetterGenerator->>Subprocess: communicate()
CoverLetterGenerator->>CoverLetterGenerator: raise RuntimeError
CoverLetterGenerator->>CoverLetterGenerator: exception handler ignores
end
else pdf already created
CoverLetterGenerator-->>Caller: pdf_created = True
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The new timeout/kill logic for
pdflatexandpandocis duplicated in multiple places; consider extracting a shared helper (e.g.,run_with_timeout(cmd, cwd, timeout=30)) or at least centralizing the timeout constant to avoid divergence in future changes. - By catching
RuntimeErrortogether with other exceptions and then treating an existing output file as success, timeouts will be silently downgraded to success if a partial PDF exists; consider either surfacing timeout failures distinctly or validating the resulting file before treating it as successful.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new timeout/kill logic for `pdflatex` and `pandoc` is duplicated in multiple places; consider extracting a shared helper (e.g., `run_with_timeout(cmd, cwd, timeout=30)`) or at least centralizing the timeout constant to avoid divergence in future changes.
- By catching `RuntimeError` together with other exceptions and then treating an existing output file as success, timeouts will be silently downgraded to success if a partial PDF exists; consider either surfacing timeout failures distinctly or validating the resulting file before treating it as successful.
## Individual Comments
### Comment 1
<location path="cli/pdf/converter.py" line_range="103" />
<code_context>
if process.returncode == 0 or output_path.exists():
pdf_created = True
- except (subprocess.CalledProcessError, FileNotFoundError):
+ except (subprocess.CalledProcessError, FileNotFoundError, RuntimeError):
# Check if PDF was created anyway
if output_path.exists():
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Timeout and other failures are fully suppressed, which makes diagnosing compilation issues difficult.
In this pandoc path, `RuntimeError` (including timeouts) and `FileNotFoundError` are now silently swallowed, and the caller only sees `False`. Please at least log the exception type and a brief message (and possibly stderr) so timeouts vs. missing binaries/misconfigurations can be distinguished in production without modifying code.
Suggested implementation:
```python
except (subprocess.CalledProcessError, FileNotFoundError, RuntimeError) as exc:
# Log the failure so timeouts, missing binaries, etc. can be diagnosed
logger.error(
"PDF compilation failed (%s): %s",
type(exc).__name__,
str(exc),
exc_info=True,
)
# Log stderr if it was captured
if "stderr" in locals() and stderr:
logger.error("PDF compilation stderr:\n%s", stderr)
# Check if PDF was created anyway (pdflatex returns non-zero for warnings)
if output_path.exists():
return True
```
1. Ensure there is a module-level logger defined in `cli/pdf/converter.py`, for example:
`logger = logging.getLogger(__name__)`.
2. If not already present, import the `logging` module at the top of the file:
`import logging`.
</issue_to_address>Help me be more useful! Please click π or π on each comment and I'll use the feedback to improve your reviews.
| if process.returncode == 0 or output_path.exists(): | ||
| return True | ||
| except (subprocess.CalledProcessError, FileNotFoundError): | ||
| except (subprocess.CalledProcessError, FileNotFoundError, RuntimeError): |
There was a problem hiding this comment.
suggestion (bug_risk): Timeout and other failures are fully suppressed, which makes diagnosing compilation issues difficult.
In this pandoc path, RuntimeError (including timeouts) and FileNotFoundError are now silently swallowed, and the caller only sees False. Please at least log the exception type and a brief message (and possibly stderr) so timeouts vs. missing binaries/misconfigurations can be distinguished in production without modifying code.
Suggested implementation:
except (subprocess.CalledProcessError, FileNotFoundError, RuntimeError) as exc:
# Log the failure so timeouts, missing binaries, etc. can be diagnosed
logger.error(
"PDF compilation failed (%s): %s",
type(exc).__name__,
str(exc),
exc_info=True,
)
# Log stderr if it was captured
if "stderr" in locals() and stderr:
logger.error("PDF compilation stderr:\n%s", stderr)
# Check if PDF was created anyway (pdflatex returns non-zero for warnings)
if output_path.exists():
return True- Ensure there is a module-level logger defined in
cli/pdf/converter.py, for example:
logger = logging.getLogger(__name__). - If not already present, import the
loggingmodule at the top of the file:
import logging.
π¨ Severity: CRITICAL
π‘ Vulnerability: The application used
subprocess.Popento compile LaTeX files usingpdflatexandpandocwithout the-no-shell-escapeflag and without process timeouts.π― Impact: This allowed malicious
.texcontent (e.g., injected via unsanitized user input or AI hallucinations) to execute arbitrary shell commands via the\write18feature or similar LaTeX exploits. Furthermore, the lack of timeouts could lead to Denial of Service (DoS) via infinite compilation loops.π§ Fix: Added
-no-shell-escapeflags to allpdflatexandpandoccalls. Implementedtimeout=30handling viasubprocess.communicate()with proper process cleanup (process.kill()) for timeout exceptions.β Verification: Ensure tests pass locally (e.g.,
test_pdf_security.py).PR created automatically by Jules for task 15424413755351556693 started by @anchapin
Summary by Sourcery
Harden PDF compilation against remote code execution and denial-of-service by constraining LaTeX tool invocations and documenting the incident.
Bug Fixes:
Documentation: