Commit bca17b0

Fix Reported Bugs in the Crash Analyzer (#1154)
This PR addresses the three errors reported in [Issue 1153](#1153).

## Error 1: GDB complains that it cannot insert a breakpoint because it cannot access memory at the breakpoint's address.

To address this error, this PR modifies the oss_fuzz_checkout.prepare_project_image function to use the oss-fuzz project created by the crash analyzer using the Evaluator.create_ossfuzz_project_with_gdb function.

In addition, this PR disables caching by default. When a cached project image is used, OSS-Fuzz-Gen edits the Dockerfile in the created oss-fuzz project and removes the commands that modify environment variables. A cleaner solution would be to also modify the compile flags in the cached image, but I couldn't get this to work in the short time I spent on it.

## Error 2: GDB complains that the artifact directory was not found.

Using the oss-fuzz project created by the Evaluator.create_ossfuzz_project_with_gdb function partially addresses this error. I also modified the tutorial of the GDB tool so that it references the correct path of the artifact in the project container used by the GDB tool.

## Error 3: The LLM hallucinates GDB interactions and uses this hallucinated interaction to derive its conclusion.

Since I cannot prevent the LLM from hallucinating, I addressed this error by adding two validations to the LLM response.

First, I added a check that the LLM response does not contain both gdb/bash commands and the LLM's conclusion. The reasoning is that the LLM should not be issuing tool commands and providing a conclusion at the same time. In the future, I expect that the use of function tools should also address this problem.

Second, I added a validation that ensures the gdb tool is used at least once before the LLM produces a conclusion. I'm not sure whether this can prevent hallucination, but in my experience, once the LLM uses the GDB tool the first time, it continues using it.

Finally, this PR also fixes a bug in the Crash Analyzer where the ProjectContainer tool is not terminated before the Crash Analyzer exits, and adds the files needed to test the Crash Analyzer directly.

## Evaluation Instruction

The Crash Analyzer (before and after the changes provided by this PR) can be tested directly using the command below (change the oss-fuzz directory path):

```
python3 -m agent_tests.agent_test -y benchmark-sets/comparison/mosh.yaml -f _ZN8Terminal11Framebuffer6resizeEii -p CrashAnalyzer -pf agent_tests/prompt_files/crash-analyzer-mosh-01.txt -afp ./agent_tests/2025-07-16-1148-pamusuo-analyzer-tests-1/ -of [path/to/oss-fuzz] > result-test-01.txt 2>&1
```

## Expectation

Without the contributed changes, the LLM response will exhibit one of the three errors described in [Issue 1153](#1153). With the contributed changes, the LLM should use the GDB tool at least once, and the GDB tool invocation should not fail.
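The two response validations described under Error 3 can be sketched as a standalone function. This is only an illustration under stated assumptions: `parse_tag` and `validate_response` are hypothetical names, and the regex-based tag parser is a stand-in for the analyzer's own `_parse_tag`, whose implementation is not shown in this commit.

```python
import re
from typing import Optional

TOOL_TAGS = ('gdb', 'bash')


def parse_tag(response: str, tag: str) -> str:
  """Hypothetical stand-in: content of the first <tag>...</tag> pair, or ''."""
  match = re.search(rf'<{tag}>(.*?)</{tag}>', response, re.DOTALL)
  return match.group(1).strip() if match else ''


def validate_response(response: str, gdb_tool_used: bool) -> Optional[str]:
  """Returns a note describing the violation, or None if the response is valid."""
  has_conclusion = bool(parse_tag(response, 'conclusion'))
  has_tool_command = any(parse_tag(response, tag) for tag in TOOL_TAGS)
  # Validation 1: a conclusion and a tool command must not appear together.
  if has_conclusion and has_tool_command:
    return ('NOTE: You cannot provide both tool commands and conclusion '
            'in the same response.')
  # Validation 2: GDB must have been used at least once before concluding.
  if has_conclusion and not gdb_tool_used:
    return ('NOTE: You MUST use the provided GDB tool to analyze the crash '
            'before providing a conclusion.')
  return None
```

Returning the note rather than raising keeps the caller free to feed it back to the LLM as a retry prompt, which is how the commit uses `extra_note`.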
1 parent 8881de0 commit bca17b0

10 files changed: +305 -10 lines changed

agent/crash_analyzer.py

Lines changed: 22 additions & 3 deletions
@@ -130,12 +130,27 @@ def _container_handle_conclusion(self, cur_round: int, response: str,
   def _container_tool_reaction(self, cur_round: int, response: str,
                                crash_result: CrashResult) -> Optional[Prompt]:
     """Validates LLM conclusion or executes its command."""
+    extra_note = ''
+    # If there's a conclusion tag and a tool usage tag, then there's an error
+    prompt = prompt_builder.CrashAnalyzerTemplateBuilder(self.llm,
+                                                         None).build([])
+    if self._parse_tag(response, 'conclusion') and (self._parse_tag(
+        response, 'gdb') or self._parse_tag(response, 'bash')):
+      extra_note = 'NOTE: You cannot provide both tool commands and conclusion in the same response.'
+      return self._container_handle_invalid_tool_usage(
+          [self.gdb_tool, self.bash_tool], cur_round, response, prompt,
+          extra_note)
+
     if self._parse_tag(response, 'conclusion'):
+      if not self.gdb_tool_used:
+        extra_note = 'NOTE: You MUST use the provided GDB tool to analyze the crash before providing a conclusion.'
+        return self._container_handle_invalid_tool_usage(
+            [self.gdb_tool, self.bash_tool], cur_round, response, prompt,
+            extra_note)
       return self._container_handle_conclusion(cur_round, response,
                                                crash_result)
-    prompt = prompt_builder.CrashAnalyzerTemplateBuilder(self.llm,
-                                                         None).build([])
     if self._parse_tag(response, 'gdb'):
+      self.gdb_tool_used = True
       return self._container_handle_gdb_command(response, self.gdb_tool, prompt)
     if self._parse_tag(response, 'bash'):
       return self._container_handle_bash_command(response, self.bash_tool,
@@ -201,6 +216,8 @@ def execute(self, result_history: list[Result]) -> AnalysisResult:
     self.gdb_tool.execute(f'screen -dmS gdb_session -L '
                           f'-Logfile /tmp/gdb_log.txt '
                           f'gdb /out/{last_result.benchmark.target_name}')
+    # Define variable to keep track of gdb tool usage
+    self.gdb_tool_used = False
     self.bash_tool = ProjectContainerTool(
         benchmark, name='check', project_name=generated_oss_fuzz_project)
     self.bash_tool.compile(extra_commands=' && rm -rf /out/* > /dev/null')
@@ -227,10 +244,12 @@ def execute(self, result_history: list[Result]) -> AnalysisResult:
         self._sleep_random_duration(trial=self.trial)
     finally:
       # Cleanup: stop the container
-      logger.debug('Stopping the crash analyze container %s',
+      logger.debug('Stopping the crash analyze containers: %s, %s',
                    self.gdb_tool.container_id,
+                   self.bash_tool.container_id,
                    trial=self.trial)
       self.gdb_tool.terminate()
+      self.bash_tool.terminate()

     analysis_result = AnalysisResult(
         author=self,
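The cleanup change above terminates both containers in the `finally` block. As a minimal sketch of the same idea, assuming a hypothetical `FakeTool` class in place of the real container tools, `contextlib.ExitStack` can register each `terminate` call so that all of them run even when the analysis raises. This is not the commit's code, just one way to express the pattern.

```python
from contextlib import ExitStack


class FakeTool:
  """Hypothetical stand-in for a container-backed tool."""

  def __init__(self, name: str):
    self.name = name
    self.terminated = False

  def terminate(self) -> None:
    self.terminated = True


def run_with_tools(tools, work):
  """Runs work(tools) and terminates every tool afterwards, even on error."""
  with ExitStack() as stack:
    for tool in tools:
      # Callbacks run in reverse registration order when the block exits.
      stack.callback(tool.terminate)
    return work(tools)
```

One difference from sequential calls in a `finally` block: with `ExitStack`, an exception raised by one `terminate` does not prevent the remaining callbacks from running.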

agent_tests/2025-07-16-1148-pamusuo-analyzer-tests-1/10.build_script

Whitespace-only changes.
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+#include <fuzzer/FuzzedDataProvider.h>
+#include <cstddef>
+#include <cstdint>
+
+// Per instruction, include this specific header.
+#include "/usr/include/c++/9/bits/basic_string.h"
+
+// Headers for the classes under test.
+#include "src/terminal/terminal.h"
+#include "src/terminal/parseraction.h"
+
+extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
+  FuzzedDataProvider provider(data, size);
+
+  // Use reasonable limits to avoid excessive memory allocation which would cause timeouts.
+  const size_t init_width = provider.ConsumeIntegralInRange<size_t>(1, 1024);
+  const size_t init_height = provider.ConsumeIntegralInRange<size_t>(1, 1024);
+
+  Terminal::Emulator emulator(init_width, init_height);
+
+  const size_t resize_width = provider.ConsumeIntegralInRange<size_t>(0, 1024);
+  const size_t resize_height = provider.ConsumeIntegralInRange<size_t>(0, 1024);
+
+  // Create a Resize action object, which is a friend of Emulator.
+  const Parser::Resize resize_action(resize_width, resize_height);
+
+  // Call the public method that will, in turn, call the private resize method.
+  resize_action.act_on_terminal(&emulator);
+
+  return 0;
+}

agent_tests/2025-07-16-1148-pamusuo-analyzer-tests-1/crash-da39a3ee5e6b4b0d3255bfef95601890afd80709

Whitespace-only changes.

agent_tests/agent_test.py

Lines changed: 12 additions & 1 deletion
@@ -192,13 +192,24 @@ def get_result_list_for_agent(
   return agent_test_instance.setup_initial_result_list(benchmark, args.prompt)


+def json_set_converter(obj):
+  """Converts a set to a list for JSON serialization."""
+  if isinstance(obj, set):
+    return list(obj)
+  raise TypeError(
+      f"Object of type {obj.__class__.__name__} is not JSON serializable")
+
+
 def write_result(args: argparse.Namespace, trial: int,
                  result: List[Result]) -> None:
   """Writes the result to a file in the work directory."""

   result_file = os.path.join(args.work_dirs.base, f'{trial}_result.json')
   with open(result_file, 'w') as file:
-    json.dump([r.to_dict() for r in result], file, indent=2)
+    json.dump([r.to_dict() for r in result],
+              file,
+              indent=2,
+              default=json_set_converter)

   logger.info('Result written to %s', result_file, trial=trial)
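To illustrate why the `default=` hook is needed, here is a small self-contained sketch. The converter is copied from the diff above; the `payload` dictionary is a made-up example and not data from the project.

```python
import json


def json_set_converter(obj):
  """Converts a set to a list for JSON serialization."""
  if isinstance(obj, set):
    return list(obj)
  raise TypeError(
      f"Object of type {obj.__class__.__name__} is not JSON serializable")


# Hypothetical result dictionary containing a set, which json cannot encode.
payload = {'trial': 1, 'tags': {'asan'}}

try:
  json.dumps(payload)
  plain_dump_failed = False
except TypeError:
  plain_dump_failed = True

# With the hook, the set is written out as a JSON array.
encoded = json.dumps(payload, default=json_set_converter)
decoded = json.loads(encoded)
```

Note that `list(obj)` does not guarantee element order for sets with several members. If a stable output file matters, returning `sorted(obj)` is one option, assuming the elements are mutually comparable.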

Lines changed: 227 additions & 0 deletions
@@ -0,0 +1,227 @@
+Given the following crash report, fuzz driver code and relevant project function code, analyze the cause of the crash using GDB tool step by step.
+First, make a conclusion, ONLY ANSWER "False" if the crash is caused by bug in fuzz driver OR ONLY ANSWER "True" if the crash is caused by bug in project. Second, offer succinct and to-the-point analyses and suggestions.
+
+Below is crash report:
+<log>
+AddressSanitizer: ABRT on unknown address 0x000000000012 (pc 0x7fbf92cc900b bp 0x7fbf92e3e588 sp 0x7ffce9619330 T0)
+SCARINESS: 10 (signal)
+#0 0x7fbf92cc900b in raise (/lib/x86_64-linux-gnu/libc.so.6+0x4300b) (BuildId: 5792732f783158c66fb4f3756458ca24e46e827d)
+#1 0x7fbf92ca8858 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x22858) (BuildId: 5792732f783158c66fb4f3756458ca24e46e827d)
+#2 0x7fbf92ca8728 (/lib/x86_64-linux-gnu/libc.so.6+0x22728) (BuildId: 5792732f783158c66fb4f3756458ca24e46e827d)
+#3 0x7fbf92cb9fd5 in __assert_fail (/lib/x86_64-linux-gnu/libc.so.6+0x33fd5) (BuildId: 5792732f783158c66fb4f3756458ca24e46e827d)
+#4 0x555a8d236679 in Terminal::Framebuffer::resize(int, int) /src/mosh/src/terminal/terminalframebuffer.cc:398:3
+#5 0x555a8d21fda4 in LLVMFuzzerTestOneInput /src/mosh/src/fuzz/terminal_parser_fuzzer.cc:28:17
+#6 0x555a8d0d4430 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:614:13
+#7 0x555a8d0d5941 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::__Fuzzer::vector<fuzzer::SizedFile, std::__Fuzzer::allocator<fuzzer::SizedFile>>&) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:807:3
+#8 0x555a8d0d5ed2 in fuzzer::Fuzzer::Loop(std::__Fuzzer::vector<fuzzer::SizedFile, std::__Fuzzer::allocator<fuzzer::SizedFile>>&) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:867:3
+#9 0x555a8d0c500b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:914:6
+#10 0x555a8d0f03e2 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
+#11 0x7fbf92caa082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 5792732f783158c66fb4f3756458ca24e46e827d)
+#12 0x555a8d0b788d in _start (out/libfuzzer-address-x86_64/terminal_parser_fuzzer+0x5288d)
+
+DEDUP_TOKEN: raise--abort--
+AddressSanitizer can not provide additional info.
+</log>
+
+Below is driver code:
+<code>
+Line 1 - 28:
+#include <fuzzer/FuzzedDataProvider.h>
+#include <cstddef>
+#include <cstdint>
+
+// Per instruction, include this specific header.
+#include "/usr/include/c++/9/bits/basic_string.h"
+
+// Headers for the classes under test.
+#include "src/terminal/terminal.h"
+#include "src/terminal/parseraction.h"
+
+extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
+FuzzedDataProvider provider(data, size);
+
+// Use reasonable limits to avoid excessive memory allocation which would cause timeouts.
+const size_t init_width = provider.ConsumeIntegralInRange<size_t>(1, 1024);
+const size_t init_height = provider.ConsumeIntegralInRange<size_t>(1, 1024);
+
+Terminal::Emulator emulator(init_width, init_height);
+
+const size_t resize_width = provider.ConsumeIntegralInRange<size_t>(0, 1024);
+const size_t resize_height = provider.ConsumeIntegralInRange<size_t>(0, 1024);
+
+// Create a Resize action object, which is a friend of Emulator.
+const Parser::Resize resize_action(resize_width, resize_height);
+
+// Call the public method that will, in turn, call the private resize method.
+resize_action.act_on_terminal(&emulator);
+</code>
+
+Below is relevant project function code:
+<code>
+{PROJECT_FUNCTION_CODE}
+</code>
+
+To help analyze the root cause behind the runtime crash, you can leverage GDB tool and BASH tool to obtain information.
+
+Instructions:
+1. ALWAYS use the provided GDB or BASH tools to locate the program lines mentioned in the crash report.
+2. DO NOT TRY TO ANALYZE OR COUNT THE LINES OF CODE IN THE PROGRAM YOURSELF.
+<tool>
+**GDB tool Guide**
+You can leverage GDB by iteractively sending me a GDB command, and I will provide you with the output of the command. The path of fuzz driver binary is '/out/terminal_parser_fuzzer'. The testcase that triggers runtime crash is stored at '/experiment/results/output-mosh-_zn8terminal8emulator6resizeemm/artifacts/10.fuzz_target-F0-10/crash-da39a3ee5e6b4b0d3255bfef95601890afd80709'.
+
+<interaction protocols>
+1. I have executed 'gdb /out/terminal_parser_fuzzer'. You are now in GDB session, NOT in shell session. DO NOT run 'gdb /out/terminal_parser_fuzzer' again! DO NOT run shell commands!
+2. Strictly ONE GDB command at a time!
+3. Each message you send should first explain the reason why you want to run the command wrapped by <reason></reason>, then provide the command to run wrapped in <gdb></gdb> in this format:
+<reason>
+Reasons here.
+</reason>
+<gdb>
+One gdb command here.
+</gdb>
+4. Each reponse I send will repeat the command you sent wrapped in <gdb command></gdb command> for you to double-check, followed by the command standard output wrapped in <gdb output></gdb output> and stderr wrapped in <stderr></stderr> in this format:
+<gdb command>
+The command I executed, copied from the command you sent.
+</gdb command>
+<gdb output>
+The standard output of the command.
+</gdb output>
+<stderr>
+The standard error of the command.
+</stderr>
+5. The final goal is to answer questions about runtime crash, executed fuzz driver and project under test: a) ‘False’(if the crash is caused by bug in fuzz driver) or ‘True'(if the crash is caused by bug in project)? b) If the crash is caused by bug in fuzz driver, provide analyses, and are there any suggestions for modifying the fuzz driver? c) If the crash is caused by bug in project, provide analyses, and are there any suggestions for patching the project?
+6. If you have a conclusion on above questions, output the conclusion wrapped by <conclusion></conclusion> followed by the analysis and suggestion wrapped in <analysis and suggestion></analysis and suggestion>:
+<conclusion>
+‘False’ or ‘True’
+</conclusion>
+<analysis and suggestion>
+Analysis and suggestion
+</analysis and suggestion>
+</interaction protocols>
+
+<general rules>
+1. DO NOT wrap code snippets with ```, using the XML-style tags above will suffice.
+2. DO NOT Compile or Run Code!
+3. Strictly ONE GDB command at a time!
+4. DO NOT run 'gdb /out/terminal_parser_fuzzer' again!
+5. DO NOT run shell commands!
+</general rules>
+</tool>
+<tool>
+**Bash tool Guide**
+Use the bash tool to investigate files in the fuzz target's build environment. This will help you understand the project source code, the function under test, its dependencies, and any compilation requirements.
+
+<interaction protocols>
+1. STRICTLY Only One Bash Command per message:
+* **DO NOT** send multiple bash commands in each message.
+2. Execute Bash Command Message Structure:
+* Reason for the Command:
+* Explain the reason for running the command.
+* Wrap this explanation within <reason> and </reason> tags.
+* Bash Command:
+* Provide the bash command to execute.
+* Wrap the command with <bash> and </bash> tags.
+* Format Example:
+<reason>
+I want to locate the source file containing the definition of the function-under-test to examine its implementation.
+</reason>
+<bash>
+grep -rn 'function_name(' /src/project-name/
+</bash>
+3. Receiving Bash Command Output Message Structure:
+* Bash execution outputs will be returned in the following format:
+<bash>
+[The command you executed.]
+</bash>
+<stdout>
+[Standard output of the command.]
+</stdout>
+<stderr>
+[Standard error of the command.]
+</stderr>
+<interaction protocols>
+
+<general rules>
+1 .File Access and Modification Restrictions:
+* Allowed Actions:
+* View any files and environment variables in the build environment.
+* Prohibited Actions:
+* Do not modify, rename, or create new files.
+* All modifications will not be preserved when building the fuzz target.
+</general rules>
+
+<tool guidelines>
+1 .Purposeful Commands:
+* Each bash command should have a clear purpose related to your investigation toward the final goals.
+2. Careful Interpretation:
+* Analyze the output of each command thoroughly to inform your next steps.
+* Keep notes of important findings that will help in modifying the fuzz target and build script.
+4. Clarity and Compliance:
+* Adhere strictly to the interaction protocols and formatting requirements.
+* Ensure your messages are clear and properly formatted.
+5. No Unauthorized Actions:
+* Do not modify files.
+6. Avoid using `pkg-config`:
+* Use bash commands to manually identify the correct file paths
+* Explore the project's directory hierarchy (`/src/<project-name>`) to learn headerfiles locations, library's naming conventions, and build system.
+</tool guidelines>
+
+<example usages>
+Command 1. Start by locating the function's definition and understand its parameters, e.g.:
+<reason>
+To find the definition of `my_function` in the project directory and understand its implementation details.
+</reason>
+<bash>
+grep -rn 'my_function(' /src/project/
+</bash>
+Command 2. Identify Required Headers:
+<reason>
+To identify the header files in the project directory that declare `my_function`.
+</reason>
+<bash>
+grep -rn 'my_function' /src/project/ --include=*.h
+</bash>
+Command 3. Locate Custom Type Definitions:
+<reason>
+To find the definition of the custom type `CustomType` used by `my_function`.
+</reason>
+<bash>
+grep -rn 'typedef.*CustomType' /src/project/
+</bash>
+Command 4. Examine Existing Fuzz Targets:
+<reason>
+To see how existing fuzz targets include headers and initialize variables in the `LLVMFuzzerTestOneInput` function.
+</reason>
+<bash>
+cat /src/mosh/src/fuzz/terminal_parser_fuzzer.cc
+</bash>
+* Remember you can use the same command on other example fuzz targets under the same parent directory as `/src/mosh/src/fuzz/terminal_parser_fuzzer.cc`.
+Command 5. Check Build Script for Compilation Flags and Libraries:
+<reason>
+To check which compiler flags and libraries are used in the build script.
+</reason>
+<bash>
+cat /src/build.bk.sh
+</bash>
+Command 6. Verify Available Libraries:
+<reason>
+To list the built libraries to verify that the necessary libraries are available.
+</reason>
+<bash>
+ls /src/project/build/libs/
+</bash>
+Command 7. Understand Environment Variables:
+<reason>
+To check if any environment variables related to the project are set.
+</reason>
+<bash>
+printenv | grep 'PROJECT_VARIABLE'
+</bash>
+</example usages>
+
+<final reminder>
+1. Do Not Compile or Run Code:
+* Your investigation is limited to reading and interpreting information using bash commands.
+</final reminder>
+</tool>
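Protocol item 4 of the GDB guide above specifies how each tool reply is wrapped. A minimal sketch of a formatter for that reply shape follows; `format_gdb_reply` is a hypothetical helper written for illustration, not a function from the repository.

```python
def format_gdb_reply(command: str, stdout: str, stderr: str) -> str:
  """Wraps a GDB command and its output in the tags the prompt describes."""
  return (f'<gdb command>\n{command.strip()}\n</gdb command>\n'
          f'<gdb output>\n{stdout.strip()}\n</gdb output>\n'
          f'<stderr>\n{stderr.strip()}\n</stderr>')


# Example reply for a backtrace request with no error output.
reply = format_gdb_reply('bt', '#0  raise () from libc.so.6', '')
```

Echoing the command back lets the LLM check that the tool ran what it asked for, which is the stated purpose of the `<gdb command>` block in the prompt.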

experiment/oss_fuzz_checkout.py

Lines changed: 5 additions & 4 deletions
@@ -32,7 +32,7 @@

 BUILD_DIR: str = 'build'
 GLOBAL_TEMP_DIR: str = ''
-ENABLE_CACHING = bool(int(os.getenv('OFG_USE_CACHING', '1')))
+ENABLE_CACHING = bool(int(os.getenv('OFG_USE_CACHING', '0')))
 # Assume OSS-Fuzz is at repo root dir by default.
 # This will change if temp_dir is used.
 OSS_FUZZ_DIR: str = os.path.join(
@@ -436,12 +436,13 @@ def create_ossfuzz_project(benchmark: benchmarklib.Benchmark,
   return generated_project_path


-def prepare_project_image(benchmark: benchmarklib.Benchmark) -> str:
+def prepare_project_image(benchmark: benchmarklib.Benchmark,
+                          project_name: str = '') -> str:
   """Prepares original image of the |project|'s fuzz target build container."""
   project = benchmark.project
-  image_name = f'gcr.io/oss-fuzz/{project}'
-  generated_oss_fuzz_project = f'{benchmark.id}-{uuid.uuid4().hex}'
+  generated_oss_fuzz_project = project_name or f'{benchmark.id}-{uuid.uuid4().hex}'
   generated_oss_fuzz_project = rectify_docker_tag(generated_oss_fuzz_project)
+  image_name = f'gcr.io/oss-fuzz/{generated_oss_fuzz_project}'
   create_ossfuzz_project(benchmark, generated_oss_fuzz_project)

   if not ENABLE_CACHING:
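The two behavioural changes in this file can be sketched in isolation. The helper names `caching_enabled` and `resolve_project_name` are hypothetical and exist only for this illustration. The sketch also omits the `rectify_docker_tag` step that the real function applies to the name.

```python
import os
import uuid


def caching_enabled(environ=os.environ) -> bool:
  """Mirrors bool(int(os.getenv('OFG_USE_CACHING', '0'))): off unless set."""
  return bool(int(environ.get('OFG_USE_CACHING', '0')))


def resolve_project_name(benchmark_id: str, project_name: str = '') -> str:
  """Uses the caller's project name if given, else generates a unique one."""
  return project_name or f'{benchmark_id}-{uuid.uuid4().hex}'


# Caching is now opt-in: only an explicit non-zero integer enables it.
default_off = caching_enabled({})
opt_in = caching_enabled({'OFG_USE_CACHING': '1'})
```

Because the value goes through `int()`, a setting such as `OFG_USE_CACHING=true` raises `ValueError` rather than enabling caching. To restore the previous behaviour, set `OFG_USE_CACHING=1` in the environment.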

prompts/agent/crash_analyzer-priming.txt

Lines changed: 4 additions & 0 deletions
@@ -17,3 +17,7 @@ Below is relevant project function code:
 </code>

 To help analyze the root cause behind the runtime crash, you can leverage GDB tool and BASH tool to obtain information.
+
+Instructions
+1. You MUST use the GDB tool to analyze the crash before making a conclusion.
+2. DO NOT hallucinate the output of the provided tools. You must use the tools and use only results provided by the tools.

tool/container_tool.py

Lines changed: 2 additions & 1 deletion
@@ -45,7 +45,8 @@ def tutorial(self) -> str:
   def _prepare_project_image(self, project_name: str) -> str:
     """Prepares the project's OSS-Fuzz docker image and returns the image name.
     """
-    image_name = oss_fuzz_checkout.prepare_project_image(self.benchmark)
+    image_name = oss_fuzz_checkout.prepare_project_image(
+        self.benchmark, project_name)
     if image_name:
       return image_name
     raise Exception(f'Failed to build image for {project_name}')
