| 1 | +#+title: Reverse Engineering |
| 2 | + |
| 3 | +* Start here |
| 4 | +** General tips |
| 5 | +- figure out what the goal is |
| 6 | + - there is usually a clear "win condition", such as printing a flag |
| 7 | +- figure out what the input is |
| 8 | + - some parts of the program don't change depending on the input |
| 9 | + - it might not matter what the input is! |
| 10 | + - how does the input get used? |
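To make these tips concrete, here is a minimal sketch of a made-up challenge. The =check= function, the key, and the target bytes are all hypothetical and not taken from any real challenge. The "win condition" is =check= returning =True=, and the input is used in exactly one way: each byte is XORed with a constant. Since XOR is its own inverse, the expected input can be recovered without guessing.

```python
# Hypothetical toy challenge: every name and value here is invented for illustration.
SECRET_KEY = 0x2A
TARGET = bytes([0x59, 0x4F, 0x49, 0x58, 0x4F, 0x5E])


def check(user_input: str) -> bool:
    """Win condition: the input, XORed byte-by-byte with the key, must equal TARGET."""
    transformed = bytes(b ^ SECRET_KEY for b in user_input.encode())
    return transformed == TARGET


def recover_input() -> str:
    """XOR is its own inverse, so applying it to TARGET yields the expected input."""
    return bytes(b ^ SECRET_KEY for b in TARGET).decode()


if __name__ == "__main__":
    candidate = recover_input()
    print(candidate, check(candidate))
```

Note that =SECRET_KEY= and =TARGET= never change with the input, so they can be read straight out of the program during static analysis.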
| 11 | +** A note about past meetings |
| 12 | +SIGPwny has already run two meetings on this topic! Check out [[https://sigpwny.com/meetings/fa2023/2023-09-17/][Reverse Engineering Setup]] and [[https://sigpwny.com/meetings/fa2023/2023-09-21/][Reverse Engineering I]]. We have slides and recorded meeting presentations, which you may prefer to these notes. |
| 13 | +* Basics |
| 14 | +** What it is |
| 15 | +Reverse engineering is the process of understanding computer programs. The goal is to figure out what the program does. Usually, programs are difficult to understand, either intentionally or unintentionally. |
| 16 | +** Main types of analysis |
| 17 | +- Static analysis: reading code, using tools to understand code /without running it/ |
| 18 | + - Good place to start, not great if there's a lot of code |
| 19 | +- Dynamic analysis: running code, inspecting or modifying the program as it's running |
| 20 | + - Generally faster, captures entire program environment |
| 21 | +** A word on abstractions |
| 22 | +- Abstract (higher level) programs are easier to understand |
| 23 | +- Languages like Python and JavaScript are higher level |
| 24 | +- Languages like assembly and C are lower level |
| 25 | +- As you modify a program to become more abstract (to better understand it), you lose some information in the process |
| 26 | +* Tools |
| 27 | +** Bytecode viewer |
| 28 | +*** Installation |
| 29 | +- see https://github.com/Konloch/bytecode-viewer |
| 30 | +*** When to use |
| 31 | +Use this program to decompile Java programs, which are usually distributed as files with the .jar extension. |
| 32 | +*** How to use |
| 33 | +Simply import the Java .jar file into Bytecode Viewer and see the decompiled Java code! This works by recovering Java source code from the compiled Java bytecode. |
| 34 | +** Ghidra |
| 35 | +*** Installation |
| 36 | +- see [[https://sigpwny.com/meetings/fa2023/2023-09-17/][Reverse Engineering Setup]] |
| 37 | +- or, just read the [[https://ghidra-sre.org/InstallationGuide.html][installation guide]] |
| 38 | +*** When to use |
| 39 | +Use this tool for compiled binaries, not Python scripts. Ghidra "decompiles", or simplifies, binary programs into more human-readable "pseudo-C" code. |
| 40 | + |
| 41 | +Ghidra is a *static analysis* tool. |
| 42 | +*** Interface |
| 43 | +[[./images/ghidra1.png]] |
| 44 | + |
| 45 | +Once you open a program in Ghidra, click "OK" for all the auto analyze popups (there should be several). Now, the interface should look like the above image. |
| 46 | + |
| 47 | +(1) is the decompiled code output. This is what you will be looking at for the most part. You can rename variables by clicking a variable and pressing =L=. Change the type by right clicking and selecting =Retype Variable=. |
| 48 | + |
| 49 | +(2) is the assembly instructions. This won't be very helpful if you don't know assembly, and can be mostly ignored for the challenges at Fall CTF. |
| 50 | + |
| 51 | +(3) is the "symbol tree". This shows you the different named values that are present in the file. Click =Functions= and scroll down to select the =main= function. This is where the program's own code usually starts running. |
| 52 | + |
| 53 | +[[./images/ghidra2.png]] |
| 54 | + |
| 55 | +Here we can see the =main= function in the symbol tree. If there is no =main=, click =_start= and see what that function calls. |
| 56 | + |
| 57 | +[[./images/ghdira3.png]] |
| 58 | + |
| 59 | +Above is a picture of the decompilation (disclaimer: this is not a challenge from Fall CTF). Almost every function you see will have an if statement with =__stack_chk_fail= at the bottom. This is a check for the "stack canary", which is not relevant to any challenges here. It may be of more interest in pwn challenges. The ~local_10 = *(long *)(in_FS_OFFSET + 0x28);~ line at the top sets up the stack canary and can also be ignored. |
| 60 | + |
| 61 | +Note that the variables are named with undescriptive names, such as =iVar1= and =local_28=. This is because the decompiler does not know the details of variables in the original function. As a result, it has to generate variable names. |
| 62 | +** GDB |
| 63 | +*** Installation |
| 64 | +- see [[https://sigpwny.com/meetings/fa2023/2023-09-17/][Reverse Engineering Setup]] |
| 65 | +*** When to use |
| 66 | +As with Ghidra, use this tool for compiled binaries, not Python scripts. GDB is a debugger that runs programs, giving you the ability to stop, inspect, and modify a program as it is executing. |
| 67 | + |
| 68 | +GDB is a *dynamic analysis* tool. |
| 69 | +*** Basics |
| 70 | +Run =gdb ./chal= on the command line, where =chal= is the name of the program. Note that you must be on Linux (WSL works too). This will not work for Apple Silicon Mac users. |
| 71 | + |
| 72 | +GDB will launch you into a program with a different terminal prompt, where each line starts with =(gdb)=. You interact with the program by typing in commands. |
| 73 | +*** Commands |
| 74 | +- misc |
| 75 | + - =help <command>=: get help about any of the commands listed here |
| 76 | +- running |
| 77 | + - =run=: run the program from the start |
| 78 | + - =quit=: exit GDB |
| 79 | + - =start=: start the program and break on the =main= function |
| 80 | +- breakpoints |
| 81 | + - =break *<func>+<offset>=: set a breakpoint =<offset>= bytes past the start of the function =<func>= (the =*= tells GDB to treat the expression as an address). Use the =disas= command to find the offset |
| 82 | +- inspecting program |
| 83 | + - =disas <func>=: disassemble the =<func>= function |
| 84 | + - =info reg=: print all the registers |
| 85 | + - =x=: print data (see =help x= for more info) |
| 86 | + - =x/4gx 0x1234=: print 4 QWORDS (64-bit values) in hex starting at address =0x1234= |
| 87 | + - =x/10i $rip=: print 10 instructions starting at =$rip= (current instruction pointer) |
| 88 | + - =x/7wx $rsp=: print 7 words (32-bit values in GDB's terminology, DWORDs in x86 terms) in hex starting at =$rsp= (stack pointer) |
| 89 | + - =x/8bd $rdi=: print 8 bytes in decimal starting at the address in =$rdi= |
| 90 | + - =set=: set values |
| 91 | + - ~set $rax=23~: sets =$rax= to 23 |
| 92 | + - ~set $rip+=4~: adds 4 to =$rip= |
| 93 | + - this skips the current instruction, if it is 4 bytes long |
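The size letters in the =x= command only change how the same bytes are grouped and printed. The sketch below mimics that grouping in Python using the standard =struct= module. The =memory= contents are made up, and little-endian byte order is assumed, as on x86-64.

```python
import struct

# Made-up data: pretend these 16 bytes sit in memory at some address.
memory = bytes(range(0x10, 0x20))

# Like x/2gx: two 8-byte "giant" values, read as little-endian.
giants = [hex(v) for v in struct.unpack("<2Q", memory)]

# Like x/4wx: four 4-byte words over the same 16 bytes.
words = [hex(v) for v in struct.unpack("<4I", memory)]

# Like x/8bd: the first eight bytes, each printed as a signed decimal.
bytes_dec = list(struct.unpack("<8b", memory[:8]))

if __name__ == "__main__":
    print(giants)
    print(words)
    print(bytes_dec)
```

Because of the little-endian layout, the byte at the lowest address ends up as the rightmost hex digits of each printed value.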
| 94 | +*** General workflow |
| 95 | +- first, identify interesting places to set a breakpoint in Ghidra |
| 96 | +- use the assembly instructions window in Ghidra to see the offset to break at |
| 97 | +- run the program in GDB and set a breakpoint |
| 98 | +- modify or print values as desired |
| 99 | +- repeat until solved |
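The offset step in this workflow is a subtraction: the address of the interesting instruction minus the address where its function starts, both as shown in Ghidra's assembly window. A minimal sketch of that arithmetic follows. The helper name =gdb_break_command= and the example addresses are hypothetical. The output uses GDB's =*= address form for the breakpoint.

```python
def gdb_break_command(func_name: str, func_start: int, instr_addr: int) -> str:
    """Build a GDB breakpoint command from two addresses read off Ghidra.

    func_start is the address of the function's first instruction and
    instr_addr is the address of the instruction to stop at.
    """
    offset = instr_addr - func_start
    if offset < 0:
        raise ValueError("instruction address is before the function start")
    return f"break *{func_name}+{offset}"


if __name__ == "__main__":
    # Hypothetical addresses, as they might appear in Ghidra's listing.
    print(gdb_break_command("main", 0x101189, 0x1011B3))
```

Using an offset from the function name rather than Ghidra's absolute address helps when the binary is loaded at a different base address at run time than the one Ghidra displays.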