emit number(0) (offset(0)??) for instructions like "XOR EAX, EAX" #2622

mike-hunhoff · 2025-03-10T16:05:37Z

williballenthin · 2025-03-10T17:01:55Z

good idea. i tend to think 'number: 0' we should definitely add. offset i'm not so sure.

dhruvak001 · 2025-03-12T00:51:28Z

As capa doesn’t track register values, so number: 0 isn’t detected for indirect zeroing. I had tried adding directly bytes pattern - bytes: 33C050 in the mentioned pr, but it still wont work. We can try to match the semantic intent using mnemonics and operands but the output would be same most prolly. Any other leads on how i can proceed with this ?

mr-tz · 2025-03-12T09:58:59Z

Adding number(0) makes sense. As a downside this will create lots of additional features potentially affecting performance?!

Maybe we can create stats/benchmarks for a few samples to verify.

williballenthin · 2025-03-18T12:18:06Z

As a downside this will create lots of additional features potentially affecting performance?!

I suspect we'd see this feature about once per function, since I've noticed many compilers will reserve a general purpose register for the zero value, when possible. Sometimes there might be a couple instances per function.

imho, this probably won't be a noticable amount of garbage, but I agree we should figure out a way to benchmark and verify.

v1bh475u · 2025-03-18T12:52:02Z

Hey, @williballenthin, @mike-hunhoff and @mr-tz . I would like to work on this issue. I have explored how a few capa backends extract features from the file. I am thinking of using cProfile and snakeviz for benchmarking and analyzing. Could you please give your suggestions/opinions for the same? Also, let me know if you recommend using some other tool for profiling/benchmarking.

williballenthin · 2025-03-18T15:57:58Z

I would start by using something like hyperfine to benchmark the invocation of capa against a fairly complex sample, like mimikatz. Only if there's a measurable difference before/after the changes should we dive into the line-level profiling. I think its likely that the performance delta is so small it is hidden by random noise.

v1bh475u · 2025-03-18T19:06:54Z

The first one in each image is from current master branch and second one is from my implementation for xor <reg>, <reg> in vivisect backend.
My implementation:

    if insn.mnem == "xor" and insn.opers[0].isReg() and insn.opers[1].isReg() and insn.opers[0].reg == insn.opers[1].reg:
        # for pattern like:
        #
        #     xor eax, eax
        #
        yield Number(0), ih.address
    # this is for both x32 and x64
    if not isinstance(oper, (envi.archs.i386.disasm.i386ImmOper, envi.archs.i386.disasm.i386ImmMemOper)):
        return

williballenthin · 2025-03-18T19:56:50Z

great @v1bh475u! the implementation looks nice and straightforward - good job.

i'm not sure how to read the screenshots, since there's two of them :-)
the second one looks like there's no real change in performance, but the first one might show something? what's the difference between the two screenshots?

v1bh475u · 2025-03-18T20:17:24Z

I think in the first screenshot, the 2nd test might have been influenced by other processes running on my machine. Let me retry to ensure that.

williballenthin · 2025-03-18T20:34:22Z

one thing that might also come into play is CPU throttling, which may penalize whatever test is going second (when CPU is warmer so might be throttled). so, i'd recommend doing the testing on dedicated non-laptop hardware (if possible), or alternating the tests more so that both cases have a chance to run "second", or letting the system cool down before running new tests.

v1bh475u · 2025-03-18T21:03:36Z

I think in the first screenshot, the 2nd test might have been influenced by other processes running on my machine. Let me retry to ensure that.

The average time for each test has increased but the difference is still barely 1-2 seconds.

v1bh475u · 2025-03-18T21:04:37Z

one thing that might also come into play is CPU throttling, which may penalize whatever test is going second (when CPU is warmer so might be throttled). so, i'd recommend doing the testing on dedicated non-laptop hardware (if possible), or alternating the tests more so that both cases have a chance to run "second", or letting the system cool down before running new tests.

Agreed. I will run further tests and update them here soon.

v1bh475u · 2025-03-19T00:07:30Z

The above is the result on running the test multiple times on my server machine with following specs:

OS: Ubuntu 22.04.5 LTS x86_64
Kernel: Linux 6.8.0-52-generic
CPU: Intel(R) Core(TM) i7-7700 (8) @ 4.20 GHz
Memory: 46.91 GiB
The 1st test was a bit skewed due to some heavy process going on the server. After that, all tests are stable. On average, the branch with new implementation took $63.184 \pm 0.352$ s while the current master branch took $62.821 \pm 0.285$ s.
Hence, in conclusion, I feel the addition of this feature increases the overhead barely by 1-2 seconds in the worst case.

v1bh475u · 2025-03-20T12:50:07Z

@williballenthin @mike-hunhoff @mr-tz , shall I start working on modifications similar to above one for the backends which I can test for?

williballenthin · 2025-03-20T13:32:43Z

yes, please do!

and, please add a test case to demonstrate the new functionality.

v1bh475u · 2025-03-20T18:49:34Z

@williballenthin , I have an older version of binaryninja given by one of my friends (version: 3.5.4526 Personal) and it is giving ImportError: cannot import name 'ILException' from 'binaryninja'. Is the current backend not meant to support this version binaryninja APIs?

mr-tz · 2025-03-20T18:58:15Z

Most likely, @xusheng6 did various updates around feature extraction.

v1bh475u · 2025-03-20T19:05:39Z

Also, we want the tests to run as fast as possible, right?

xusheng6 · 2025-03-21T07:57:02Z

@williballenthin , I have an older version of binaryninja given by one of my friends (version: 3.5.4526 Personal) and it is giving ImportError: cannot import name 'ILException' from 'binaryninja'. Is the current backend not meant to support this version binaryninja APIs?

Yeah version 3.5 is a bit old. Could you please test it against version 4.2 and see if it works?

williballenthin · 2025-03-21T07:58:05Z

@v1bh475u if you don't have a license for Binary Ninja, don't worry about it. Once you have a unit test and an implementation for IDA, I'd be happy to handle the Binary Ninja side. I expect it's only a few lines, like for IDA.

v1bh475u · 2025-03-21T10:20:43Z

@williballenthin I have made the PR and also added test case for xor eax, eax instruction. For the binaryninja backend, due to instruction matching occuring in LLIL, xor eax, eax transforms into eax = 0 which is handled by the existing logic. I am currently working on IDA backend.

zdwg42 · 2025-05-12T17:59:33Z

+1 for this feature, just ran into this while working on mandiant/capa-rules#1046. i want to match on 3 arguments to a function being 0 but MSVC is emitting xor's to zero out the registers:

xor     r9d, r9d
xor     r8d, r8d
xor     edx, edx
mov     rcx, [rbp+230h+arg_0]
call    cs:NtFsControlFile

mike-hunhoff mentioned this issue Mar 11, 2025

improve TCP/UDP socket creation matching #964 mandiant/capa-rules#993

Merged

v1bh475u linked a pull request Mar 21, 2025 that will close this issue

emit number(0) for insn like "xor eax, eax" #2639

Open

3 tasks

zdwg42 mentioned this issue May 12, 2025

add first set of exploit detection rules mandiant/capa-rules#1046

Merged

emit number(0) (offset(0)??) for instructions like "XOR EAX, EAX" #2622

emit number(0) (offset(0)??) for instructions like "XOR EAX, EAX" #2622

Comments

mike-hunhoff commented Mar 10, 2025

williballenthin commented Mar 10, 2025

Uh oh!

dhruvak001 commented Mar 12, 2025

Uh oh!

mr-tz commented Mar 12, 2025

Uh oh!

williballenthin commented Mar 18, 2025

Uh oh!

v1bh475u commented Mar 18, 2025

Uh oh!

williballenthin commented Mar 18, 2025

Uh oh!

v1bh475u commented Mar 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

williballenthin commented Mar 18, 2025

Uh oh!

v1bh475u commented Mar 18, 2025

Uh oh!

williballenthin commented Mar 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

v1bh475u commented Mar 18, 2025

Uh oh!

v1bh475u commented Mar 18, 2025

Uh oh!

v1bh475u commented Mar 19, 2025

Uh oh!

v1bh475u commented Mar 20, 2025

Uh oh!

williballenthin commented Mar 20, 2025

Uh oh!

v1bh475u commented Mar 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mr-tz commented Mar 20, 2025

Uh oh!

v1bh475u commented Mar 20, 2025

Uh oh!

xusheng6 commented Mar 21, 2025

Uh oh!

williballenthin commented Mar 21, 2025

Uh oh!

v1bh475u commented Mar 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zdwg42 commented May 12, 2025

Uh oh!

v1bh475u commented Mar 18, 2025 •

edited

Loading

williballenthin commented Mar 18, 2025 •

edited

Loading

v1bh475u commented Mar 20, 2025 •

edited

Loading

v1bh475u commented Mar 21, 2025 •

edited

Loading