Revert "Refactor and fix config parsing and utility functions"#40
Revert "Refactor and fix config parsing and utility functions"#40jeFF0Falltrades merged 1 commit intomasterfrom
Conversation
Summary of ChangesHello @jeFF0Falltrades, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request reverts all changes introduced in PR #37, which aimed to refactor and fix config parsing and utility functions. The revert restores the codebase to its state prior to those modifications, effectively undoing changes related to type hints, error handling, type checking mechanisms, parameter naming conventions, logging, and custom attribute parsing logic. This action typically indicates that the previous changes either introduced regressions or did not function as intended. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request reverts a previous refactoring, which unfortunately re-introduces several bugs, performance issues, and deviations from Python best practices. The reverted code contains critical issues that will lead to runtime errors, such as TypeError and AttributeError, as well as significant performance degradation in one of the utility functions. I strongly recommend addressing the issues in the original pull request rather than reverting it, as the reverted code is less correct and less performant. I have detailed the specific issues in the comments below.
| if pd.Name.value.startswith( | ||
| ( | ||
| "Boolean_", | ||
| "BorderStyle_", | ||
| "Color_", | ||
| "Byte", | ||
| "Int32_", | ||
| "SizeF_", | ||
| "String_", | ||
| ) | ||
| "Boolean_", | ||
| "BorderStyle_", | ||
| "Color_", | ||
| "Byte", | ||
| "Int32_", | ||
| "SizeF_", | ||
| "String_", | ||
| ): |
There was a problem hiding this comment.
The startswith() method is being called with multiple string arguments. This is incorrect syntax and will cause a TypeError at runtime. The startswith() method accepts a single string or a tuple of strings as its first argument.
| if pd.Name.value.startswith( | |
| ( | |
| "Boolean_", | |
| "BorderStyle_", | |
| "Color_", | |
| "Byte", | |
| "Int32_", | |
| "SizeF_", | |
| "String_", | |
| ) | |
| "Boolean_", | |
| "BorderStyle_", | |
| "Color_", | |
| "Byte", | |
| "Int32_", | |
| "SizeF_", | |
| "String_", | |
| ): | |
| if pd.Name.value.startswith( | |
| ( | |
| "Boolean_", | |
| "BorderStyle_", | |
| "Color_", | |
| "Byte", | |
| "Int32_", | |
| "SizeF_", | |
| "String_", | |
| ) | |
| ): |
| key_hit = search(key_pattern, self._payload.data) | ||
| if not key_hit: | ||
| continue | ||
| key_rva = bytes_to_int(key_hit.groups()[0]) | ||
| raw_key_field = self._payload.field_name_from_rva(key_rva) | ||
| if raw_key_field in encrypted_strings: | ||
| key = encrypted_strings[raw_key_field] | ||
| self.key = self._derive_key(key) | ||
| break | ||
| key = encrypted_strings[raw_key_field] | ||
| self.key = self._derive_key(key) | ||
| break |
There was a problem hiding this comment.
This change introduces a critical bug by removing necessary safety checks. Without if not key_hit: continue, the code will raise an AttributeError if search() returns None. Similarly, removing if raw_key_field in encrypted_strings: will cause a KeyError if the key is not found. The previous implementation handled these cases correctly and should be restored to prevent runtime crashes.
| key_hit = search(key_pattern, self._payload.data) | |
| if not key_hit: | |
| continue | |
| key_rva = bytes_to_int(key_hit.groups()[0]) | |
| raw_key_field = self._payload.field_name_from_rva(key_rva) | |
| if raw_key_field in encrypted_strings: | |
| key = encrypted_strings[raw_key_field] | |
| self.key = self._derive_key(key) | |
| break | |
| key = encrypted_strings[raw_key_field] | |
| self.key = self._derive_key(key) | |
| break | |
| key_hit = search(key_pattern, self._payload.data) | |
| if not key_hit: | |
| continue | |
| key_rva = bytes_to_int(key_hit.groups()[0]) | |
| raw_key_field = self._payload.field_name_from_rva(key_rva) | |
| if raw_key_field in encrypted_strings: | |
| key = encrypted_strings[raw_key_field] | |
| self.key = self._derive_key(key) | |
| break |
| decoded_config.update(item_data) | ||
| # UrlHost is a marker of a special case until this can be standardized | ||
| if len(decoded_config) < min_config_len and "UrlHost" not in decoded_config: | ||
| if len(decoded_config) < min_config_len and "UrlHost" not in item_data: |
There was a problem hiding this comment.
This check is likely incorrect. It checks if "UrlHost" is in item_data, but item_data only contains the data for the last config item processed in the loop. The check should be against decoded_config, which accumulates all parsed config items throughout the loop.
if len(decoded_config) < min_config_len and "UrlHost" not in decoded_config:| for ca in self.dotnetpe.net.mdtables.CustomAttribute.rows: | ||
| if ( | ||
| ca.Parent.row_index == pd_row_index + 1 | ||
| ): # CustomAttribute Parent index is 1-based |
There was a problem hiding this comment.
This nested loop introduces a significant performance regression. By removing the pre-computed ca_map and instead iterating over all CustomAttribute.rows for each Property, the complexity of this section increases from roughly O(N+M) to O(N*M). For files with many properties and custom attributes, this will be noticeably slower. The previous implementation using a map was much more efficient and should be restored.
| self.report["key"] = ( | ||
| self._decryptor.key.hex() | ||
| if self._decryptor is not None and self._decryptor.key is not None | ||
| else "None" | ||
| ) |
There was a problem hiding this comment.
This logic for setting self.report["key"] is incorrect. It unconditionally calls .hex() on self._decryptor.key. However, self._decryptor.key can be a str (as defined in ConfigDecryptor), which does not have a .hex() method. This will lead to an AttributeError at runtime. The previous implementation correctly handled both bytes and other types and was more robust.
key_hex = "None"
if self._decryptor is not None and self._decryptor.key is not None:
if isinstance(self._decryptor.key, bytes):
key_hex = self._decryptor.key.hex()
else:
key_hex = str(self._decryptor.key)
self.report["key"] = key_hex| except ConfigParserException as e: | ||
| last_exc = e | ||
| logger.debug(f"Error decrypting {k}: {e}") | ||
| print("error", e) |
There was a problem hiding this comment.
Using print() for error reporting is not a good practice in a library or application. It bypasses the configured logging system, making it difficult to control output (e.g., verbosity, format, destination). It's better to use the logging module, as was done in the previous version of the code.
| print("error", e) | |
| logger.debug(f"Error decrypting {k}: {e}") |
| harcoded_host = self._payload.user_string_from_rva(bytes_to_int(rva)) | ||
| if harcoded_host != ".": | ||
| hardcoded_hosts.append(harcoded_host) |
There was a problem hiding this comment.
There is a typo in the variable name harcoded_host. It should be hardcoded_host. This typo is repeated on the following lines.
| harcoded_host = self._payload.user_string_from_rva(bytes_to_int(rva)) | |
| if harcoded_host != ".": | |
| hardcoded_hosts.append(harcoded_host) | |
| hardcoded_host = self._payload.user_string_from_rva(bytes_to_int(rva)) | |
| if hardcoded_host != ".": | |
| hardcoded_hosts.append(hardcoded_host) |
| raise ConfigParserException("All decryptors failed") | ||
|
|
||
| elif isinstance(item, config_item.ByteArrayConfigItem): | ||
| elif type(item) is config_item.ByteArrayConfigItem: |
There was a problem hiding this comment.
| item_data[field_name] = v | ||
| if len(item_data) > 0: | ||
| if isinstance(item, config_item.EncryptedStringConfigItem): | ||
| if type(item) is config_item.EncryptedStringConfigItem: |
There was a problem hiding this comment.
Using type(item) is ... is generally discouraged in Python in favor of isinstance(item, ...). isinstance correctly handles inheritance, so if a subclass of EncryptedStringConfigItem were ever used, this check would still work. Using type makes the code less robust and extensible.
if isinstance(item, config_item.EncryptedStringConfigItem):| try: | ||
| self._dnpp = DotNetPEPayload(file_path, yara_rule, data) | ||
| except Exception as e: | ||
| raise e |
Reverts #37