♻️ refactor scanner.py to reduce repetitive if statements. #228
Conversation
I've created this PR to improve the way we express the idea. Most of the crawlers have already been implemented, but we still have a few more to go, specifically #229, #230, and #231. However, we won't be able to completely remove the old crawl.py just yet. We need to have a discussion about the following points:
What if we add
For GKE we can move the specific credential logic into a
makes sense 👍
Yep, optional makes sense
Ok, let's discuss the questions one by one, starting with the last one.
I have fallen a bit behind on the refactoring; apologies if this was already discussed.
Hi @mshudrak, thanks for your response 🙏 You might already be up to speed with the conversation, but I'll attempt to provide further clarification in this comment. Hopefully it will be helpful 💪 The Crawler interface is defined in interface_crawler.py. Now, if we look at get_bucket_names, get_gke_clusters, get_gke_images, and get_sas_for_impersonation in crawl.py, these methods have slightly different signatures than the interface, and that is what we are discussing 🙂 If we can address the above-mentioned functions, the only leftover will be #231.
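The signature mismatch being discussed can be illustrated with a minimal sketch. The interface name, the method signature, and the optional config parameter below are assumptions for illustration only, not the exact code in interface_crawler.py:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class ICrawler(ABC):
  """Hypothetical stand-in for the interface in interface_crawler.py."""

  @abstractmethod
  def crawl(self, project_id: str, service: Any,
            config: Optional[Dict[str, Any]] = None) -> Any:
    """Collect one resource type for the given project."""


class BucketNamesCrawler(ICrawler):
  """Shows how an optional config can absorb call-specific arguments
  (e.g. a dump path) without breaking the shared signature."""

  def crawl(self, project_id, service, config=None):
    dump_path = (config or {}).get('gcs_output_path')
    return {'project': project_id, 'dump_path': dump_path}
```

With this shape, callers that need extra inputs pass them through config, while crawlers that do not need them simply ignore the parameter.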
Thanks for the explanation. All right, I think I understood it right. It makes sense to have an optional flag for the storage-related calls and a specific implementation in get_service for get_gke_clusters. As for get_gke_images, I think we can easily refactor it to receive credentials instead of access_token; the access token can easily be fetched from the credentials. We actually do that here: gcp_scanner/src/gcp_scanner/scanner.py, line 327 in ba8dd9a, before passing it into this function. get_sas_for_impersonation is a bit different. It might not actually be a good fit for the crawler interface since it is just parsing iam_policy. The best place would be a future misc.py, but for now we can move it into scanner.py.
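The suggested refactoring of get_gke_images can be sketched as follows. FakeCredentials is a stand-in for a google.auth credentials object (which exposes a token attribute after refresh()); the function body is illustrative, not the PR's actual code:

```python
from dataclasses import dataclass


@dataclass
class FakeCredentials:
  """Stand-in for google.auth credentials; only token/refresh matter here."""
  token: str = ''

  def refresh(self, request=None):
    # A real credentials object would call the OAuth token endpoint here.
    self.token = 'ya29.example-token'


def get_gke_images(project_name: str, credentials: FakeCredentials) -> dict:
  """Hypothetical refactored signature: takes credentials, not access_token."""
  # Derive the access token from the credentials inside the crawler,
  # mirroring what scanner.py already does before making the call.
  credentials.refresh()
  return {'project': project_name, 'access_token': credentials.token}
```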
@under-hill, As I was refactoring
Edit: I have created this PR #234 with a possible refactoring attempt. Please take a look @under-hill @0xDeva @mshudrak
@sudiptob2 I'd relocate it to
Edit: Ok, I see you created a separate PR with refactoring. No strong preference on that. Let's keep it as you implemented, then.
Hello everyone, this PR is getting quite large. As part of my work, I have moved the
At this point, I believe it's appropriate to mark this PR as ready for review. I would love to hear your thoughts on this approach. If there are no bugs and it gets merged, we can address any refinement suggestions through separate tickets. What do you think?
mentions: @mshudrak @under-hill @0xDeva
Makes sense, I will review this one :) |
Overall looks good. Thanks Sudipto. I left some minor comments. PTAL.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
Do we really need this extra space here and below?
Updated.
# limitations under the License.

"""The module to query GCP resources via RestAPI.
Could you please add a new description for this file?
Description added.
import requests
from google.cloud import container_v1
Why do you want to change the order of these two?
Conventionally, imports are sorted from most generic to least generic. Basically, I have used the following convention:
Standard library imports
Third-party imports
Application-specific imports
This is also described in Google's Python style guide. (Refer to the example code block there as well.)
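The grouping can be shown in a minimal module header; the third-party and application imports are commented out here so the snippet stays self-contained and runnable:

```python
# Standard library imports
import json
import sys

# Third-party imports (commented out to keep the example self-contained):
# import requests
# from google.cloud import container_v1

# Application-specific imports (commented out, illustrative path):
# from gcp_scanner import crawl

print(json.dumps({'python_major': sys.version_info.major}))
```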
previous_response=response
)
while request is not None:
  response = request.execute()
I am curious why pylint did not catch it...
Hmm, interesting, 4 spaces shouldn't be there in the first place.
crawler_config = scan_config.get(crawler_name)
# add gcs output path to the config.
# this path is used by the storage bucket crawler as of now.
crawler_config['gcs_output_path'] = gcs_output_path
I wonder if there is a better solution rather than sending it into every crawler. We can think about that in the future but for now it is fine.
Right, this would be a good refinement ticket. Will create one once this PR is complete.
@@ -435,14 +257,14 @@ def crawl_loop(initial_sa_tuples: List[Tuple[str, Credentials, List[str]]],
if impers is not None and impers.get('impersonate', False) is True:
  iam_client = iam_client_for_credentials(credentials)
  if is_set(scan_config, 'iam_policy') is False:
    iam_policy = crawl.get_iam_policy(
    iam_policy = CrawlerFactory.create_crawler('iam_policy').crawl(
We can add a short comment on why this crawler is used outside of the new loop.
I moved the additional crawlers just after the new loop and added a section comment. Now I think it is more understandable why they are outside of the loop.
LGTM, thanks.
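The CrawlerFactory dispatch that replaces the repetitive if statements can be sketched like this; the registry contents and class names are illustrative, not the exact ones in the PR:

```python
from typing import Any, Dict


class Crawler:
  """Minimal base class; real crawlers call GCP APIs in crawl()."""
  name = 'base'

  def crawl(self, config: Dict[str, Any]) -> Dict[str, Any]:
    return {'crawler': self.name, 'config': config}


class IAMPolicyCrawler(Crawler):
  name = 'iam_policy'


class GKEClustersCrawler(Crawler):
  name = 'gke_clusters'


class CrawlerFactory:
  """Maps crawler names to classes so the scan loop needs no if chain."""
  _registry = {
      'iam_policy': IAMPolicyCrawler,
      'gke_clusters': GKEClustersCrawler,
  }

  @classmethod
  def create_crawler(cls, name: str) -> Crawler:
    try:
      return cls._registry[name]()
    except KeyError as err:
      raise ValueError(f'Unknown crawler: {name}') from err


# The scan loop then becomes data-driven instead of a chain of ifs:
scan_config = {'iam_policy': {}, 'gke_clusters': {'zone': 'us-central1-a'}}
results = {name: CrawlerFactory.create_crawler(name).crawl(cfg)
           for name, cfg in scan_config.items()}
```

Adding a new crawler then only requires registering one more class, rather than touching the loop body.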
closes #227
related to #153
- crawl.py?
- project_list crawler does not match with the interface, how to refactor this?

Also, let's brainstorm ways to polish and enhance this draft.
mentions: @mshudrak @0xDeva @peb-peb and others.