Skip to content

Fix dict get #91

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 13 additions & 10 deletions src/graph_populator.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ def construct_version_query(cls, input_json):
version = input_json.get('version')
description = ''
try:
if len(input_json.get('analyses', {}).get('metadata', {}).get('details')) > 0:
if len(input_json.get('analyses', {}).get('metadata', {}).get('details', [])) > 0:
description = input_json.get('analyses').get('metadata').get('details')[0].get(
'description', '')
description = cls._sanitize_text_for_query(description)
Expand Down Expand Up @@ -75,18 +75,18 @@ def construct_version_query(cls, input_json):
if 'code_metrics' in input_json.get('analyses', {}):
count = 0
tot_complexity = 0.0
languages = input_json.get('analyses').get('code_metrics').get('details', {}) \
languages = input_json.get('analyses').get('code_metrics', {}).get('details', {}) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you add a default {} to all the input_json.get('analyses').get(...) ?
If that's because you expect the 'analyses' to always be there, then it's actually better to use input_json['analyses'] because:

>>> {}.get('analyses').get('something')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'get'

>>> {}['analyses'].get('something')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'analyses'

Which of the above two error messages tells you more about what's happened ? (the second, right ?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. The second one is more explicit, so you suggest we replace all occurences of analyses.get() with ['analyses'] ?
That will require modifications in the if statements as well, or should we keep the if statements as is, and only change if code inside the condition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my rule of thumb is to use dict['key'] if you are sure the 'key' is there (because you've already checked) or it makes no sense to continue if it wasn't there. Because you want to see an explicit exception it happens to not be there.
And you use dict.get('key', default_value) if your code can continue even without it.
I think you can leave the if statements as they are for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. We can make that check for analyses, because if analysis is not present then out data gathering did not happen properly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be input_json.get('analyses', {}) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above.

.get('languages', [])
for lang in languages:
if lang.get('metrics', {}).get('functions', {}).get(
'average_cyclomatic_complexity'):
count += 1
tot_complexity += lang['metrics']['functions']['average_cyclomatic_complexity']
cm_avg_cyclomatic_complexity = str(tot_complexity / count) if count > 0 else '-1'
cm_loc = str(input_json.get('analyses').get('code_metrics').get('summary', {})
cm_loc = str(input_json.get('analyses').get('code_metrics', {}).get('summary', {})
.get('total_lines', -1))

cm_num_files = str(input_json.get('analyses').get('code_metrics').get('summary', {})
cm_num_files = str(input_json.get('analyses').get('code_metrics', {}).get('summary', {})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be input_json.get('analyses', {}) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above.

.get('total_files', -1))
prp_version += "ver.property('cm_num_files',{cm_num_files});" \
"ver.property('cm_avg_cyclomatic_complexity', " \
Expand All @@ -107,16 +107,18 @@ def construct_version_query(cls, input_json):

# Add license details
if 'source_licenses' in input_json.get('analyses', {}):
licenses = input_json.get('analyses').get('source_licenses').get('summary', {}) \
.get('sure_licenses', [])
prp_version += " ".join(["ver.property('licenses', '{}');".format(l) for l in licenses])
licenses = input_json.get('analyses').get('source_licenses', {}).get('summary', {}) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be input_json.get('analyses', {}) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

.get('sure_licenses', [])

# Add CVE property if it exists
if 'security_issues' in input_json.get('analyses', {}):
cves = []
for cve in input_json.get('analyses', {}).get('security_issues', {}).get('details', []):
cves.append(cve.get('id') + ":" + str(cve.get('cvss', {}).get('score')))
for cve in input_json.get('analyses').get('security_issues', {}).get('details', []):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be input_json.get('analyses', {}) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tuxdna that condition is checked in the if condition above the for loop if 'security_issues' in input_json.get('analyses', {}): . So I did not check that again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should extract analyses and corresponding properties into local variables, and use those variables instead:

analyses = input_json.get('analyses', {})
security_issues = analyses.get('security_issues', {})
if security_issues:
    pass
    # your logic here

This way you will always catch the missing keys early on, and also reduce some errors due to duplicated code and typos.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will make that change.

cves.append(cve.get('id', '') + ":" +
str(cve.get('cvss', {}).get('score', '')))
prp_version += " ".join(["ver.property('cve_ids', '{}');".format(c) for c in cves])


# Get Metadata Details
if 'metadata' in input_json.get('analyses', {}):
Expand Down Expand Up @@ -187,7 +189,7 @@ def construct_version_query(cls, input_json):
@classmethod
def construct_package_query(cls, input_json):
"""Construct the query to retrieve detailed information of given package."""
pkg_name = input_json.get('package')
pkg_name = input_json.get('package', '')
Copy link
Contributor

@jpopelka jpopelka Feb 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically it's OK, but, well, the question is 'what is better' ?
a) To somehow mask a missing piece to continue in any case (even if that means ingesting incorrect/incomplete data) ?
b) Or to fail early so that someone can check what's happened, fill an issue and fix the cause (even if that means more work) ?

:-)

Someone who prefers (b) would actually do (the same as in my first comment)
pkg_name = input_json['package'] so that you'd get an explicit KeyError: 'package' in case the input data happens to be incomplete.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made that change, because in the next line we are using the string contained in pkg_name to generate tokens in it.
Even better would be to first get the value and then check that it should not be None or Empty string, because either way it will be useless to ingest, and then throw an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly.
You have couple options how to do the 'checking' part, I'd probably use simple assert:

assert input_json.get('package'), "no or empty 'package'"
pkg_name = input_json['package']

This assert raises an AssertionError if the 'package' is not in input_json or the value there is None or ''

You can even have it in a function as we do in worker/utils or worker/base

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this PR still relevant?

ecosystem = input_json.get('ecosystem')
pkg_name_tokens = re.split('\W+', pkg_name)
prp_package = ""
Expand All @@ -206,7 +208,8 @@ def construct_package_query(cls, input_json):

# Get Github Details
if 'github_details' in input_json.get('analyses', {}):
gh_details = input_json.get('analyses').get('github_details').get('details', {})
gh_details = input_json.get('analyses').get(
'github_details', {}).get('details', {})
gh_prs_last_year_opened = str(gh_details.get('updated_pull_requests', {})
.get('year', {}).get('opened', -1))
gh_prs_last_month_opened = str(gh_details.get('updated_pull_requests', {})
Expand Down