-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve content audit [RHELDST-13955] #252
base: master
Are you sure you want to change the base?
Improve content audit [RHELDST-13955] #252
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #252 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 19 20 +1
Lines 1199 1185 -14
=========================================
- Hits 1199 1185 -14 ☔ View full report in Codecov by Sentry. |
7709314
to
03e01d2
Compare
tests/test_content_audit_task.py
Outdated
""" | ||
for log in expected_logs: | ||
assert log in caplog.text | ||
for bad_log in unexpected_logs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just thinking - what happens if there are some other unexpected logs in the logs? Like this you will not catch those, right? Do you think it would be possible to assert the whole log at once? In such case you could also define the whole expected log in a file under tests/data so it's easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially added unexpected log messages, thinking there would be more I could add, but figured it's unnecessary and I'm removing it.
I'd say the log now is not very long so it can be asserted by checking that each expected message is in the caplog.text (ordering of messages may vary I think and that's what could be tricky when asserting the whole log as a string, but I'm not 100% sure), but if we add the modular auditing later on, I have a feeling it's gonna blow up.
So what I would do is that when we add the modular auditing, we can create a log file in tests/data as you suggested, but perhaps have a more structured file (json probably) so it's easier to match log messages in relation to content, based on ids used as keys, for example.
I'm not sure, what do you think?
Refactor content audit celery task, reorganize and simplify the task by implementing non modular RPM auditing only.
03e01d2
to
56fa545
Compare
for in_repo in self.in_repos: | ||
future_rpm_units = search_units( | ||
in_repo, [Criteria.true()], RpmUnit, unit_fields=RPM_FIELDS | ||
) | ||
non_modular_rpms = set( | ||
get_n_latest_from_content( | ||
future_rpm_units.result(), modular_rpms=self.all_modular_filenames | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I think about it, the more I'm afraid this would not work correctly. I think we need to fetch all modular filenames from all input repos at the beginning as well.
That's because like this we will filter from the input repos only the modular rpms which are in the output repos, but it would leave us with possible other modular rpms, which are not in output repos, and if one of these other modular rpms would have higher version than a non-modular rpm with the same name, the get_n_latest_from_content
would return that modular rpm (instead of the non-modular rpm). The modular rpm would therefore have higher version than the non-modular rpm in output repo. And that would cause a false warning.
So, I think, in the content_audit_task()
, right after you fetch all_modular_filenames
from all out_repos, we then need to take the population_sources
repos from all binary out_repos, fetch them from pulp and search for modular filenames in them as well, and update the all_modular_filenames
to contain everything.
Refactor content audit celery task, reorganize and simplify the task by implementing non modular RPM auditing only.