Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve content audit [RHELDST-13955] #252

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Gdetrane
Copy link
Contributor

Refactor content audit celery task, reorganize and simplify the task by implementing non modular RPM auditing only.

@codecov-commenter
Copy link

codecov-commenter commented Feb 26, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (8e9e1aa) to head (56fa545).

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #252   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           19        20    +1     
  Lines         1199      1185   -14     
=========================================
- Hits          1199      1185   -14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Gdetrane Gdetrane force-pushed the improve-content-audit branch 2 times, most recently from 7709314 to 03e01d2 Compare February 26, 2025 18:18
"""
for log in expected_logs:
assert log in caplog.text
for bad_log in unexpected_logs:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just thinking - what happens if there are some other unexpected logs in the logs? Like this you will not catch those, right? Do you think it would be possible to assert the whole log at once? In such case you could also define the whole expected log in a file under tests/data so it's easier to read.

Copy link
Contributor Author

@Gdetrane Gdetrane Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially added unexpected log messages, thinking there would be more I could add, but figured it's unnecessary and I'm removing it.
I'd say the log now is not very long so it can be asserted by checking that each expected message is in the caplog.text (ordering of messages may vary I think and that's what could be tricky when asserting the whole log as a string, but I'm not 100% sure), but if we add the modular auditing later on, I have a feeling it's gonna blow up.
So what I would do is that when we add the modular auditing, we can create a log file in tests/data as you suggested, but perhaps have a more structured file (json probably) so it's easier to match log messages in relation to content, based on ids used as keys, for example.
I'm not sure, what do you think?

Refactor content audit celery task, reorganize and simplify the task by
implementing non modular RPM auditing only.
@Gdetrane Gdetrane force-pushed the improve-content-audit branch from 03e01d2 to 56fa545 Compare February 27, 2025 13:47
Comment on lines +272 to +280
for in_repo in self.in_repos:
future_rpm_units = search_units(
in_repo, [Criteria.true()], RpmUnit, unit_fields=RPM_FIELDS
)
non_modular_rpms = set(
get_n_latest_from_content(
future_rpm_units.result(), modular_rpms=self.all_modular_filenames
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I think about it, the more I'm afraid this would not work correctly. I think we need to fetch all modular filenames from all input repos at the beginning as well.
That's because like this we will filter from the input repos only the modular rpms which are in the output repos, but it would leave us with possible other modular rpms, which are not in output repos, and if one of these other modular rpms would have higher version than a non-modular rpm with the same name, the get_n_latest_from_content would return that modular rpm (instead of the non-modular rpm). The modular rpm would therefore have higher version than the non-modular rpm in output repo. And that would cause a false warning.
So, I think, in the content_audit_task() , right after you fetch all_modular_filenames from all out_repos, we then need to take the population_sources repos from all binary out_repos, fetch them from pulp and search for modular filenames in them as well, and update the all_modular_filenames to contain everything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants