Improve content audit [RHELDST-13955] #252

Gdetrane · 2025-02-26T15:44:48Z

Refactor the content audit Celery task: reorganize and simplify it by implementing non-modular RPM auditing only.

codecov-commenter · 2025-02-26T15:48:14Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (8e9e1aa) to head (56fa545).

Additional details and impacted files

@@            Coverage Diff            @@
##            master      #252   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           19        20    +1     
  Lines         1199      1185   -14     
=========================================
- Hits          1199      1185   -14

drepelov · 2025-02-27T09:14:37Z

tests/test_content_audit_task.py

-    """
+            for log in expected_logs:
+                assert log in caplog.text
+            for bad_log in unexpected_logs:


I'm just thinking: what happens if there are other unexpected messages in the logs? This way you won't catch them, right? Do you think it would be possible to assert the whole log at once? In that case you could also define the whole expected log in a file under tests/data so it's easier to read.

I initially added unexpected log messages, thinking there would be more I could add, but figured they're unnecessary, so I'm removing them.
The log isn't very long right now, so it can be asserted by checking that each expected message is in caplog.text. I think the ordering of messages may vary, which could make asserting the whole log as a single string tricky, but I'm not 100% sure. If we add modular auditing later on, though, I have a feeling the log is going to blow up.
So when we add modular auditing, we could create a log file in tests/data as you suggested, but perhaps make it more structured (probably JSON) so it's easier to match log messages to content, for example by using IDs as keys.
I'm not sure; what do you think?
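One possible way to cover both concerns raised here (catching stray messages while tolerating reordering) is to compare the captured messages as a multiset rather than as one string. The sketch below is only an illustration: `assert_log_messages_match` and `ListHandler` are hypothetical names, not part of this PR. The helper takes plain `logging.LogRecord` objects, which is what pytest's `caplog.records` provides, and `ListHandler` merely stands in for `caplog` so the example runs outside pytest.

```python
import logging
from collections import Counter


def assert_log_messages_match(records, expected_messages):
    """Assert the records hold exactly the expected messages, in any order."""
    actual = Counter(record.getMessage() for record in records)
    expected = Counter(expected_messages)
    missing = expected - actual      # expected but never logged
    unexpected = actual - expected   # logged but not expected
    assert not missing and not unexpected, (
        f"missing: {sorted(missing.elements())}, "
        f"unexpected: {sorted(unexpected.elements())}"
    )


class ListHandler(logging.Handler):
    """Collects log records in a list (hypothetical stand-in for caplog)."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)
```

In a pytest test this would presumably be called as `assert_log_messages_match(caplog.records, expected)`, where `expected` could be loaded from a JSON file under tests/data once the log grows.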

drepelov · 2025-02-27T15:41:09Z

ubi_manifest/worker/tasks/auditing.py

+        for in_repo in self.in_repos:
+            future_rpm_units = search_units(
+                in_repo, [Criteria.true()], RpmUnit, unit_fields=RPM_FIELDS
+            )
+            non_modular_rpms = set(
+                get_n_latest_from_content(
+                    future_rpm_units.result(), modular_rpms=self.all_modular_filenames
+                )
+            )


The more I think about it, the more I'm afraid this will not work correctly. I think we need to fetch all modular filenames from all input repos at the beginning as well.
As written, we only filter out of the input repos the modular RPMs that are also in the output repos. That can leave other modular RPMs that are not in the output repos. If one of those has a higher version than a non-modular RPM with the same name, get_n_latest_from_content would return the modular RPM instead of the non-modular one. That modular RPM would then have a higher version than the non-modular RPM in the output repo, which would cause a false warning.
So I think that in content_audit_task(), right after you fetch all_modular_filenames from all out_repos, we need to take the population_sources repos of all binary out_repos, fetch them from Pulp, search them for modular filenames as well, and update all_modular_filenames to contain everything.
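The failure mode described above can be reproduced with plain data. The sketch below is not the project's code: `Rpm`, `latest_per_name`, and the tuple-based version comparison are hypothetical stand-ins (real RPM version comparison is more involved, and the actual `get_n_latest_from_content` may behave differently). It only illustrates why the set of modular filenames would need to cover the input repos too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rpm:
    name: str
    version: tuple  # simplified stand-in for an RPM version
    filename: str


def latest_per_name(rpms, modular_filenames):
    """Return the highest-version RPM per name, skipping known modular files."""
    latest = {}
    for rpm in rpms:
        if rpm.filename in modular_filenames:
            continue
        current = latest.get(rpm.name)
        if current is None or rpm.version > current.version:
            latest[rpm.name] = rpm
    return latest


# The input repo holds a non-modular build and a newer modular build of "foo".
non_modular = Rpm("foo", (1, 0), "foo-1.0.rpm")
modular = Rpm("foo", (2, 0), "foo-2.0.module.rpm")
in_repo = [non_modular, modular]

# Modular filenames gathered from output repos only: the modular foo is not
# shipped there, so it is not filtered out and wins the comparison.
from_out_repos_only = set()
picked_wrong = latest_per_name(in_repo, from_out_repos_only)["foo"]

# Modular filenames gathered from output and input repos: the modular foo is
# filtered out, so the non-modular build is selected.
from_all_repos = from_out_repos_only | {modular.filename}
picked_right = latest_per_name(in_repo, from_all_repos)["foo"]
```

With the narrower set, the audit would compare the output repo's non-modular `foo` against the modular 2.0 build and report it as outdated, which is the false warning the comment warns about.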

Gdetrane requested review from drepelov and rbikar as code owners February 26, 2025 15:44

Gdetrane force-pushed the improve-content-audit branch 2 times, most recently from 7709314 to 03e01d2 Compare February 26, 2025 18:18

drepelov requested changes Feb 27, 2025

drepelov reviewed Feb 27, 2025

Improve content audit [RHELDST-13955]

56fa545

Refactor content audit celery task, reorganize and simplify the task by implementing non modular RPM auditing only.

Gdetrane force-pushed the improve-content-audit branch from 03e01d2 to 56fa545 Compare February 27, 2025 13:47

drepelov reviewed Feb 27, 2025
