-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
env_process: Refactor huge pages setup/cleanup steps #4054
Open
bgartzi
wants to merge
1
commit into
avocado-framework:master
Choose a base branch
from
bgartzi:env_process_refactoring-huge_pages
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+29
−33
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
from virttest import env_process, test_setup, utils_libvirtd | ||
from virttest.test_setup.core import Setuper | ||
|
||
|
||
class HugePagesSetup(Setuper): | ||
def setup(self): | ||
# If guest is configured to be backed by hugepages, setup hugepages in host | ||
if self.params.get("hugepage") == "yes": | ||
self.params["setup_hugepages"] = "yes" | ||
if self.params.get("setup_hugepages") == "yes": | ||
h = test_setup.HugePageConfig(self.params) | ||
env_process._pre_hugepages_surp = h.ext_hugepages_surp | ||
suggest_mem = h.setup() | ||
if suggest_mem is not None: | ||
self.params["mem"] = suggest_mem | ||
if not self.params.get("hugepage_path"): | ||
self.params["hugepage_path"] = h.hugepage_path | ||
if self.params.get("vm_type") == "libvirt": | ||
utils_libvirtd.Libvirtd().restart() | ||
|
||
def cleanup(self): | ||
if self.params.get("setup_hugepages") == "yes": | ||
h = test_setup.HugePageConfig(self.params) | ||
h.cleanup() | ||
if self.params.get("vm_type") == "libvirt": | ||
utils_libvirtd.Libvirtd().restart() | ||
env_process._post_hugepages_surp = h.ext_hugepages_surp |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_pre_hugepages_surp
and_post_hugepages_surp
are for hugepage leak check, with this Setuper, I would suggest dropping them. by returning a variable afterdo_cleanup
. soleak_num = _post_hugepages_surp - _pre_hugepages_surp
can short toleak_num = <new_var>
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understood correctly, you are proposing to make the
cleanup
method of thisSetuper
return theleak_num
value?That would also involve updating the
SetupManager
behavior to meet the demands ofHugePagesSetup
. If that's the case, we would permit everySetuper
return a value, which goes against the current implementation. We would then have to update theSetupManager
do_cleanup
logic to handle that.In my opinion, if we were to do that, we would have to think of a protocol of some sort to make this implementable by each
Setuper
instead of adding specificSetuper
logic into the rather generalSetupManager
. Could something like adding apost_cleanup_check
function into the coreSetuper
and calling it after thecleanup
method has been called fromSetupManager.do_cleanup
be the answer to that issue?I also thought on other approaches, as implementing a core
Singleton
abstraction, so we would be able to reachSetuper
instances instead of classes from withinenv_process
so we could call extra functions on demand after the cleanup would have terminated. However, this approach sounds too complex for a workaround, and it could introduce further issues, asSetuper
instances "surviving" from one test case run to the next one.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The complex solution you mentioned is that Setuper can return something. I agree we can do this, but not now, the implementation can closely combine env_process and Setuper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @bgartzi @PaulYuuu
First, I am confused about checking the huge page leak, whenever it will result in the leak error during the post process if the user does not deallocate the huge page by setting
hugepages_deallocate = no
.Because it just deallocates the huge page memory with
self.deallocate
is True.avocado-vt/virttest/test_setup/__init__.py
Lines 695 to 697 in 1d027e8
So I think that there is a potential logical issue about the leak checking with
self.deallocate
being False, sum it up, the checking leak needs to work with the deallocation of the huge page otherwise it will raise thehuge pages leaked!
error all the time.Hi, @PaulYuuu Please correct me if I misunderstood the implementation of the huge page configuration and the related post_process. Thanks.
Second, Is it possible that we integrate the checking leak part into the
cleanup()
instead of calling it independently in the post process? From my understanding of the code of the post process, we can see that each test case will check the huge page whether it is leaked, if the test case needs the huge page. Hi @PaulYuuu Could you share the reason for calling the checking leak here instead of thecleanup()
My reason is that we could use a unified way to raise all errors by the following:
avocado-vt/virttest/env_process.py
Lines 1608 to 1609 in 6ea7c8f
The
err
contains all the runtime errors during the postprocess. And I think the checking huge pages should be covered by theerr
avocado-vt/virttest/env_process.py
Lines 1610 to 1612 in 6ea7c8f
As a result, we could let the user know the error occurred in the post process, rather than it is a
exceptions.TestFail
.BTW, we hope that it could raise the related
env_process
error during the pre-process and post-process. The test case raises theexceptions.TestFail
So I suggest that we could refactor the previous implementation by integrating the checking leak part into the
cleanup
. And @bgartzi you could refactor theHugePagesSetup
in the normal way.Please let me know your opinions. @PaulYuuu @bgartzi
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, IIRC, if
hugepages_deallocate
is not set, we will never hit the error,HugePages_Surp
only changes when leaked or using THP, which is unrelated to setup/teardown. And yes we should moveself.ext_hugepages_surp = utils_memory.get_num_huge_pages_surp()
out from theif self.deallocate
in cleanup.Without the Setuper, cannot handle huge page leak checks in the cleanup in the current pre/post process context, pre and post will init a separate HugePageConfig class, so we cannot directly check
ext_hugepages_surp
cross in 2 HugePageConfig.For now, as we have each setuper, then check leak is possible, and we can do it, even in this PR or an individual one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following each discussion thread:
Deallocation + leak check
What should we do if deallocation doesn't happen? Should we check for leaks?
I see there isn't any test setting
hugepages_deallocte
. I see that it is set tohugepages_deallocate = yes
by default here:avocado-vt/virttest/shared/cfg/guest-hw.cfg
Lines 220 to 223 in 3fdae9b
Based on 5b4db50, I see the deallocate option was added due to some small-memory PPC systems. So, it could be set to
no
for that reason.I'm not a memory or hugepages expert, so I'm not sure if I understand this properly. But, based on what I understand by allocation/deallocation, here are my question:
utils_memory.get_num_huge_pages_surp
output value change? In other words, does the surp huge pages value change if we mount/unmount hugepages?ext_hugepages_surp
beforesetup
. So if we measure it aftersetup
but beforecleanup
, the obtained value would be different. If we would run the leak check (AKA what triggers the TestFail) betweensetup
andcleanup
I'd expect the test to fail. Is that assumption correct?deallocate
, we wouldn't runcleanup
at all, so I'd expect theutils_memory.get_num_huge_pages_surp
value to be the same we would obtain right after setup. That is, different to pre-setup
, where we measuredext_hugepages_surp
for the last time. If that's correct, wouldn't gettingself.ext_hugepages_surp = utils_memory.get_num_huge_pages_surp()
out from theif self.deallocate
clause cause all tests not deallocating (few, as far as I know) to fail?Again, this logical reasoning is based on my limited understanding on the topic, so please, correct me.
Leak check failure trigger
Where should we trigger the failure/error if we detect a leak?
@YongxueHong, as you suggest, I would refactor the piece of code running the check into
HugePagesSetup.cleanup
. However, that brings me some concerns about the raised error type:TestFail
. That sets the test status to "Failure".HugePagesSetup.cleanup
, theSetupManager
will gather the raised exception (together with any other sort of exception) and will raise aRuntimeError
. This would set the test status to "Error".I believe we should limit ourselves to just postprocessing the test environment here, not declaring the result of the test during cleanup. However, I also understand that if we only can check
ext_hugepages_surp
only after huge pagesumount
, test cases don't have a way of checking that by themself either. If the spec of the huge pages tests declare that they should fail if there is a leak, then I guess we can't decide whether an error will suffice or not. So long story short: I'm not sure about masking theTestFail
the leak check raises under aRuntimeError
.I'd like to hear your thoughts about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @bgartzi
From my understanding of the env_proces, I think the purpose of raising the
RuntimeError
at the end of the post process is to collect all the errors and not stop to perform the nextcleanup
if there is one failed cleanup, so we can not raise each error type and in the meanwhile make sure all thecleanup
can be executed.In case we want to add another checkpoint to raise a TestFail later, how do we guarantee that all checkpoints will be executed? So I think we can let the
RuntimeError
(a unified error) cover the errors from eachcleanup
.I think the key point is to understand the main responsibility of the
post-process
, even thepre-process
.I think the preprocess is mainly responsible for preparing the environment for the test case, for example, setting it up, installing the package, and so on. Instead, the
post process
works for cleaning up and uninstalling the package to keep the environment pure for the next test case. (We can not maintain and guide others in developing it if we do not define both the preprocess and postprocess.)And if some errors occurred during the post process, we can not think it belongs to the TestFail or TestError, since those errors do not affect the result of the test case from the test case perspective, it just did some teardown steps finally.
Therefore, If this checkpoint is one step for the test case, I prefer to let the test case handle it, rather than in the post-process, otherwise, it should follow up the rule of the post-process which is to raise all the errors with a unified error type(Whatever the type of the error).
Sum it up: Let the test case raise the
TestError
,TestFail
, and so on, instead in the pre-process and post-process.Please let me know your thoughts.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's how
cleanup
works. There's one way that comes to my mind, see:However, that implies some sort of error/fail/warning sort of hierarchy, and that we support postprocess steps to raise
TestFails
as well as other sort of errors. However, we already agreed that that is not a behavior we want to support. And if we did, I think that would be worth of a whole new separate patch.As you mention, yes, I also agree we can just merge all errors into the
RuntimeError
that theSetuper
'scleanup
raises.The problem is that if we do mask the leak detection under the
RuntimeError
, the behavior users (hugepages tests) would expect from avocado-vt will change. If you, as avocado-vt maintainers agree on changing that behavior, I'm also okay with that. Which based on the comments below, I think is the case.Yes, I also agree that would be the ideal situation. However, based on my understanding, hugepage tests need the
cleanup
to be executed, that is they need to wait until hugepages are released until they check whether a leak happened or not. That's why theTestFail
in this case was written inpostprocess
instead of in the test case itself, I think.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @MiriamDeng @fbq815 @zhenyzha
How about your opinions from the feature owner's perspective?
Could we move the checking huge page leak to the
cleanup
or keep this checking in the post process?Please let me know your views. Thanks.
CC @mcasquer