-
Notifications
You must be signed in to change notification settings - Fork 905
bugfix: Setting OMPI_MPI_THREAD_LEVEL to a value different than requested crashes #13211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
bugfix: Setting OMPI_MPI_THREAD_LEVEL to a value different than requested crashes #13211
Conversation
b71461f
to
b481893
Compare
b481893
to
1a0d2f7
Compare
Hello! The Git Commit Checker CI bot found a few problems with this PR: 1a0d2f7: review comments
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
8ad7579
to
fe5d20e
Compare
Hello! The Git Commit Checker CI bot found a few problems with this PR: b056c23: OMPI_MPI_THREAD_LEVEL can now take 'multiple' 'MPI...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
b056c23
to
5059134
Compare
Hold on a little on this one, I have a fancier solution I just didn't had the time to integrate it. The code is attached below, and it allows for partial matching for as long as the match is unique. As an example "s" will not be accepted because int check_env_value(const char** valid_prepositions, const char** keywords, int nb_keywords, const char* value)
{
char *prep = NULL, *token = (char*)value /* full match */;
int pidx = 0, v = strtol(value, &prep, 10), found = -1;
if ((0 == v) && (prep != value))
return v;
while(NULL != (prep = (char*)valid_prepositions[pidx])) {
if( 0 == strncasecmp(prep, value, strlen(prep)) ) {
token = value + strlen(prep);
break; /* got a token let's find a match */
}
pidx++;
}
for(int i = 0; i < nb_keywords; i++) {
if( 0 == strncasecmp(keywords[i], token, strlen(token)) ) {
if( -1 != found ) { /* not the first match, bail out */
return -1;
}
found = i;
}
}
return found;
}
static const char* keywords[] = {"single", "serialized", "funneled", "multiple"};
static const char* prepositions[] = {"mpi_thread_", "thread_", NULL}; |
I think you have perms to push in my branch if you want to do that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're going to need to get this documented, too. It's cool new functionality, but if it doesn't appear in a man page and/or some other docs, no one will ever use it.
`requested` in `MPI_Init_thread` would invoke the error handler, even though it is an useful override in some threaded library use cases. Signed-off-by: Aurelien Bouteiller <[email protected]>
(single,etc) in addition to numeric 0-3 values Signed-off-by: Aurelien Bouteiller <[email protected]>
54682e4
to
0f4673c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took the liberty of updating the MPI_Init* man pages. You're welcome. 😇
I made minor changes to some MPI_Session_* man pages, too, but most changes were to
0f4673c
to
027fda5
Compare
027fda5
to
1e66d4c
Compare
It occurred to me overnight that the updates I made in MPI_Init*(3) and MPI_Finalize(3) were indicative of the fact that none of our man pages had been updated to reflect the fact that the world model now exists (and is distinct from the MPI session model). So I updated even more text this morning to describe and clarify the MPI world model vs. the MPI session model. I.e., in documenting the Reviewers should read these man pages:
|
046983c
to
b3e2f68
Compare
I made all changes I wanted to the documentation. As far as I am concerned this is ready to go. |
LGTM as well |
ompi/runtime/ompi_mpi_init.c
Outdated
/* did we found a valid integer value? */ | ||
const int nb_keywords = sizeof(ompi_thread_level_values)/sizeof(ompi_thread_level_values[0]); | ||
if (found >= nb_keywords) { | ||
return OMPI_ERR_BAD_PARAM; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite grok this -- this works for the integer values 0, 1, 2, 3, but won't work if -- when ABI is merged in -- the user specifies the ABI integer value for MPI_THREAD_MULTIPLE
(4096), for example.
Wouldn't it be better to iterate through the values of the ompi_thread_level_values and see if the integer found
value matches one of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation states 0 to 3, the MCA help string states 0 to 3, the previous OMPI versions were also based on 0 to 3 values. If we add the fact that the internal values, ABI or not, are hard to know for users and are not necessarily monotonically increasing, I don't think that accepting the internal values makes sense or is usefull for users (except maybe nitpicking).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've previously allowed 0-3, which is probably "well known" (or at least easy to guess).
The ABI values are published in the MPI-5 document, so those fit at least some definition of "well known", too.
Meaning: if we're going to accept integer values, we should probably accept both 0-3 and the ABI values that are published in the MPI spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No sensible user should have to read Appendix A of the MPI standard in order to set the threading level.
This PR offers a sensible, backward compatible, solution. enough for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not saying that they have to read the appendix. I'm saying that if we offer the ability to use integer values, we should offer both sets of integer values because someone will try to set the ABI value and be surprised that it didn't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion on the ABI values,
but I will need this in v5, where there is no ABI support planned anyway
Setting OMPI_MPI_THREAD_LEVEL to a value different than requested in MPI_Init_thread would invoke the error handler, even though it is an useful override in some threaded library use cases.
I would recommend deferring to a separate PR when we implement ABI for v6, if only to keep the cherry-picking easier.
b3e2f68
to
47c8adc
Compare
ompi/runtime/ompi_mpi_init.c
Outdated
/* did we found a valid integer value? */ | ||
const int nb_keywords = sizeof(ompi_thread_level_values)/sizeof(ompi_thread_level_values[0]); | ||
if (found >= nb_keywords) { | ||
return OMPI_ERR_BAD_PARAM; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've previously allowed 0-3, which is probably "well known" (or at least easy to guess).
The ABI values are published in the MPI-5 document, so those fit at least some definition of "well known", too.
Meaning: if we're going to accept integer values, we should probably accept both 0-3 and the ABI values that are published in the MPI spec.
47c8adc
to
91e9ec6
Compare
Aren't you the one who started with PR saying that we needed to do this better to handle the ABI values? 👿
That's a fair point. We have the hood up right now -- why not spend 15 more minutes and re-do the not-very-difficult logic to handle both 0-3 and the ABI values and be done with this? Why create technical debt that no one will remember to come back and fix? Just do it -- perhaps as one more commit on this PR, because @abouteiller makes a good point that we don't really want to cherry pick that to v5.0.x -- and let's be done with it. |
In case anyone is looking for an proper example of the use of the word tetrapyloctomy, here we have it. In the current context, aka. OMPI does not even have ABI support, why are we investing time in this as if it was the best thing since sliced bread ? |
We're investing time because:
|
This is the opposite of investing, aka. wasting. I will not enumerate yet again the reasons, but such insistance from some is counterproductive, and less likely to encourage meaningful contributions (at the opposite of just "innovating" unnecessary features). We have a real issue (which is actually the reason for this PR, not ABI stuff as you claim), and you're unreasonably holding it. |
@jsquyres would you consider this truly a blocker? I would have said that this something that can easily be addressed by a subsequent PR, there is no real benefit in insisting on having it as part of this PR. |
This PR is not a blocker, no. But I am opposed to creating technical debt. All problems can be addressed by creating a subsequent PR, but why do that when a problem has been identified during the code review? The problem was introduced when new functionality was introduced into this PR. Here's the options I see:
|
…ages Including, but not limited to: * Added much more description of and distinction between the MPI world model and the MPI session model. Updated a lot of old, pre-MPI-world-model/pre-MPI-session-model text that was now stale / outdated, especially in the following pages: * MPI_Init(3), MPI_Init_thread(3) * MPI_Initialized(3) * MPI_Finalize(3) * MPI_Finalized(3) * MPI_Session_init(3) * MPI_Session_finalize(3) * Numerous formatting updates * Slightly improve the C code examples * Describe the mathematical relationship between the various MPI_THREAD_* constants in MPI_Init_thread(3) * Note that the mathematical relationships render nicely in HTML, but don't render entirely properly in nroff. This commit author is of the opinion that the nroff rendering is currently "good enough", and some Sphinx maintainer will fix it someday. * Add descriptions about the $OMPI_MPI_THREAD_LEVEL env variable and how it is used in MPI_Init_thread(3) * Added more seealso links Signed-off-by: Jeff Squyres <[email protected]>
91e9ec6
to
aff3afd
Compare
I can't caution this kind of autocratic behavior. My code is gone, I hope this solve all blockers for this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes.
Setting OMPI_MPI_THREAD_LEVEL to a value different than
requested
inMPI_Init_thread
would invoke the error handler, even though it is an useful override in some threaded library use cases.