scheduler: don't sort reserved port ranges before adding to bitmap #26712
Conversation
Force-pushed from 9f41577 to cab99d3
````
@@ -0,0 +1,3 @@
+```release-note:improvement
````
Note for reviewers: I'm open to calling this a bug so that we can backport it. Seems like a nice win for ENT customers on the LTS.
+1 to backporting
During a large-volume dispatch load test, I discovered that a lot of the total
scheduling time is spent calling `structs.ParsePortRanges` repeatedly in order
to parse the node's reserved ports configuration (e.g. converting
`"80,8000-8001"` to `[]int{80, 8000, 8001}`). A close examination of the
profiles shows that the bulk of the time is spent hashing the keys for the
map of ports we use for de-duplication, and then sorting the resulting slice.
The `(*NetworkIndex) SetNode` method that calls the offending `ParsePortRanges`
merges all the ports into the `UsedPorts` map of bitmaps at scheduling time,
which means the consumer of the slice already de-duplicates and doesn't care
about the order. The only other caller of `ParsePortRanges` is configuration
file validation, which throws away the slice entirely.
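To see why the consumer doesn't care about order or duplicates, here is a minimal toy bitmap in the spirit of the one behind `UsedPorts` (this `Bitmap` type is an illustration, not Nomad's actual implementation): setting a bit is idempotent, so unsorted, duplicated input produces the same final state.

```go
package main

import "fmt"

// Bitmap is a toy stand-in for the bitmap behind UsedPorts: one bit per
// port. Setting a bit is idempotent, so duplicates and ordering in the
// input slice are irrelevant to the final state.
type Bitmap []byte

func NewBitmap(size uint) Bitmap   { return make(Bitmap, size>>3) }
func (b Bitmap) Set(i uint)        { b[i>>3] |= 1 << (i & 7) }
func (b Bitmap) Check(i uint) bool { return b[i>>3]&(1<<(i&7)) != 0 }

func main() {
	used := NewBitmap(65536)
	// Unsorted input with a duplicate: the bitmap absorbs both harmlessly.
	for _, p := range []uint{8001, 80, 8000, 80} {
		used.Set(p)
	}
	fmt.Println(used.Check(80), used.Check(8000), used.Check(443)) // true true false
}
```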
By skipping de-duplication and not sorting, we can cut down the runtime of this
function by 30x and memory usage by 3x.
Ref: https://github.com/hashicorp/nomad/blob/v1.10.4/nomad/structs/network.go#L201
Fixes: #26654
Force-pushed from cab99d3 to 87bf05a
schmichael left a comment:
🎉
chrisroberts left a comment:
This is great. Might be nice to call out in the function doc that the result may now include duplicates.
Going to move this back into draft, as I realized there's a minor DoS opportunity here with this approach, because a misconfigured node could have a range like …
Using an iterator greatly improved the performance of this function... but didn't actually help further reduce CPU/memory usage, because the caller needs to apply the results of the iterator multiple times over different bitmaps (at least one for each interface). I also tried a reusable iterator, but that ended up needing to memoize the …

So I've updated with the docstring comments requested and added a guard on the total number of ports, as well as found an opportunity to use …

I'm also going to follow-up with a second PR that removes the …
gulducat left a comment:
lgtm!
I ran a similar high dispatch load test to the one I originally ran for #26654 (with a larger instance size because that's what I had handy).

Without this patch:

[screenshot: scheduling metrics without the patch]

With this patch:

[screenshot: scheduling metrics with the patch]
During a large volume dispatch load test, I discovered that a lot of the total scheduling time is being spent calling `structs.ParsePortRanges` repeatedly, in order to parse the reserved ports configuration of the node (e.g. converting `"80,8000-8001"` to `[]int{80, 8000, 8001}`). A close examination of the profiles shows that the bulk of the time is being spent hashing the keys for the map of ports we use for de-duplication, and then sorting the resulting slice.

The `(*NetworkIndex) SetNode` method that calls the offending `ParsePortRanges` merges all the ports into the `UsedPorts` map of bitmaps at scheduling time, which means the consumer of the slice is already de-duplicating and doesn't care about the order. The only other caller of `ParsePortRanges` is when we validate the configuration file, and that throws away the slice entirely.

By skipping de-duplication and not sorting, we can cut down the runtime of this function by 30x and memory usage by 4x.

See my comment here for why memoizing the result proved to be impractical. This changeset also deletes the unused `ParseReservedHostPorts` method and moves its tests into the tests for `ParsePortRanges` (which is really what it was testing anyways).

Ref: https://github.com/hashicorp/nomad/blob/v1.10.4/nomad/structs/network.go#L201
Fixes: #26654
I'm going to reproduce the test I described in #26654, but in the meantime, here's a microbenchmark:
Contributor Checklist

- … changelog entry using the `make cl` command.
- … ensure regressions will be caught.
- … and job configuration, please update the Nomad website documentation to reflect this. Refer to the website README for docs guidelines. Please also consider whether the change requires notes within the upgrade guide.
Reviewer Checklist

- … backporting document.
- … in the majority of situations. The main exceptions are long-lived feature branches or merges where history should be preserved.
- … within the public repository.
Changes to Security Controls
Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.