Flush memory allocations when no longer needed #3571
go tool pprof -http=:8081 http://localhost:6060/debug/pprof/heap Signed-off-by: Alban Crequy <[email protected]>
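The command above assumes the process serves the standard pprof endpoints on localhost:6060. As a minimal sketch of how a Go program can expose them with `net/http/pprof` (the `newDebugMux` helper and the listen address are hypothetical, and this is not necessarily how ig wires it up):

```go
package main

import (
	"log"
	"net/http"
	"net/http/pprof"
)

// newDebugMux returns a mux serving the standard pprof handlers.
// Hypothetical helper, for illustration only.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	// pprof.Index also serves named profiles such as /debug/pprof/heap.
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

func main() {
	// Bind to localhost only; profiling data should not be public.
	log.Fatal(http.ListenAndServe("localhost:6060", newDebugMux()))
}
```

With this running, the `go tool pprof` invocation above can fetch the heap profile and render the flamegraph.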
This is slower but it does not keep btf data in memory when no longer needed. Signed-off-by: Alban Crequy <[email protected]>
The tests depend on github.com/google/go-cmp/cmp, which does not need to be included in ig. The dependency compiles a regexp, which causes 0.5MB to be permanently used by the ig process. This is visible in the pprof flamegraph as 'regexp.MustCompile'. Signed-off-by: Alban Crequy <[email protected]>
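To illustrate why merely importing a package can pin memory: a regexp compiled into a package-level variable is built when the program starts, whether or not it is ever used. The sketch below shows a lazy alternative using `sync.OnceValue` (Go 1.21+). This is a different technique from the one in this commit, which simply drops the dependency; the pattern and the `isValidName` function are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// Eager form (compiled at program start, memory held for the process lifetime):
//   var nameRE = regexp.MustCompile(`^[a-z0-9]+(-[a-z0-9]+)*$`)

// nameRE compiles the pattern on first call only and caches the result.
var nameRE = sync.OnceValue(func() *regexp.Regexp {
	return regexp.MustCompile(`^[a-z0-9]+(-[a-z0-9]+)*$`)
})

// isValidName is a hypothetical user of the regexp.
func isValidName(s string) bool {
	return nameRE().MatchString(s)
}

func main() {
	fmt.Println(isValidName("trace-dns"), isValidName("Trace_DNS")) // prints "true false"
}
```

If the code path is never exercised, the lazy form never allocates the compiled program at all.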
Signed-off-by: Alban Crequy <[email protected]>
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Gadget benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 2.
Benchmark suite | Current: c82fc62 | Previous: d95ad0d | Ratio |
---|---|---|---|
BenchmarkAllGadgetsWithContainers/container0/-traceloop | 80274335 ns/op | 31646571 ns/op | 2.54 |
BenchmarkAllGadgetsWithContainers/container0/audit-seccomp | 240016891 ns/op | 53678316 ns/op | 4.47 |
BenchmarkAllGadgetsWithContainers/container0/snapshot-process | 72072834 ns/op | 14217072 ns/op | 5.07 |
BenchmarkAllGadgetsWithContainers/container0/top-file | 302471140 ns/op | 115081903 ns/op | 2.63 |
BenchmarkAllGadgetsWithContainers/container0/top-tcp | 312354018 ns/op | 119907983 ns/op | 2.60 |
BenchmarkAllGadgetsWithContainers/container0/trace-oomkill | 239542534 ns/op | 51334879 ns/op | 4.67 |
BenchmarkAllGadgetsWithContainers/container0/trace-signal | 131665979 ns/op | 61542188 ns/op | 2.14 |
BenchmarkAllGadgetsWithContainers/container0/trace-tcpconnect | 423131254 ns/op | 210475380 ns/op | 2.01 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @alban
For me pprof shows also like 6 MB of usage for
For me it has an RSS of around
As a test I put the time measurement into
So that function gets called 23 times for
Total delay
Now I put the time measurement into all functions where this PR put a
In total without flushing the cache it took around
Full log for time measurement in nanoseconds

Without flushing
➭ go build ./cmd/ig && sudo -E ./ig run trace_dns
NewCollectionWithOptions: 192381
LoadKernelSpec: 57076860
LoadKernelSpec: 3085
LoadKernelSpec: 1125
LoadAndAssign: 191065342
LoadKernelSpec: 3680
LoadKernelSpec: 2149
LoadKernelSpec: 1690
LoadKernelSpec: 1162
LoadAndAssign: 9728594
LoadKernelSpec: 923
LoadKernelSpec: 2599
LoadAndAssign: 157329
LoadKernelSpec: 3309
LoadKernelSpec: 1348
LoadKernelSpec: 1324
LoadKernelSpec: 5246
LoadAndAssign: 49973966
LoadKernelSpec: 1833
LoadKernelSpec: 1516
LoadKernelSpec: 2124
LoadKernelSpec: 1434
LoadKernelSpec: 2250
LoadKernelSpec: 1242
LoadKernelSpec: 1832
LoadKernelSpec: 2235
LoadKernelSpec: 1281
LoadKernelSpec: 2488
LoadAndAssign: 126387577
RUNTIME.CONTAINERNAME ...
NewCollectionWithOptions: 3338233

With flushing
➭ go build ./cmd/ig && sudo -E ./ig run trace_dns
NewCollectionWithOptions: 179360
LoadKernelSpec: 57872574
LoadKernelSpec: 2945
LoadKernelSpec: 943
LoadAndAssign: 189445800
LoadKernelSpec: 56529667
LoadKernelSpec: 4545
LoadKernelSpec: 1309
LoadKernelSpec: 946
LoadAndAssign: 229693603
LoadKernelSpec: 80418891
LoadKernelSpec: 64317914
LoadAndAssign: 152349
LoadKernelSpec: 52153526
LoadKernelSpec: 1797
LoadKernelSpec: 879
LoadKernelSpec: 1501
LoadAndAssign: 97759962
LoadKernelSpec: 6806
LoadKernelSpec: 1258
LoadKernelSpec: 2924
LoadKernelSpec: 1197
LoadKernelSpec: 1886
LoadKernelSpec: 776
LoadKernelSpec: 769
LoadKernelSpec: 3480
LoadKernelSpec: 1580
LoadKernelSpec: 2573
LoadAndAssign: 309008974
RUNTIME.CONTAINERNAME ...
NewCollectionWithOptions: 3427239

In the end it would be the best to run multiple gadgets in a single instance of
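The pattern in the log (one slow `LoadKernelSpec` followed by microsecond-scale calls, and a slow call again after each flush) is what a flushable process-wide cache produces. A minimal sketch of that trade-off, assuming a hypothetical `specCache` type and `timed` helper; this is not the cilium/ebpf implementation, and the 1 MiB allocation merely stands in for an expensive parse:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// specCache is a hypothetical stand-in for a cache of parsed kernel type data.
type specCache struct {
	mu     sync.Mutex
	spec   []byte // nil when nothing is cached
	parses int    // number of slow-path loads, for illustration
}

// Load returns the cached data, parsing it first if the cache is empty.
func (c *specCache) Load() []byte {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.spec == nil {
		c.parses++
		c.spec = make([]byte, 1<<20) // placeholder for an expensive parse
	}
	return c.spec
}

// Flush drops the cached data so the garbage collector can reclaim it.
func (c *specCache) Flush() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.spec = nil
}

// timed prints how long fn took in nanoseconds.
func timed(name string, fn func()) {
	start := time.Now()
	fn()
	fmt.Printf("%s: %d\n", name, time.Since(start).Nanoseconds())
}

func main() {
	var c specCache
	timed("Load (cold)", func() { c.Load() })
	timed("Load (cached)", func() { c.Load() })
	c.Flush()
	timed("Load (after flush)", func() { c.Load() })
	fmt.Println("parses:", c.parses) // prints "parses: 2"
}
```

Each flush lowers steady-state memory but makes the next load pay the full parse cost again, which matches the extra ~50-80 ms `LoadKernelSpec` entries in the "With flushing" log.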
Some stats about kallsyms: pprof shows that kallsyms accounts for 8 MB of in-use allocations, but 38 MB of allocations in total (some of which will be released by the GC after initialisation).
fields := bytes.Fields(scanner.Bytes())
$ sudo cat /proc/kallsyms|wc -l
317927
$ sudo cat /proc/kallsyms |gawk -F " " 'NF == 4' |wc -l
107764
$ sudo cat /proc/kallsyms|wc -c
14743161
$ sudo cat /proc/kallsyms |gawk -F " " 'NF == 4' |wc -c
5957326
It gets reduced from 34MB to 12MB with this patch on cilium/ebpf:
--- a/internal/kallsyms/kallsyms.go
+++ b/internal/kallsyms/kallsyms.go
@@ -56,7 +56,12 @@ func loadKernelModuleMapping(f io.Reader) (map[string]string, error) {
 	mods := make(map[string]string)
 	scanner := bufio.NewScanner(f)
 	for scanner.Scan() {
-		fields := bytes.Fields(scanner.Bytes())
+		line := scanner.Bytes()
+		// bytes.Fields is expensive, so filter for '[' first
+		if !bytes.ContainsRune(line, '[') {
+			continue
+		}
+		fields := bytes.Fields(line)
 		if len(fields) < 4 {
 			continue
 		}
xref: cilium/ebpf#1584
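For readers who want to try the pre-filter outside cilium/ebpf, here is a self-contained sketch of the same idea. The `moduleSymbols` function, its string-based signature, and the sample input are assumptions made for the example; the real `loadKernelModuleMapping` takes an `io.Reader` and may differ in how it builds the map.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// moduleSymbols maps symbol names to module names from kallsyms-style
// lines ("addr type symbol [module]"). Hypothetical simplified sketch.
func moduleSymbols(kallsyms string) (map[string]string, error) {
	mods := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(kallsyms))
	for scanner.Scan() {
		line := scanner.Bytes()
		// Cheap pre-filter: only module symbols carry a "[module]" field,
		// so most lines never reach the allocating bytes.Fields call.
		if !bytes.ContainsRune(line, '[') {
			continue
		}
		fields := bytes.Fields(line)
		if len(fields) < 4 {
			continue
		}
		// string(...) copies, so the scanner may safely reuse its buffer.
		mods[string(fields[2])] = string(bytes.Trim(fields[3], "[]"))
	}
	return mods, scanner.Err()
}

func main() {
	input := "ffffffff81000000 T _stext\n" +
		"ffffffffc0123000 t nf_nat_init\t[nf_nat]\n"
	mods, err := moduleSymbols(input)
	fmt.Println(mods, err) // prints "map[nf_nat_init:nf_nat] <nil>"
}
```

Per the line counts above, roughly two thirds of /proc/kallsyms has no module field, so skipping those lines before splitting avoids most of the per-line allocations.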
Instead of calling |
Flush btf kernel data from cache when no longer needed.
Also remove unnecessary regexp from the cri test.
Before: 45 MB
After: 3 MB
How to use
No changes
Testing done
cc @burak-ok
Fixes: #3541