Flush memory allocations when no longer needed #3571

Closed
alban wants to merge 4 commits
Conversation


@alban alban commented Oct 10, 2024

Flush btf kernel data from cache when no longer needed.

Also remove unnecessary regexp from the cri test.
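For illustration, here is a minimal sketch of the idea behind the change, assuming a standalone loader program; it is not the PR's actual diff, and the object file name is a placeholder:

```go
// Minimal sketch of the approach, assuming a standalone loader program;
// not the PR's actual diff, and "program.bpf.o" is a placeholder.
package main

import (
	"log"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/btf"
)

func main() {
	spec, err := ebpf.LoadCollectionSpec("program.bpf.o")
	if err != nil {
		log.Fatal(err)
	}

	coll, err := ebpf.NewCollectionWithOptions(spec, ebpf.CollectionOptions{})
	if err != nil {
		log.Fatal(err)
	}
	defer coll.Close()

	// Loading may have parsed the kernel BTF (/sys/kernel/btf/vmlinux) into
	// cilium/ebpf's package-level cache. Drop it so the ~28 MB of BTF data
	// does not stay resident; the trade-off is that the next load has to
	// re-parse it, which is what the timing discussion below is about.
	btf.FlushKernelSpec()

	// ... attach programs, read maps, etc.
}
```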

sudo -E ./ig trace exec
go tool pprof -http=:8081 http://localhost:6060/debug/pprof/heap
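For reference, the heap profile queried above is served by Go's standard net/http/pprof handler; a minimal sketch of exposing it on :6060 follows (ig's own pprof wiring may differ):

```go
// Minimal sketch of exposing the /debug/pprof/heap endpoint queried above;
// ig's actual pprof setup may differ.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the rest of the program does its normal work ...
	select {}
}
```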

Before: 45 MB:

  • 7.8MB for kallsyms
  • 28.29MB for kernel BTF

(pprof flamegraph screenshot: pprof-ig-1)

After: 3 MB

(pprof flamegraph screenshot: pprof-ig-4)

How to use

No changes

Testing done

  • I have not checked how much slower it becomes.
  • I have not checked with pmap or smem, only pprof.
  • The numbers above are with the builtin trace-exec gadget. With the image-based trace_exec gadget, there are still some btf allocations that are not flushed, so pprof shows 38MB in initEnumFormatters.

cc @burak-ok

Fixes: #3541

alban added 4 commits October 10, 2024 17:47
go tool pprof -http=:8081 http://localhost:6060/debug/pprof/heap

Signed-off-by: Alban Crequy <[email protected]>
This is slower but it does not keep btf data in memory when no longer
needed.

Signed-off-by: Alban Crequy <[email protected]>
The tests depend on github.com/google/go-cmp/cmp, which is not necessary
to include in ig. The dependency compiles a regexp, which causes 0.5MB to be
permanently used by the ig process. This is visible in the pprof
flamegraph as 'regexp.MustCompile'.

Signed-off-by: Alban Crequy <[email protected]>

@alban alban left a comment


⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Gadget benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result, exceeding the threshold of 2.

| Benchmark suite | Current: c82fc62 | Previous: d95ad0d | Ratio |
| --- | --- | --- | --- |
| BenchmarkAllGadgetsWithContainers/container0/-traceloop | 80274335 ns/op | 31646571 ns/op | 2.54 |
| BenchmarkAllGadgetsWithContainers/container0/audit-seccomp | 240016891 ns/op | 53678316 ns/op | 4.47 |
| BenchmarkAllGadgetsWithContainers/container0/snapshot-process | 72072834 ns/op | 14217072 ns/op | 5.07 |
| BenchmarkAllGadgetsWithContainers/container0/top-file | 302471140 ns/op | 115081903 ns/op | 2.63 |
| BenchmarkAllGadgetsWithContainers/container0/top-tcp | 312354018 ns/op | 119907983 ns/op | 2.60 |
| BenchmarkAllGadgetsWithContainers/container0/trace-oomkill | 239542534 ns/op | 51334879 ns/op | 4.67 |
| BenchmarkAllGadgetsWithContainers/container0/trace-signal | 131665979 ns/op | 61542188 ns/op | 2.14 |
| BenchmarkAllGadgetsWithContainers/container0/trace-tcpconnect | 423131254 ns/op | 210475380 ns/op | 2.01 |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @alban

@burak-ok (Member) commented:

> With the image-based trace_exec gadget, there are still some btf allocations that are not flushed, so pprof shows 38MB in initEnumFormatters

For me, pprof also shows about 6 MB of usage for run trace_exec.

> I have not checked with pmap or smem, only pprof.

For me it has an RSS of around 80 MB, of which 60 MB is SHR (shared). So this reduces the number of unique memory pages: effectively, every new ig instance only adds 10-20 MB of memory consumption.
Still, htop will show 80 MB for every process, even though 60 MB of it is shared. We need to communicate that to the end users.

> I have not checked how much slower it becomes.

As a test, I put a time measurement into LoadKernelSpec:

{11:21}~/codes/inspektor-gadget:alban_memory ✗ ➭ go build ./cmd/ig && sudo -E ./ig run trace_dns
time: 56.016613ms
time: 4.435µs
time: 2.019µs
time: 56.936247ms
time: 5.104µs
time: 1.136µs
time: 1.076µs
time: 74.422792ms
time: 89.252146ms
time: 59.296577ms
time: 1.409µs
time: 1.189µs
time: 2.027µs
time: 5.353µs
time: 1.522µs
time: 2.027µs
time: 1.042µs
time: 2.273µs
time: 1.514µs
time: 1.465µs
time: 2.118µs
time: 1.881µs
time: 1.941µs
RUNTIME.CONTAINERNAME  ...

So that function gets called 23 times for run trace_dns. Most of the time it is still cached in between calls, but we already have over 250ms of additional delay until ig starts. And this is only from LoadKernelSpec().
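The instrumentation was presumably something along these lines (an assumed sketch that matches the "time:" lines above; the real measurement was placed inside the functions themselves and is not part of this PR):

```go
// Assumed sketch of the ad-hoc timing described above; not code from the PR.
package main

import (
	"fmt"
	"time"

	"github.com/cilium/ebpf/btf"
)

func main() {
	start := time.Now()
	spec, err := btf.LoadKernelSpec()
	fmt.Printf("time: %v\n", time.Since(start)) // prints e.g. "time: 56.016613ms"
	if err != nil {
		panic(err)
	}
	_ = spec
}
```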

Total delay

Now I put the time measurement into all functions after which this PR adds a FlushKernelSpec call (LoadKernelSpec, LoadAndAssign, NewCollectionWithOptions). I tested again with ig run trace_dns:

In total without flushing the cache it took around 0.4s to get the header printed out.
With flushing the cache it took 1.14s.

Full log of the time measurements, in nanoseconds.

Without flushing:
➭ go build ./cmd/ig && sudo -E ./ig run trace_dns
NewCollectionWithOptions: 192381
LoadKernelSpec: 57076860
LoadKernelSpec: 3085
LoadKernelSpec: 1125
LoadAndAssign: 191065342
LoadKernelSpec: 3680
LoadKernelSpec: 2149
LoadKernelSpec: 1690
LoadKernelSpec: 1162
LoadAndAssign: 9728594
LoadKernelSpec: 923
LoadKernelSpec: 2599
LoadAndAssign: 157329
LoadKernelSpec: 3309
LoadKernelSpec: 1348
LoadKernelSpec: 1324
LoadKernelSpec: 5246
LoadAndAssign: 49973966
LoadKernelSpec: 1833
LoadKernelSpec: 1516
LoadKernelSpec: 2124
LoadKernelSpec: 1434
LoadKernelSpec: 2250
LoadKernelSpec: 1242
LoadKernelSpec: 1832
LoadKernelSpec: 2235
LoadKernelSpec: 1281
LoadKernelSpec: 2488
LoadAndAssign: 126387577
RUNTIME.CONTAINERNAME  ...
NewCollectionWithOptions: 3338233
  
With flushing:
➭ go build ./cmd/ig && sudo -E ./ig run trace_dns
NewCollectionWithOptions: 179360
LoadKernelSpec: 57872574
LoadKernelSpec: 2945
LoadKernelSpec: 943
LoadAndAssign: 189445800
LoadKernelSpec: 56529667
LoadKernelSpec: 4545
LoadKernelSpec: 1309
LoadKernelSpec: 946
LoadAndAssign: 229693603
LoadKernelSpec: 80418891
LoadKernelSpec: 64317914
LoadAndAssign: 152349
LoadKernelSpec: 52153526
LoadKernelSpec: 1797
LoadKernelSpec: 879
LoadKernelSpec: 1501
LoadAndAssign: 97759962
LoadKernelSpec: 6806
LoadKernelSpec: 1258
LoadKernelSpec: 2924
LoadKernelSpec: 1197
LoadKernelSpec: 1886
LoadKernelSpec: 776
LoadKernelSpec: 769
LoadKernelSpec: 3480
LoadKernelSpec: 1580
LoadKernelSpec: 2573
LoadAndAssign: 309008974
RUNTIME.CONTAINERNAME  ...
NewCollectionWithOptions: 3427239
  
---

In the end it would be best to run multiple gadgets in a single instance of ig.


alban commented Oct 14, 2024

Some stats about kallsyms:

pprof shows that kallsyms takes 8 MB of allocations in use, but 38 MB of allocations in total (some of which will be released by the GC after initialisation).

bytes.Fields() is expensive (memory-wise): it costs 34MB of memory (that will later be released by the GC):

vendor/github.com/cilium/ebpf/internal/kallsyms/kallsyms.go:59:

fields := bytes.Fields(scanner.Bytes())
  • 300k symbols in total:
$ sudo cat /proc/kallsyms|wc -l
317927
  • 100k symbols from kernel modules:
$ sudo cat /proc/kallsyms |gawk -F " " 'NF == 4' |wc -l
107764
  • File size: 15MB
$ sudo cat /proc/kallsyms|wc -c
14743161
  • File size when only considering kernel module symbols: 6MB
$ sudo cat /proc/kallsyms |gawk -F " " 'NF == 4' |wc -c
5957326


alban commented Oct 14, 2024

> bytes.Fields() is expensive (memory-wise): it costs 34MB of memory (that will later be released by the GC).

It gets reduced from 34MB to 12MB with this patch on cilium/ebpf:

--- a/internal/kallsyms/kallsyms.go
+++ b/internal/kallsyms/kallsyms.go
@@ -56,7 +56,12 @@ func loadKernelModuleMapping(f io.Reader) (map[string]string, error) {
        mods := make(map[string]string)
        scanner := bufio.NewScanner(f)
        for scanner.Scan() {
-               fields := bytes.Fields(scanner.Bytes())
+               line := scanner.Bytes()
+               // bytes.Fields is expensive, so filter for '[' first
+               if !bytes.ContainsRune(line, '[') {
+                       continue
+               }
+               fields := bytes.Fields(line)
                if len(fields) < 4 {
                        continue
                }

xref: cilium/ebpf#1584
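A hypothetical way to reproduce the effect locally is an allocation micro-benchmark like the following (assumed, not part of the patch; the sample line and benchmark names are made up). Run it with go test -bench=. -benchmem:

```go
// Assumed micro-benchmark, not part of the patch: bytes.Fields allocates a
// [][]byte for every line, while filtering for '[' first skips that
// allocation for the ~200k non-module symbols in /proc/kallsyms.
package kallsyms_test

import (
	"bytes"
	"testing"
)

// A typical non-module line from /proc/kallsyms (no '[module]' suffix).
var coreLine = []byte("ffffffff81000000 T startup_64")

func BenchmarkFieldsAlways(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_ = bytes.Fields(coreLine) // allocates even though the line is discarded later
	}
}

func BenchmarkFilterFirst(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if !bytes.ContainsRune(coreLine, '[') {
			continue // no allocation for non-module lines
		}
		_ = bytes.Fields(coreLine)
	}
}
```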

Linked issue: [Research] ig memory usage investigation (#3541)