
Commit 8328bab

fix: ensure connmgr is smaller than autoscaled resource limits
Fixes #9545
1 parent b84cd11 commit 8328bab

File tree

4 files changed: +71 −14 lines


config/init.go

Lines changed: 4 additions & 0 deletions
@@ -110,6 +110,10 @@ const DefaultConnMgrGracePeriod = time.Second * 20
 // type.
 const DefaultConnMgrType = "basic"
 
+// DefaultResourceMgrMinInboundConns is a MAGIC number that is probably a good
+// enough number of inbound conns to be a good network citizen.
+const DefaultResourceMgrMinInboundConns = 800
+
 func addressesConfig() Addresses {
 	return Addresses{
 		Swarm: []string{

core/node/libp2p/rcmgr_defaults.go

Lines changed: 19 additions & 0 deletions
@@ -186,5 +186,24 @@ Run 'ipfs swarm limit all' to see the resulting limits.
 
 	defaultLimitConfig := scalingLimitConfig.Scale(int64(maxMemory), int(numFD))
 
+	// Simple checks to override autoscaling, ensuring the limits make sense versus the connmgr values.
+	// There are ways to break this, but it should catch most problems already.
+	// We might improve this in the future.
+	// See: https://github.com/ipfs/kubo/issues/9545
+	if cfg.ConnMgr.Type.WithDefault(config.DefaultConnMgrType) != "none" {
+		maxInboundConns := int64(defaultLimitConfig.System.ConnsInbound)
+		if connmgrHighWaterTimesTwo := cfg.ConnMgr.HighWater.WithDefault(config.DefaultConnMgrHighWater) * 2; maxInboundConns < connmgrHighWaterTimesTwo {
+			maxInboundConns = connmgrHighWaterTimesTwo
+		}
+
+		if maxInboundConns < config.DefaultResourceMgrMinInboundConns {
+			maxInboundConns = config.DefaultResourceMgrMinInboundConns
+		}
+
+		// Scale System.StreamsInbound as well, keeping the existing ratio of StreamsInbound to ConnsInbound.
+		defaultLimitConfig.System.StreamsInbound = int(maxInboundConns * int64(defaultLimitConfig.System.StreamsInbound) / int64(defaultLimitConfig.System.ConnsInbound))
+		defaultLimitConfig.System.ConnsInbound = int(maxInboundConns)
+	}
+
 	return defaultLimitConfig, nil
 }
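Stripped of Kubo's config plumbing, the clamp in this hunk reduces to a small pure function. The sketch below is a simplified stand-alone version; the function name, parameter names, and the inlined constants are illustrative stand-ins, not Kubo's actual API:

```go
package main

import "fmt"

// Illustrative stand-in for config.DefaultResourceMgrMinInboundConns.
const minInboundConns = 800

// clampInbound mirrors the check in the hunk above: raise the autoscaled
// System.ConnsInbound to at least max(2*HighWater, 800), then scale
// System.StreamsInbound by the same factor so the existing
// streams-per-connection ratio is preserved.
func clampInbound(connsInbound, streamsInbound, highWater int) (conns, streams int) {
	maxConns := connsInbound
	if hw2 := highWater * 2; maxConns < hw2 {
		maxConns = hw2
	}
	if maxConns < minInboundConns {
		maxConns = minInboundConns
	}
	// When the clamp is a no-op, this leaves streamsInbound unchanged.
	newStreams := maxConns * streamsInbound / connsInbound
	return maxConns, newStreams
}

func main() {
	// Hypothetical autoscaled limits on a small machine.
	conns, streams := clampInbound(123, 1968, 1000)
	fmt.Println(conns, streams) // 2000 32000
}
```

With autoscaled values of 123 conns and 1968 streams (a 16:1 ratio) and `HighWater` of 1000, the conns limit is raised to 2000 and the streams limit scales with it to 32000, keeping the ratio intact.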

docs/libp2p-resource-management.md

Lines changed: 21 additions & 13 deletions
@@ -40,19 +40,19 @@ libp2p's resource manager provides tremendous flexibility but also adds complexi
 1. "The user who does nothing" - In this case Kubo attempts to give some sane defaults discussed below
    based on the amount of memory and file descriptors their system has.
    This should protect the node from many attacks.
-
+
 1. "Slightly more advanced user" - They can tweak the default limits discussed below.
    Where the defaults aren't good enough, a good set of higher-level "knobs" are exposed to satisfy most use cases
    without requiring users to wade into all the intricacies of libp2p's resource manager.
-   The "knobs"/inputs are `Swarm.ResourceMgr.MaxMemory` and `Swarm.ResourceMgr.MaxFileDescriptors` as described below.
+   The "knobs"/inputs are `Swarm.ResourceMgr.MaxMemory` and `Swarm.ResourceMgr.MaxFileDescriptors` as described below.
 
 1. "Power user" - They specify overrides to computed default limits via `ipfs swarm limit` and `Swarm.ResourceMgr.Limits`;
 
 ### Computed Default Limits
 With the `Swarm.ResourceMgr.MaxMemory` and `Swarm.ResourceMgr.MaxFileDescriptors` inputs defined,
-[resource manager limits](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#limits) are created at the
-[system](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#the-system-scope),
-[transient](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#the-transient-scope),
+[resource manager limits](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#limits) are created at the
+[system](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#the-system-scope),
+[transient](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#the-transient-scope),
 and [peer](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#peer-scopes) scopes.
 Other scopes are ignored (by being set to "[~infinity](#infinite-limits])".

@@ -68,11 +68,15 @@ The reason these scopes are chosen is because:
 (e.g., bug in a peer which is causing it to "misbehave").
 In the unintentional case, we want to make sure a "misbehaving" node doesn't consume more resources than necessary.
 
-Within these scopes, limits are just set on
-[memory](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#memory),
+Within these scopes, limits are just set on
+[memory](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#memory),
 [file descriptors (FD)](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#file-descriptors), [*inbound* connections](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#connections),
 and [*inbound* streams](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#streams).
 Limits are set based on the `Swarm.ResourceMgr.MaxMemory` and `Swarm.ResourceMgr.MaxFileDescriptors` inputs above.
+
+There are also some special cases where minimum values are enforced.
+For example, Kubo maintainers have found in practice that it's a footgun to have too low of a value for `Swarm.ResourceMgr.Limits.System.ConnsInbound` and a default minimum is used. (See [core/node/libp2p/rcmgr_defaults.go](https://github.com/ipfs/kubo/blob/master/core/node/libp2p/rcmgr_defaults.go) for specifics.)
+
 We trust this node to behave properly and thus don't limit *outbound* connection/stream limits.
 We apply any limits that libp2p has for its protocols/services
 since we assume libp2p knows best here.
@@ -139,13 +143,17 @@ There is a go-libp2p issue ([#1928](https://github.com/libp2p/go-libp2p/issues/1
 ### How does the resource manager (ResourceMgr) relate to the connection manager (ConnMgr)?
 As discussed [here](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#connmanager-vs-resource-manager)
 these are separate systems in go-libp2p.
-Kubo also configures the ConnMgr separately from ResourceMgr. There is no checking to make sure the limits between the systems are congruent.
+Kubo performs sanity checks to ensure that some of the hard limits of the ResourceMgr are sufficiently greater than the soft limits of the ConnMgr.
 
-Ideally `Swarm.ConnMgr.HighWater` is less than `Swarm.ResourceMgr.Limits.System.ConnsInbound`.
-This is so the ConnMgr can kick in and cleanup connections based on connection priorities before the hard limits of the ResourceMgr are applied.
+The soft limit of `Swarm.ConnMgr.HighWater` needs to be less than the hard limit `Swarm.ResourceMgr.Limits.System.ConnsInbound` for the configuration to make sense.
+This ensures the ConnMgr cleans up connections based on connection priorities before the hard limits of the ResourceMgr are applied.
 If `Swarm.ConnMgr.HighWater` is greater than `Swarm.ResourceMgr.Limits.System.ConnsInbound`,
 existing low priority idle connections can prevent new high priority connections from being established.
-The ResourceMgr doesn't know that the new connection is high priority and simply blocks it because of the limit its enforcing.
+The ResourceMgr doesn't know that the new connection is high priority and simply blocks it because of the limit it's enforcing.
+
+To ensure the ConnMgr and ResourceMgr are congruent, the ResourceMgr [computed default limits](#computed-default-limits) are adjusted such that:
+1. `Swarm.ResourceMgr.Limits.System.ConnsInbound` >= `max(Swarm.ConnMgr.HighWater * 2, 800)` AND
+2. `Swarm.ResourceMgr.Limits.System.StreamsInbound` is greater than any new/adjusted `Swarm.ResourceMgr.Limits.System.ConnsInbound` value so that there are enough streams per connection.
 
 ### How does one see the Active Limits?
 A dump of what limits are actually being used by the resource manager ([Computed Default Limits](#computed-default-limits) + [User Supplied Override Limits](#user-supplied-override-limits))
@@ -156,9 +164,9 @@ This can be observed with an empty [`Swarm.ResourceMgr.Limits`](https://github.c
 and then [seeing the active limits](#how-does-one-see-the-active-limits).
 
 ### How does one monitor libp2p resource usage?
-For [monitoring libp2p resource usage](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#monitoring),
+For [monitoring libp2p resource usage](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#monitoring),
 various `*rcmgr_*` metrics can be accessed as the prometheus endpoint at `{Addresses.API}/debug/metrics/prometheus` (default: `http://127.0.0.1:5001/debug/metrics/prometheus`).
-There are also [pre-built Grafana dashboards](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager/obs/grafana-dashboards) that can be added to a Grafana instance.
+There are also [pre-built Grafana dashboards](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager/obs/grafana-dashboards) that can be added to a Grafana instance.
 
 A textual view of current resource usage and a list of services, protocols, and peers can be
 obtained via `ipfs swarm stats --help`
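The first adjustment rule also explains the thresholds used in the sharness test included in this commit: with `HighWater` raised to 1000 the floor becomes 2000, while the default `HighWater` of 96 falls back to the 800 minimum. A minimal sketch of that rule (the function name is illustrative; Kubo inlines this logic in `rcmgr_defaults.go`):

```go
package main

import "fmt"

// minConnsInbound applies the documented adjustment rule:
// System.ConnsInbound >= max(Swarm.ConnMgr.HighWater * 2, 800).
func minConnsInbound(highWater int) int {
	floor := highWater * 2
	if floor < 800 {
		floor = 800
	}
	return floor
}

func main() {
	fmt.Println(minConnsInbound(1000)) // 2000: first sharness check
	fmt.Println(minConnsInbound(96))   // 800: default HighWater hits the floor
}
```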

test/sharness/t0139-swarm-rcmgr.sh

Lines changed: 27 additions & 1 deletion
@@ -40,9 +40,35 @@ test_expect_success 'disconnected: swarm stats requires running daemon' '
   test_should_contain "missing ResourceMgr" actual
 '
 
+# test sanity scaling
+test_expect_success 'set very high connmgr highwater' '
+  ipfs config --json Swarm.ConnMgr.HighWater 1000
+'
+
+test_launch_ipfs_daemon
+
+test_expect_success 'conns and streams are above 2000' '
+  ipfs swarm limit system --enc=json | tee json &&
+  [ "$(jq -r .ConnsInbound < json)" -ge 2000 ] &&
+  [ "$(jq -r .StreamsInbound < json)" -ge 2000 ]
+'
+
+test_kill_ipfs_daemon
+
+test_expect_success 'set previous connmgr highwater' '
+  ipfs config --json Swarm.ConnMgr.HighWater 96
+'
+
+test_launch_ipfs_daemon
+
+test_expect_success 'conns and streams are above 800' '
+  ipfs swarm limit system --enc=json | tee json &&
+  [ "$(jq -r .ConnsInbound < json)" -ge 800 ] &&
+  [ "$(jq -r .StreamsInbound < json)" -ge 800 ]
+'
+
 # swarm limit|stats should succeed in online mode by default
 # because Resource Manager is opt-out
-test_launch_ipfs_daemon
 
 # every scope has the same fields, so we only inspect System
 test_expect_success 'ResourceMgr enabled: swarm limit' '
