Commit e635711

TAM config model for inband-telemetry and drop-monitoring
1 parent 29db0ee commit e635711

File tree

3 files changed

+254
-0
lines changed

doc/tam/tam_config_flow_dm.png

97.1 KB

doc/tam/tam_config_flow_int.png

51.4 KB

doc/tam/tam_hld.md

Lines changed: 254 additions & 0 deletions
@@ -0,0 +1,254 @@
# SONiC Telemetry and Monitoring (TAM) High Level Design

### Rev 0.1

## Table of Contents

## 1. Revision
Rev | Date | Author | Change Description
----|------|--------|-------------------
v0.1|2025-10-09|Senthil Krishnamurthy|Initial version of TAM HLD

## 2. Scope
This document describes the high-level design of Telemetry and Monitoring (TAM) in SONiC.

## 3. Definitions/Abbreviations
Definitions/Abbreviation|Description
------------------------|-----------
TAM| Telemetry and Monitoring
SAI| Switch Abstraction Interface
IPFIX| IP Flow Information Export
VRF| Virtual Routing and Forwarding
INT| In-band Network Telemetry
IFA| In-band Flow Analyzer

## 4. Overview
TAM provides a framework to observe and report network events and telemetry by exporting messages to a collector. Telemetry provides information about network traffic, such as congestion, latency, and port buffer utilization, on a per-device basis. Monitoring provides insights into packet drops and flow statistics.

### 4.1 Telemetry
Inband Telemetry (INT) is a variant of telemetry that provides insights into the path taken by packets through the network, including per-hop latency, queue depth, and other metrics. Inband Flow Analyzer (IFA) is a mechanism INT uses to embed telemetry data directly into the packet. In an IFA-aware network, each hop performs one of three roles:
1. **Initiator Node**: This node initiates the IFA packet and can operate in one of two modes:
   a. **Inband Mode**: The IFA header and the first IFA metadata, carrying telemetry data, are added to the data packet.
   b. **Probe Mode**: Incoming data packets are sampled, and the first IFA metadata is added to the sampled packet.
2. **Transit Node**: Matches IFA packets and appends new IFA metadata to the packet.
3. **Terminator Node**: For inband IFA packets, the IFA header and metadata are stripped, the data packet is forwarded to the next hop, and the metadata is exported to a collector. Probe IFA packets are terminated and mirrored to a collector.
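
The three roles can be illustrated with a minimal Python sketch of inband mode. The `Packet` class, the function names, and the per-hop metadata fields are hypothetical stand-ins chosen for illustration; they do not reflect the IFA wire format or any SONiC API.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    """Toy packet model (hypothetical, not the IFA wire format)."""
    payload: bytes
    has_ifa_header: bool = False
    ifa_metadata: list = field(default_factory=list)


def initiator(pkt: Packet, hop_md: dict) -> Packet:
    """Inband mode: add the IFA header and the first metadata entry."""
    pkt.has_ifa_header = True
    pkt.ifa_metadata.append(hop_md)
    return pkt


def transit(pkt: Packet, hop_md: dict) -> Packet:
    """Match IFA packets only and append this hop's metadata."""
    if pkt.has_ifa_header:
        pkt.ifa_metadata.append(hop_md)
    return pkt


def terminator(pkt: Packet):
    """Strip header and metadata; return (data packet, metadata to export)."""
    exported = list(pkt.ifa_metadata)
    return Packet(payload=pkt.payload), exported


# A packet crossing initiator -> transit -> terminator.
pkt = initiator(Packet(b"data"), {"device_id": 1, "queue_depth": 10})
pkt = transit(pkt, {"device_id": 2, "queue_depth": 3})
forwarded, exported = terminator(pkt)
```

After the terminator runs, `forwarded` carries only the original payload, while `exported` holds one metadata entry per hop in path order, which is what a collector would receive.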

### 4.2 Monitoring
TAM supports the Drop-Monitor (DM) feature, which provides visibility into packet drops in the dataplane. Dropped packets are trapped to a collector along with metadata that includes the drop reason and the packet payload, up to a maximum length. DM can operate in two modes:
1. **Stateless DM**: Drop reasons are exported on a per-packet, per-drop basis.
2. **Stateful DM**: Drop reasons are exported on a per-flow basis, at periodic intervals.

## 5. Requirements
TAM will be implemented in multiple phases.

### 5.1 Phase 1
- Support Initiator and Transit roles for INT/IFA
- Support IFA version 2.0 for the IFA header and metadata inserted in the packet
- Support drop-monitor (DM) telemetry type with IPFIX reporting
- Support stateless drop-monitoring
- Collector endpoint can be either IPv4 or IPv6
- Collectors can be reachable via default or mgmt VRF (mgmt VRF must be enabled to use it)
- Flow groups can be defined to select packets for IFA and DM
- TAM Session can be bound to one or more flow-groups
- TAM Session can be bound to one or more collectors

### 5.2 Phase 2
- Support stateful drop-monitoring
- Support Terminator role for INT/IFA
- Support packet sampling

### 5.3 Phase 3
- Support additional telemetry types (e.g., flow-statistics)
- Support additional report types (e.g., gRPC)

## 6. Module Design
### 6.1 Overall design
- Management framework writes TAM tables to CONFIG_DB using the YANG model
- TamOrch (new) in orchagent subscribes to CONFIG_DB TAM_* tables and programs SAI TAM objects
- Syncd/SAI implements the TAM object model and programs the dataplane
- Telemetry data is exported to collectors from front-panel ports without punting to the CPU

### 6.2 Configuration and control flow
The SWSS container is enhanced to add a new component, TamOrch, to process TAM configuration and control.

#### 6.2.1 Inband Telemetry
The following figure shows the configuration and control flows for TAM INT using IFAv2:
![tam_config_flow_int](tam_config_flow_int.png)

1) Administrator configures TAM device attributes (CONFIG_DB), with IFA enabled
2) tamorch uses aclorch to create an ACL table and an ACL rule to match on IP_PROTO 253 (IFA)
3) tamorch creates SAI TAM_REPORT and TAM_INT objects
4) The ACL table is bound to the switch

#### 6.2.2 Drop Monitoring
The following figure shows the configuration and control flows for TAM Drop Monitoring:
![tam_config_flow_dm](tam_config_flow_dm.png)

1) Administrator configures TAM collectors, flow-groups, and sessions (CONFIG_DB)
2) If the collector config uses a VRF, tamorch looks up the VRF to resolve the nexthop for the collector destination IP
3) tamorch creates SAI TAM_COLLECTOR and TAM_TRANSPORT objects using the resolved nexthop
4) tamorch uses aclorch to create an ACL table and an ACL rule with match conditions from the flow-group rules
5) The ACL table is bound to the list of front-panel ports specified in the flow-group
6) tamorch processes the TAM_SESSION_TABLE to create SAI TAM objects (report, event_action, event, tam)
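
The ordering of steps 2 through 6 can be sketched as a small Python function that turns a CONFIG_DB-like dictionary into an ordered list of actions. This is only an illustration of the sequence described above: `plan_drop_monitor`, the action strings, and the fallback to the `default` VRF are assumptions, and none of this is actual tamorch code.

```python
def plan_drop_monitor(config: dict) -> list:
    """Return the ordered actions implied by steps 2-6 (illustrative only)."""
    actions = []
    # Steps 2-3: resolve each collector's nexthop, then create its objects.
    for name, col in config.get("TAM_COLLECTOR", {}).items():
        vrf = col.get("vrf", "default")  # assumption: no VRF means default
        actions.append(f"resolve nexthop for {col['dst_ip']} in VRF {vrf}")
        actions.append(f"create TAM_COLLECTOR and TAM_TRANSPORT for {name}")
    # Steps 4-5: one ACL table per flow-group, bound to the group's ports.
    flow_groups = config.get("TAM_FLOW_GROUP", {})
    for key, group in flow_groups.items():
        if "|" in key:
            continue  # "group|rule" entries are counted with their parent
        rules = [k for k in flow_groups if k.startswith(key + "|")]
        actions.append(f"create ACL table for {key} with {len(rules)} rule(s)")
        actions.append(f"bind ACL table to {', '.join(group['ports'])}")
    # Step 6: session objects come last because they reference the others.
    for name in config.get("TAM_SESSION", {}):
        actions.append(f"create report, event_action, event, tam for {name}")
    return actions


example = {
    "TAM_COLLECTOR": {"c1": {"dst_ip": "192.0.2.10", "vrf": "vrf_blue"}},
    "TAM_FLOW_GROUP": {
        "fg-1": {"ports": ["Ethernet0", "PortChannel10"]},
        "fg-1|rule1": {"dst_ip_prefix": "10.0.0.0/8"},
    },
    "TAM_SESSION": {"s-drop": {"flow_group": "fg-1", "collector": ["c1"]}},
}
actions = plan_drop_monitor(example)
```

Collectors and flow-groups are handled before sessions in this sketch because a session refers to both by name.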

### 6.3 SWSS and syncd changes
- New tamorch: consumes CONFIG_DB TAM tables, maps to SAI TAM objects, maintains reference counts and object lifecycles
- syncd/SAI: no changes

## 8. Configuration and Management
### 8.1 CONFIG_DB
Configure IFA by creating a TAM table:
```
"TAM": {
    "device": {
        "device-id": 1234,      // 28bits
        "enterprise-id": 1234,
        "ifa": true             // boolean
    }
}
```

Configure drop-monitor by creating flow-groups, collectors, and sessions:
```
"TAM_FLOW_GROUP": {
    "fg-1": {
        "aging_interval": 60,
        "ports": ["Ethernet0", "PortChannel10"]
    },
    "fg-1|rule1": {
        "src_ip_prefix": "0.0.0.0/0",
        "dst_ip_prefix": "10.0.0.0/8",
        "ip_protocol": 6,
        "l4_dst_port": 443
    }
},
"TAM_COLLECTOR": {
    "c1": {
        "dst_ip": "192.0.2.10",
        "dst_port": 4739,
        "dscp_value": 32,
        "vrf": "vrf_blue"
    }
},
"TAM_SESSION": {
    "s-drop": {
        "type": "drop-monitor",
        "report_type": "ipfix",
        "flow_group": "fg-1",
        "collector": ["c1"]
    }
}
```
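
Because a session names its flow-group and collectors, a loader could check those references before programming anything. The following Python sketch shows one way to do that over a dictionary shaped like the example above; `check_session_refs` and its error strings are hypothetical and not part of SONiC.

```python
def check_session_refs(config: dict) -> list:
    """Return dangling-reference errors for TAM_SESSION entries."""
    flow_groups = config.get("TAM_FLOW_GROUP", {})
    collectors = config.get("TAM_COLLECTOR", {})
    errors = []
    for name, session in config.get("TAM_SESSION", {}).items():
        group = session.get("flow_group")
        if group is not None and group not in flow_groups:
            errors.append(f"{name}: unknown flow_group {group!r}")
        for collector in session.get("collector", []):
            if collector not in collectors:
                errors.append(f"{name}: unknown collector {collector!r}")
    return errors


config = {
    "TAM_FLOW_GROUP": {"fg-1": {"ports": ["Ethernet0"]}},
    "TAM_COLLECTOR": {"c1": {"dst_ip": "192.0.2.10", "dst_port": 4739}},
    "TAM_SESSION": {
        "s-drop": {"flow_group": "fg-1", "collector": ["c1"]},
        "s-bad": {"flow_group": "fg-2", "collector": ["c9"]},
    },
}
errors = check_session_refs(config)
```

Here `s-drop` resolves cleanly, while `s-bad` yields one error for the missing flow-group and one for the missing collector.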

Configure sFlow TAM Session:
```
"TAM_SESSION": {
    "s-sflow": {
        "type": "sflow",
        "report_type": "ipfix",
        "collector": ["c1"]
    }
}
```

### 8.2 DB and Schema changes

```
; Defines schema for TAM device configuration attributes
key           = TAM:device        ; TAM device-level configuration
; field       = value
DEVICE_ID     = 1*9DIGIT          ; 1..134217727
ENTERPRISE_ID = 1*9DIGIT          ; 1..134217727
IFA           = "true" / "false"  ; Enable IFA device-type hint
```

```
; Defines schema for TAM flow-group configuration attributes
key             = TAM_FLOW_GROUP:flow_group_name
; field         = value
AGING_INTERVAL  = 1*10DIGIT    ; interval in seconds
PORTS           = ifname-list  ; comma-separated list of interfaces

; value annotations
flow_group_name = 1*255VCHAR
ifname-list     = ifname *( "," ifname )
ifname          = 1*64VCHAR
```

```
; Defines schema for TAM flow-group match rules
key           = TAM_FLOW_GROUP:flow_group_name|rule_name
; field       = value
SRC_IP_PREFIX = ip_prefix  ; mandatory
DST_IP_PREFIX = ip_prefix  ; mandatory
L4_SRC_PORT   = port_num   ; optional
L4_DST_PORT   = port_num   ; optional
IP_PROTOCOL   = 1*3DIGIT   ; 1..255 (optional)

; value annotations
rule_name     = 1*64VCHAR
ip_prefix     = IPv4prefix / IPv6prefix
IPv4prefix    = IPv4address "/" 1*2DIGIT  ; 0..32
IPv6prefix    = IPv6address "/" 1*3DIGIT  ; 0..128
port_num      = 1*5DIGIT                  ; 1..65535
```
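
The rule constraints above can be checked with Python's standard `ipaddress` module. This is a minimal sketch, assuming rule fields arrive as a dictionary with lowercase keys and integer ports, as in the CONFIG_DB example in section 8.1; `validate_rule` is a hypothetical helper, not an existing SONiC function.

```python
import ipaddress


def validate_rule(rule: dict) -> list:
    """Return a list of schema violations for one flow-group rule."""
    errors = []
    for key in ("src_ip_prefix", "dst_ip_prefix"):
        value = rule.get(key)
        if value is None:
            errors.append(f"{key} is mandatory")
            continue
        try:
            # The schema requires an explicit "/len" part in the prefix.
            if not isinstance(value, str) or "/" not in value:
                raise ValueError("missing prefix length")
            ipaddress.ip_network(value, strict=False)
        except ValueError:
            errors.append(f"{key} is not a valid prefix: {value!r}")
    for key in ("l4_src_port", "l4_dst_port"):
        port = rule.get(key)
        if port is not None and not (isinstance(port, int) and 1 <= port <= 65535):
            errors.append(f"{key} must be in 1..65535")
    proto = rule.get("ip_protocol")
    if proto is not None and not (isinstance(proto, int) and 1 <= proto <= 255):
        errors.append("ip_protocol must be in 1..255")
    return errors


good = validate_rule({
    "src_ip_prefix": "0.0.0.0/0",
    "dst_ip_prefix": "10.0.0.0/8",
    "ip_protocol": 6,
    "l4_dst_port": 443,
})
bad = validate_rule({"src_ip_prefix": "10.0.0.0/33", "l4_dst_port": 70000})
```

`ip_network` is called with `strict=False` so that a prefix with host bits set is accepted; whether the real validation should be strict is left open here.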

```
; Defines schema for TAM collector configuration attributes
key            = TAM_COLLECTOR:collector_name
; field        = value
SRC_IP         = IPv4address / IPv6address    ; optional
DST_IP         = IPv4address / IPv6address    ; mandatory
DST_PORT       = port_num                     ; mandatory
DSCP_VALUE     = dscp                         ; DSCP 0..63 (default 32)
VRF            = "default" / "mgmt" / vrf_name ; VRF for collector reachability

; value annotations
collector_name = 1*255VCHAR
vrf_name       = 1*255VCHAR
dscp           = 1*2DIGIT                     ; 0..63
```
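
A collector entry can be validated and its defaults filled in along the same lines. The sketch below applies the documented DSCP default of 32; falling back to the `default` VRF when none is given is an assumption, since the schema does not state it, and `normalize_collector` is a hypothetical name.

```python
import ipaddress


def normalize_collector(entry: dict) -> dict:
    """Validate a TAM_COLLECTOR entry and fill in defaults.

    Raises ValueError on missing or out-of-range fields.
    """
    if "dst_ip" not in entry or "dst_port" not in entry:
        raise ValueError("dst_ip and dst_port are mandatory")
    dst_ip = ipaddress.ip_address(entry["dst_ip"])  # ValueError if malformed
    dst_port = int(entry["dst_port"])
    if not 1 <= dst_port <= 65535:
        raise ValueError("dst_port must be in 1..65535")
    dscp = int(entry.get("dscp_value", 32))  # documented default
    if not 0 <= dscp <= 63:
        raise ValueError("dscp_value must be in 0..63")
    result = {
        "dst_ip": str(dst_ip),
        "ip_version": dst_ip.version,  # 4 or 6
        "dst_port": dst_port,
        "dscp_value": dscp,
        "vrf": entry.get("vrf", "default"),  # assumption, see lead-in
    }
    if "src_ip" in entry:  # optional field
        result["src_ip"] = str(ipaddress.ip_address(entry["src_ip"]))
    return result


collector = normalize_collector({"dst_ip": "192.0.2.10", "dst_port": 4739})
```

With only the two mandatory fields supplied, `collector` comes back with `dscp_value` set to 32 and `vrf` set to `default`.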

```
; Defines schema for TAM session configuration attributes
key            = TAM_SESSION:session_name
; field        = value
TYPE           = "drop-monitor"
REPORT_TYPE    = "ipfix"
FLOW_GROUP     = flow_group_name
COLLECTOR      = collector_list  ; one or more collectors

; value annotations
session_name   = 1*255VCHAR
collector_list = collector_name *( "," collector_name )
```

> Note: Refer to swss-schema.md for standard value annotations such as IPv4address/IPv6address and ifname, and for general BNF conventions used across SONiC documents.


## 9. Warmboot and Fastboot Impact
- No additional sleeps in the boot-critical path. TAM object creation occurs after dependencies are up. The service can be delayed until SYSTEM_READY is Up. When the feature is disabled or unused, there is no impact.

## 10. Memory Consumption
- Minimal control-plane state in orchagent (object maps). No growth when the feature is disabled.

## 11. Restrictions/Limitations
- Requires platform/SAI support for TAM drop monitoring and IPFIX export; otherwise the feature remains inoperative (capability=false)
- Exact limits (number of flow-groups/rules/collectors) depend on platform
- mgmt VRF usage requires MGMT_VRF enabled

## 12. Testing Requirements
### 12.1 Unit tests (one-liners)
1) Validate CONFIG_DB for each table and field
2) Validate reference checks (ports, VRF, flow_group, collector)
3) Validate tamorch creates/updates/deletes SAI TAM objects per CONFIG_DB changes
4) Capability gating: with capability=false, CONFIG_DB writes do not program SAI

### 12.2 System tests
1) Configure flow-group, rule, collector, session; verify IPFIX is exported to collector
2) Verify VRF selection (default vs mgmt) and DSCP marking
3) Verify rule match scoping and port/PortChannel membership
4) Reboot/warm-reboot and verify export resumes with preserved configuration

## 13. Open/Action items
- Finalize CLI commands and Command-Reference.md updates aligned to YANG
- Document per-platform capability and limits
