forked from Seagate/cortx-motr
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathaddb2-primer
243 lines (181 loc) · 11.4 KB
/
addb2-primer
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
Addb (Analytic and Diagnostic Data-Base) is a Motr sub-system that collects
information about system behaviour ("telemetry information").
Motr code is instrumented to produce addb records describing events (for
example, memory allocations, error conditions, incoming fops, fom phase and
state transition), operational parameters (queue depths, operation latencies,
etc.) and statistical aggregates.
Addb records produced on storage-less clients are transmitted over network to
servers. Servers store records, both received from clients and produced locally,
in dedicated storage stobs.
Example
.......
Run Motr m0t1fs tests:
$ sudo ./m0t1fs/linux_kernel/st/m0t1fs_test.sh
This locally starts 3 md services and 4 ioservices in directories
/var/motr/systest-$PID/{ios,mds}$N. Each server has its own addb2 stob:
$ ls -l /var/motr/systest-*/????/addb-stobs/o/*2
-rwx------ 1 root root 48955392 Apr 14 15:46 /var/motr/systest-27554/ios1/addb-stobs/o/0000000000000002
-rwx------ 1 root root 41746432 Apr 14 15:46 /var/motr/systest-27554/ios2/addb-stobs/o/0000000000000002
-rwx------ 1 root root 42991616 Apr 14 15:46 /var/motr/systest-27554/ios3/addb-stobs/o/0000000000000002
-rwx------ 1 root root 52101120 Apr 14 15:46 /var/motr/systest-27554/ios4/addb-stobs/o/0000000000000002
-rwx------ 1 root root 49676288 Apr 14 15:46 /var/motr/systest-27554/mds1/addb-stobs/o/0000000000000002
-rwx------ 1 root root 22544384 Apr 14 15:46 /var/motr/systest-27554/mds2/addb-stobs/o/0000000000000002
-rwx------ 1 root root 22937600 Apr 14 15:46 /var/motr/systest-27554/mds3/addb-stobs/o/0000000000000002
(there are also addb-stobs/o/*1 files, created by the older version of addb).
Addb2 stobs can be examined by m0addb2dump utility:
$ # Note that full path to the stob must be used.
$ # Multiple stobs can be specified on the command line.
$ ./utils/m0run m0addb2dump -- /var/motr/systest-27554/????/addb-stobs/o/*2 |less
The output consists of records, formatted as following:
* 2015-10-24-04:21:44.073956527 clock
| node <11186d8bf0e34117:ab1897c062a22573>
| locality 0
A line starting with '*' is a "measurement", it describes an event or
parameter. Following lines, starting with '|', are measurement's context. In
this case, the measurement is a time-stamp, and its context specifies the node
and the locality where the measurement was taken (i.e., where time stamp was
generated).
A few more examples:
* 2015-10-24-04:21:44.073956527 m0t1fs-create <4700000000000000:10007>, mode: 100644, rc: 0
| node <11186d8bf0e34117:ab1897c062a22573>
| thread ffff88007407ca80, 7081
This is a client-originated record. The measurement records file creation
operation in m0t1fs.
* 2015-04-20-14:36:13.687531192 alloc size: 40, addr: @0x7fd27c53eb20
| node <f3b62b87d9e642b2:96a4e0520cc5477b>
| locality 1
| thread 7fd28f5fe700
| fom @0x7fd1f804f710, 'IO fom' transitions: 13 phase: Zero-copy finish
| stob-io-launch 2015-04-20-14:36:13.629431319, <200000000000003:10000>, count: 8, bvec-nr: 8, ivec-nr: 1, offset: 0
| stob-io-launch 2015-04-20-14:36:13.666152841, <100000000adf11e:3>, count: 8, bvec-nr: 8, ivec-nr: 8, offset: 65536
Record of a memory allocation, which happened when a fom in "Zero-copy finish"
phase launched AD-stob IO, which incurred Linux-stob IO.
* 2015-10-24-04:21:44.073956527 fom-phase Zero-copy finish -[Wait for Zero-copy finished buffer release]-> Network buffer release
| node <11186d8bf0e34117:ab1897c062a22573>
| locality 3
| thread 7f79e57fb700
| fom @0x7f795008a680, 'IO fom', transitions: 15, phase: 31
This record describes fom state transition. Note the "fom" label in the context:
it describes the fom which undergoes the phase transition.
* 2015-04-14-15:33:11.998165453 fom-descr service: <7300000000000001:0>, sender: c28baccf27e0001, req-opcode: Read request, rep-opcode: Read reply, local: false
| node <11186d8bf0e34117:ab1897c062a22573>
| locality 3
| thread 7f79e57fb700
| ast
| fom @0x7f795008ed20, 'IO fom', transitions: 0, phase: 0
* 2015-04-14-15:33:11.998124387 ios-io-descr file: <4700000000000000:10006>, cob: <1:10006>, read-v: 0, write-v: 0, seg-nr: 1, count: 8192, offset: 36864, descr-nr: 1, colour: 0
| node <11186d8bf0e34117:ab1897c062a22573>
| locality 3
| thread 7f79e57fb700
| ast
| fom @0x7f795008ed20, 'IO fom', transitions: 0, phase: 0
Such records (always going one after another) describe new fom. First record is
a generic fom description, second record is specific to the fom type (cob read
request in our case).
There are two types of measurements:
* measurements corresponding to something that happens in a particular
moment in particular place of code are technically called
"data-points". All records above, except for the first one, are examples
of data-points;
* other measurements correspond to parameters that change "continuously",
that is, too fast for each change to be sampled individually. Such
measurements are called "sensors". Addb2 periodically samples sensors and
records their measurements. Sensor records always include the time-stamp
of the moment when the sensor was sampled.
"Clock" is a sensor. Other examples of sensors are:
* 2015-04-15-13:04:19.614626769 fom-active nr: 24 min: 1 max: 2 avg: 1.500000 dev: 0.250000
| node <b0cf922c851b42c4:89028b3a5eeded41>
| locality 0
* 2015-04-15-13:04:19.614627234 runq nr: 242 min: 0 max: 2 avg: 0.657025 dev: 0.382368
| node <b0cf922c851b42c4:89028b3a5eeded41>
| locality 0
* 2015-04-15-13:04:19.614627526 wail nr: 218 min: 0 max: 2 avg: 0.940367 dev: 0.496444
| node <b0cf922c851b42c4:89028b3a5eeded41>
| locality 0
Which are, respectively, total number of foms in the locality given by the
context, number of foms in locality run-queue and locality wait-list.
Important sensors are fom phase transition counters:
* 2015-04-15-13:04:18.607430792 Read request/IO fom: fom_tx_wait -[Transaction opened]-> IO Prepare nr: 5 min: 131 max: 898 avg: 379.600000 dev: 88701.040000 datum: 0
| node <b0cf922c851b42c4:89028b3a5eeded41>
| locality 1
This counter measures time spent in a particular phase transition for foms of
particular type in a particular locality. Note that:
* this is statistical aggregate. Unless "nr" is 1, data for multiple foms
are accumulated in the measurement. In the example above, 5 foms executed
given state transition in the locality 1;
* the data are for foms of particular type. Not all fom types have such
counters configured for them;
* the data are for specific locality, indicated in the context;
* the counter is reset to 0 each time it is sampled. That is, the counter
shows the time spent in the given phase transition since last time this
counter was recorded.
There is a similar counter for fom *state* transitions:
* 2015-04-15-13:03:13.656801304 /FOM states: Ready -[Run]-> Running nr: 124 min: 3 max: 297 avg: 11.580645 dev: 698.808012 datum: 0
| node <b0cf922c851b42c4:89028b3a5eeded41>
| locality 0
This counter is shared by foms of all types. There are also separate fom state transitions for some fop types:
* 2015-04-20-14:36:13.572621581 Write request/FOM states: Ready -[Run]-> Running nr: 6 min: 3 max: 6 avg: 5.166667 dev: 1.138889 datum: 0
| node <f3b62b87d9e642b2:96a4e0520cc5477b>
| locality 1
See "Fop instrumenting" below.
Flattening addb2dump stream
...........................
Default m0addb2dump output is suitable for human reader, but not very convenient
for ad-hoc queries and parsing by line-oriented tools (grep, sed, awk), because
context and measurement are on separate lines.
addb2dump has options '-f' (flattan output) and '-d' (de-flatten input) converts
between m0addb2dump output format and line-oriented format. Output is
"flattened" by by appending context labels to the measurement line. E.g., the
"clock" record above becomes
* 2015-04-14-15:32:02.395575233 clock | node <11186d8bf0e34117:ab1897c062a22573> | locality 0
(as one line).
'-d' converts its input back into default m0addb2dump format.
Example use:
m0addb2dump -f $INPUT | grep forq-duration | sort -k4,4 | m0addb2dump -d | less
This finds all addb2 records describing fork-queue AST processing, sorts them by
processing time and displays results.
A more sophisticated query would be:
$ m0addb2dump -f $INPUT | \
grep 'Write.*committed.*nr: [^0]' | awk '{print $30}' | sort | uniq -c
which finds all records about WRITE foms transition to the "transaction
committed" phase, extracts locality number ($27) from records and counts how
many times foms committed in each locality.
To find how much memory is allocated for a particular fom, given by its address:
$ m0addb2dump -f $PWD/ios1/addb-stobs/o/*2 |\
grep '^\* [-0-9:.]* alloc.*fom *@0x7fab90041270' |\
awk '{print $5; sum += $5} END {print "Total:", sum}'
8,
8,
48,
...
4416,
2680,
4096,
Total: 17665
BE stats
........
Meta-data back-end transtion state machines has addb2 instrumentation.
State transition counters, sorted by maximal transition times:
$ m0addb2dump $INPUT | grep 'tx/.*nr: [^0]' | sort -k12,12 -n
...
* tx/m0_be_tx::t_sm: active -[closed]-> closed 2015-04-25-05:32:18.506901233 nr: 12 min: 758628 max: 1229427 avg: 1081680.166667 dev: 18205574164.138916 datum: 0
* tx/m0_be_tx::t_sm: active -[closed]-> closed 2015-04-25-05:32:04.350766914 nr: 12 min: 461926 max: 1233291 avg: 915499.750000 dev: 51210171038.854126 datum: 0
* tx/m0_be_tx::t_sm: active -[closed]-> closed 2015-04-25-05:32:25.485491494 nr: 12 min: 703719 max: 1235524 avg: 1040529.416667 dev: 19977720390.909668 datum: 0
* tx/m0_be_tx::t_sm: active -[closed]-> closed 2015-04-25-05:32:11.324305723 nr: 12 min: 516788 max: 1254129 avg: 1058146.416667 dev: 37896670448.576172 datum: 0
* tx/m0_be_tx::t_sm: active -[closed]-> closed 2015-04-25-05:32:40.780703286 nr: 12 min: 364525 max: 1287840 avg: 932578.750000 dev: 65987624052.354126 datum: 0
Ignore open->close transition:
$ m0addb2dump $INPUT | grep 'tx/.*nr: [^0]' | \
grep -vF 'active -[closed]-> closed' | sort -k12,12 -n
...
* tx/m0_be_tx::t_sm: grouped -[logged]-> logged 2015-04-25-05:29:45.194460614 nr: 10 min: 3169 max: 546026 avg: 59068.100000 dev: 26350130603.489998 datum: 0
* tx/m0_be_tx::t_sm: grouped -[logged]-> logged 2015-04-25-05:31:51.382959619 nr: 12 min: 8 max: 577974 avg: 51924.500000 dev: 25161338869.583332 datum: 0
* tx/m0_be_tx::t_sm: logged -[placed]-> placed 2015-04-25-05:32:40.780703645 nr: 12 min: 5 max: 594235 avg: 49571.416667 dev: 26968951761.576389 datum: 0
* tx/m0_be_tx::t_sm: grouped -[logged]-> logged 2015-04-25-05:32:11.324306002 nr: 12 min: 2197 max: 596630 avg: 54254.333333 dev: 26744635298.555557 datum: 0
* tx/m0_be_tx::t_sm: grouped -[logged]-> logged 2015-04-25-05:33:59.814640695 nr: 10 min: 1909 max: 614836 avg: 172483.700000 dev: 66176101077.410004 datum: 0
Statistics on individual state transitions by locality:
m0addb2dump $INPUT -f | grep 'tx-state.*locality *[0-9]' |\
awk '{print $15}' | sort | uniq -c
18838 0
19715 1
20196 2
19162 3