Commit 73de328

Add numa.md design sketch (#6719)
Design sketch for exposing a VM's _NUMA_ properties through the API.
2 parents 7a3cf08 + c058215 commit 73de328

doc/content/design/numa.md

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
+++ title = "NUMA" +++

# NUMA

NUMA stands for Non-Uniform Memory Access: in a large system, RAM access
is not equally fast for all CPUs. CPUs are grouped into so-called nodes;
each node has fast access to RAM that is considered local to it and
slower access to other RAM. Conceptually, a node is a container that
bundles some CPUs and RAM, and there is an associated cost when accessing
RAM in a different node. In the context of CPU virtualisation, assigning
vCPUs to NUMA nodes is an optimisation strategy to reduce memory latency.
This document describes a design to make NUMA-related assignments for Xen
domains (that is, VMs) visible to the user. Below we refer to these
assignments and optimisations collectively as NUMA for simplicity.

NUMA is discussed more generally in
[NUMA Feature](../toolstack/features/NUMA/index.md).

## NUMA Properties

Xen 4.20 implements NUMA optimisation. We want to expose the following
NUMA-related properties of VMs to API clients, and in particular
XenCenter. Each one is represented by a new field in XAPI's `VM_metrics`
data model:

* RO `VM_metrics.numa_optimised`: boolean: if the VM is
  optimised for NUMA
* RO `VM_metrics.numa_nodes`: integer: number of NUMA nodes of the host
  the VM is using
* MRO `VM_metrics.numa_node_memory`: int -> int map; mapping a NUMA node
  (int) to an amount of memory (bytes) in that node.

Required NUMA support is only available in Xen 4.20. Some parts of the
code will have to be managed by patches.

## XAPI High-Level Implementation

As far as Xapi clients are concerned, we implement new fields in the
`VM_metrics` class of the data model and surface the values in the CLI
via `records.ml`; we could decide to make `numa_optimised` visible by
default in `xe vm-list`.

Introducing new fields requires defaults; these would be:

* `numa_optimised`: false
* `numa_nodes`: 0
* `numa_node_memory`: []

The data model ensures that the values are visible to API clients.

## XAPI Low-Level Implementation

NUMA properties are observed by Xenopsd, and Xapi learns about them as
part of the `Client.VM.stat` call implemented by Xenopsd. Xapi makes
these calls frequently, and we will update the Xapi VM fields related to
NUMA simply as part of processing the result of such a call in Xapi.

For this to work, we extend the return type of `VM.stat` in

* `xenops_types.ml`, type `Vm.state`

with three fields:

* `numa_optimised: bool`
* `numa_nodes: int`
* `numa_node_memory: (int * int64) list`

matching the semantics from above.

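To make the shape of these fields concrete, here is a minimal sketch of how they might look as an OCaml record. The type name `numa_state`, the value `default_numa_state` and the helper `memory_on_node` are hypothetical and exist only for illustration; in the design the fields are added to the existing `Vm.state` type rather than to a separate record.

```ocaml
(* Hypothetical stand-alone record mirroring the three proposed fields;
   the design adds these fields to the existing [Vm.state] type instead. *)
type numa_state = {
    numa_optimised: bool  (* true if the VM is optimised for NUMA *)
  ; numa_nodes: int  (* number of host NUMA nodes the VM is using *)
  ; numa_node_memory: (int * int64) list  (* node -> bytes in that node *)
}

(* Defaults as listed for the data model: false, 0 and the empty map. *)
let default_numa_state =
  {numa_optimised= false; numa_nodes= 0; numa_node_memory= []}

(* Illustrative helper: memory in bytes held on [node], 0L if the node
   does not appear in the map. *)
let memory_on_node state node =
  match List.assoc_opt node state.numa_node_memory with
  | Some bytes ->
      bytes
  | None ->
      0L
```

An association list keeps the sketch close to the `int -> int map` of the data model while remaining a plain OCaml value.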
## Xenopsd Implementation

Xenopsd implements the `VM.stat` return value in

* `Xenops_server_xen.get_state`

where the three fields would be set. Xenopsd relies on bindings to Xen to
observe NUMA-related properties of a domain.

Given that NUMA-related functionality is only available for Xen 4.20, we
will probably have to maintain a patch in xapi.spec for compatibility
with earlier Xen versions.

The changes to the (existing) C bindings come in two forms: new functions
and an extension of a type used by an existing function.

```ocaml
external domain_get_numa_info_node_pages_size : handle -> int -> int
  = "stub_xc_domain_get_numa_info_node_pages_size"
```

This function reports the number of NUMA nodes used by a Xen domain
(supplied as an argument).

```ocaml
type domain_numainfo_node_pages = {
  tot_pages_per_node : int64 array;
}
external domain_get_numa_info_node_pages :
  handle -> int -> int -> domain_numainfo_node_pages
  = "stub_xc_domain_get_numa_info_node_pages"
```

This function receives as arguments a domain ID and the number of nodes
this domain is using (acquired using
`domain_get_numa_info_node_pages_size`).

The number of NUMA nodes of the host (not domain) is reported by
`Xenctrl.physinfo`, which returns a value of type `physinfo`.

```diff
index b4579862ff..491bd3fc73 100644
--- a/tools/ocaml/libs/xc/xenctrl.ml
+++ b/tools/ocaml/libs/xc/xenctrl.ml
@@ -155,6 +155,7 @@ type physinfo =
   capabilities : physinfo_cap_flag list;
   max_nr_cpus : int;
   arch_capabilities : arch_physinfo_cap_flags;
+  nr_nodes : int;
 }
```

We are not reporting `nr_nodes` directly but use it to determine the
value of `numa_optimised` for a domain/VM:

    numa_optimised =
      (VM.numa_nodes = 1)
      or (VM.numa_nodes < physinfo.Xenctrl.nr_nodes)

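The rule above translates almost literally into OCaml. The following is only a sketch: the function name and its labelled arguments are hypothetical, and the two integers are assumed to have been obtained already (the domain's node count and `nr_nodes` from `physinfo`).

```ocaml
(* [vm_nodes]: number of NUMA nodes used by the domain (VM.numa_nodes).
   [host_nodes]: number of NUMA nodes of the host (physinfo.nr_nodes).
   Direct transcription of the rule stated in the text. *)
let numa_optimised ~vm_nodes ~host_nodes =
  vm_nodes = 1 || vm_nodes < host_nodes
```

Note that, as written, the rule also evaluates to true for a domain that reports 0 nodes on a host with at least one node, so a caller may want to treat that case separately.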
### Details

The three new fields that become part of type `Vm.state` are updated as
part of `get_state()` using the primitives above.
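One step of that update could be sketched as follows: turning the per-node page counts from `tot_pages_per_node` into the node-to-bytes map stored in `numa_node_memory`. This is an illustration under assumptions not stated in the design: the function name is hypothetical, the array index is taken to be the node number, and a page is assumed to be 4096 bytes.

```ocaml
(* Assumed page size in bytes; a hypothetical constant, not part of the
   design. *)
let page_size = 4096L

(* Convert per-node page counts (array index assumed to be the node
   number) into a node -> bytes association list. *)
let node_memory_of_pages (tot_pages_per_node : int64 array) :
    (int * int64) list =
  tot_pages_per_node
  |> Array.to_list
  |> List.mapi (fun node pages -> (node, Int64.mul pages page_size))
```

The sketch keeps nodes with zero pages in the result; whether such entries should be dropped from the map is left open here.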
