|
| 1 | ++++ title = "NUMA" +++ |
| 2 | + |
| 3 | +# NUMA |
| 4 | + |
| 5 | +NUMA stands for Non-Uniform Memory Access: in a large system, RAM |
| 6 | +access is not equally fast for all CPUs. CPUs are grouped into |
| 7 | +so-called nodes, and each node has fast access to RAM that is |
| 8 | +considered local to it and slower access to other RAM. |
| 9 | +Conceptually, a node is a container that bundles some CPUs and RAM and |
| 10 | +there is an associated cost when accessing RAM in a different node. In |
| 11 | +the context of CPU virtualisation, assigning vCPUs to NUMA nodes is an |
| 12 | +optimisation strategy to reduce memory latency. This document describes |
| 13 | +a design to make NUMA-related assignments for Xen domains (that is, VMs) |
| 14 | +visible to the user. Below we refer to these assignments and |
| 15 | +optimisations collectively as NUMA for simplicity. |
| 16 | + |
| 17 | +NUMA is discussed more generally in |
| 18 | +[NUMA Feature](../toolstack/features/NUMA/index.md). |
| 19 | + |
| 20 | + |
| 21 | +## NUMA Properties |
| 22 | + |
| 23 | +Xen 4.20 implements NUMA optimisation. We want to expose the following |
| 24 | +NUMA-related properties of VMs to API clients, and in particular |
| 25 | +XenCenter. Each one is represented by a new field in XAPI's `VM_metrics` |
| 26 | +data model: |
| 27 | + |
| 28 | +* RO `VM_metrics.numa_optimised`: boolean: whether the VM is |
| 29 | + optimised for NUMA |
| 30 | +* RO `VM_metrics.numa_nodes`: integer: number of NUMA nodes of the host |
| 31 | + the VM is using |
| 32 | +* MRO `VM_metrics.numa_node_memory`: int -> int map; mapping a NUMA node |
| 33 | + (int) to an amount of memory (bytes) in that node. |
| 34 | + |
| 35 | +Required NUMA support is only available in Xen 4.20. Some parts of the |
| 36 | +code will have to be managed by patches. |
| 37 | + |
| 38 | +## XAPI High-Level Implementation |
| 39 | + |
| 40 | +As far as Xapi clients are concerned, we implement new fields in the |
| 41 | +`VM_metrics` class of the data model and surface the values in the CLI |
| 42 | +via `records.ml`; we could decide to make `numa_optimised` visible by |
| 43 | +default in `xe vm-list`. |
| 44 | + |
| 45 | +Introducing new fields requires defaults; these would be: |
| 46 | + |
| 47 | +* `numa_optimised`: false |
| 48 | +* `numa_nodes`: 0 |
| 49 | +* `numa_node_memory`: [] |
| 50 | + |
| 51 | +The data model ensures that the values are visible to API clients. |
| 52 | + |
| 53 | +## XAPI Low-Level Implementation |
| 54 | + |
| 55 | +NUMA properties are observed by Xenopsd and Xapi learns about them as |
| 56 | +part of the `Client.VM.stat` call implemented by Xenopsd. Xapi makes |
| 57 | +these calls frequently and we will update the Xapi VM fields related to |
| 58 | +NUMA simply as part of processing the result of such a call in Xapi. |
| 59 | + |
| 60 | +For this to work, we extend the return type of `VM.stat` in |
| 61 | + |
| 62 | +* `xenops_types.ml`, type `Vm.state` |
| 63 | + |
| 64 | +with three fields: |
| 65 | + |
| 66 | +* `numa_optimised: bool` |
| 67 | +* `numa_nodes: int` |
| 68 | +* `numa_node_memory: (int * int64) list` |
| 69 | + |
| 70 | +matching the semantics from above. |
| 71 | + |
| 72 | +## Xenopsd Implementation |
| 73 | + |
| 74 | +Xenopsd implements the `VM.stat` return value in |
| 75 | + |
| 76 | +* `Xenops_server_xen.get_state` |
| 77 | + |
| 78 | +where the three fields would be set. Xenopsd relies on bindings to Xen to |
| 79 | +observe NUMA-related properties of a domain. |
| 80 | + |
| 81 | +Given that NUMA-related functionality is only available in Xen 4.20, we |
| 82 | +will probably have to maintain a patch in xapi.spec for compatibility |
| 83 | +with earlier Xen versions. |
| 84 | + |
| 85 | +Changes to the (existing) C bindings come in two forms: new functions |
| 86 | +and an extension of a type used by an existing function. |
| 87 | + |
| 88 | +```ocaml |
| 89 | + external domain_get_numa_info_node_pages_size : handle -> int -> int |
| 90 | + = "stub_xc_domain_get_numa_info_node_pages_size" |
| 91 | +``` |
| 92 | + |
| 93 | +This function reports the number of NUMA nodes used by a Xen domain |
| 94 | +(supplied as an argument). |
| 95 | + |
| 96 | +```ocaml |
| 97 | + type domain_numainfo_node_pages = { |
| 98 | + tot_pages_per_node : int64 array; |
| 99 | + } |
| 100 | + external domain_get_numa_info_node_pages : |
| 101 | + handle -> int -> int -> domain_numainfo_node_pages |
| 102 | + = "stub_xc_domain_get_numa_info_node_pages" |
| 103 | +``` |
| 104 | + |
| 105 | +This function receives as arguments a domain ID and the number of nodes |
| 106 | +this domain is using (acquired using `domain_get_numa_info_node_pages_size`). |
| 107 | + |
| 108 | +The number of NUMA nodes of the host (not domain) is reported by |
| 109 | +`Xenctrl.physinfo` which returns a value of type `physinfo`. |
| 110 | + |
| 111 | +```diff |
| 112 | + index b4579862ff..491bd3fc73 100644 |
| 113 | + --- a/tools/ocaml/libs/xc/xenctrl.ml |
| 114 | + +++ b/tools/ocaml/libs/xc/xenctrl.ml |
| 115 | + @@ -155,6 +155,7 @@ type physinfo = |
| 116 | + capabilities : physinfo_cap_flag list; |
| 117 | + max_nr_cpus : int; |
| 118 | + arch_capabilities : arch_physinfo_cap_flags; |
| 119 | + + nr_nodes : int; |
| 120 | + } |
| 121 | +``` |
| 122 | + |
| 123 | +We do not report `nr_nodes` directly but use it to determine the |
| 124 | +value of `numa_optimised` for a domain/VM: |
| 125 | + |
| 126 | + numa_optimised = |
| 127 | + (VM.numa_nodes = 1) |
| 128 | + or (VM.numa_nodes < physinfo.Xenctrl.nr_nodes) |
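
As a minimal sketch, the rule above can be written as an OCaml predicate. The function name `is_numa_optimised` and its labelled arguments are hypothetical; only the rule itself is taken from this design.

```ocaml
(* Hypothetical helper: [vm_nodes] is the number of NUMA nodes the domain
   uses, [host_nodes] is [nr_nodes] as reported by [Xenctrl.physinfo]. *)
let is_numa_optimised ~vm_nodes ~host_nodes =
  vm_nodes = 1 || vm_nodes < host_nodes

let () =
  (* A VM confined to one node counts as optimised, even on a one-node host. *)
  assert (is_numa_optimised ~vm_nodes:1 ~host_nodes:1) ;
  (* A VM using 2 of 4 host nodes counts as optimised. *)
  assert (is_numa_optimised ~vm_nodes:2 ~host_nodes:4) ;
  (* A VM spread over all host nodes does not. *)
  assert (not (is_numa_optimised ~vm_nodes:4 ~host_nodes:4))
```

Read literally, the rule also yields `true` when `vm_nodes` is 0 (the proposed default for `numa_nodes`) on any host with at least one node; whether that case needs separate handling is not specified here.
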
| 129 | + |
| 130 | +### Details |
| 131 | + |
| 132 | +The three new fields that become part of type `Vm.state` are updated as |
| 133 | +part of `get_state()` using the primitives above. |
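
A minimal sketch of how the per-node page counts might be turned into the `numa_node_memory` association list. It assumes that array index `i` corresponds to NUMA node `i`, that pages are 4096 bytes, and that nodes holding no pages are omitted; none of these is confirmed by this design, and `node_memory_of_pages` is a hypothetical name.

```ocaml
(* Assumption: 4 KiB pages; a real implementation should obtain the
   page size from Xen rather than hard-coding it. *)
let page_size = 4096L

(* Hypothetical helper: convert per-node page counts (array index taken
   to be the node id, which is an assumption) into a (node, bytes) list,
   skipping nodes that hold no memory for the domain. *)
let node_memory_of_pages (pages : int64 array) : (int * int64) list =
  pages
  |> Array.to_list
  |> List.mapi (fun node n -> (node, Int64.mul n page_size))
  |> List.filter (fun (_, size) -> Int64.compare size 0L > 0)
```

The input would be the `tot_pages_per_node` array returned by `domain_get_numa_info_node_pages`; the result has the `(int * int64) list` shape of the new `numa_node_memory` field.
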
| 134 | + |
| 135 | + |
| 136 | + |