
Commit 5b53a91

Merge pull request #746 from Mirantis/jell/vmpodlc
Documentation of VM pod lifecycle
2 parents 8d0ea61 + 3ed00cf commit 5b53a91

2 files changed

+98
-0
lines changed

docs/README.md

+1
@@ -4,6 +4,7 @@ This directory contains the Virtlet documentation.
 * [Cloud-init data generation](cloud-init-data-generation.md)
 * [Developer documentation](devel/README.md)
 * [Architecture overview](architecture.md)
+* [VM pod lifecycle](vmpod-lifecycle.md)
 * [Description of networking](networking.md)
 * [Description of SyncPod workflow](sync-pod-workflow.md)
 * [Volume handling](volumes.md)

docs/vmpod-lifecycle.md

+97
@@ -0,0 +1,97 @@
# Lifecycle of a VM pod

This document describes the lifecycle of a VM pod managed by Virtlet.

This description omits the details of volume setup (using
[flexvolumes](https://kubernetes.io/docs/concepts/storage/volumes/#flexvolume))
and of the handling of logs, the VM console and port forwarding (done by the
[streaming server](https://github.com/Mirantis/virtlet/tree/master/pkg/stream)).

## Assumptions

Communication between kubelet and Virtlet goes through [criproxy](https://github.com/Mirantis/criproxy),
which directs requests to Virtlet only if the requests concern a pod that has
a Virtlet-specific annotation or an image that has a Virtlet-specific prefix.
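
The routing rule above can be illustrated with a minimal Go sketch. It is not criproxy's actual code: the function names are hypothetical, the annotation key and value are the ones quoted in the startup steps of this document, and the image prefix `virtlet.cloud/` is an assumption, since this document does not spell it out.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// Annotation key and value as quoted in the startup steps of this document.
	targetRuntimeAnnotation = "kubernetes.io/target-runtime"
	virtletRuntimeName      = "virtlet.cloud"
	// Assumption: the exact image prefix is not given in this document.
	assumedImagePrefix = "virtlet.cloud/"
)

// routeSandboxToVirtlet (hypothetical helper) reports whether a pod-level
// request should be forwarded to Virtlet, based on the pod annotations.
func routeSandboxToVirtlet(annotations map[string]string) bool {
	return annotations[targetRuntimeAnnotation] == virtletRuntimeName
}

// routeImageToVirtlet (hypothetical helper) reports whether an image-level
// request should be forwarded to Virtlet, based on the image name prefix.
func routeImageToVirtlet(image string) bool {
	return strings.HasPrefix(image, assumedImagePrefix)
}

func main() {
	vmPod := map[string]string{targetRuntimeAnnotation: virtletRuntimeName}
	fmt.Println(routeSandboxToVirtlet(vmPod))                      // annotated pod
	fmt.Println(routeSandboxToVirtlet(map[string]string{}))        // plain pod
	fmt.Println(routeImageToVirtlet(assumedImagePrefix + "guest")) // prefixed image
	fmt.Println(routeImageToVirtlet("nginx"))                      // plain image
}
```

Requests that match neither rule would stay with the default container runtime.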

## Lifecycle

### VM Pod Startup

* A pod is created in the Kubernetes cluster, either directly by the user or via
  some other mechanism such as a higher-level Kubernetes object managed by
  `kube-controller-manager` (ReplicaSet, DaemonSet, etc.).
* The scheduler places the pod on a node based on the requested resources
  (CPU, memory, etc.) as well as the pod's nodeSelector, pod/node affinity
  constraints, taints/tolerations and so on.
* `kubelet` running on the target node accepts the pod.
* `kubelet` invokes the [CRI](https://contributor.kubernetes.io/contributors/devel/container-runtime-interface/)
  call `RunPodSandbox` to create the pod sandbox, which
  will enclose all the containers in the pod definition. Note that at this
  point no information about the containers within the pod is passed
  to the call. `kubelet` can later request the information about the pod
  by means of `PodSandboxStatus` calls.
* If the pod has the Virtlet-specific annotation `kubernetes.io/target-runtime: virtlet.cloud`,
  CRI proxy passes the call to Virtlet.
* Virtlet saves the sandbox metadata in its internal database, sets up the
  network namespace and then uses its internal `tapmanager` mechanism to invoke
  the `ADD` operation via the CNI plugin as specified by the
  CNI configuration on the node.
* The CNI plugin configures the network namespace by setting up
  network interfaces, IP addresses, routes, iptables rules and so on,
  and returns the network configuration information to the caller as described
  in the [CNI spec](https://github.com/containernetworking/cni/blob/master/SPEC.md#result).
* Virtlet's [`tapmanager`](https://github.com/Mirantis/virtlet/tree/master/pkg/tapmanager)
  mechanism adjusts the configuration of the network namespace to make it work with the VM.
* After creating the sandbox, `kubelet` starts the containers defined in
  the pod sandbox. Currently, Virtlet supports just one container per VM pod,
  so the remaining startup steps describe the startup of this single container.
* Depending on the image pull policy of the container, `kubelet` checks whether
  the image needs to be pulled by means of an `ImageStatus` call and then uses
  the `PullImage` CRI call to pull the image if it doesn't exist or if
  `imagePullPolicy: Always` is used.
* If `PullImage` is invoked, Virtlet resolves the image location based on the
  [image name translation configuration](https://github.com/Mirantis/virtlet/blob/master/docs/image-name-translation.md),
  then downloads the file and stores it in the image store.
* After the image is ready (no pull was needed or the `PullImage` call completed
  successfully), `kubelet` uses the `CreateContainer` CRI call to create
  the container in the pod sandbox using the specified image.
* Virtlet uses the sandbox and container metadata to generate a libvirt domain definition,
  using the [`vmwrapper`](https://github.com/Mirantis/virtlet/tree/master/cmd/vmwrapper)
  binary as the emulator and without specifying any network configuration in the domain.
* After the `CreateContainer` call completes, `kubelet` invokes the `StartContainer` call
  on the newly created container.
* Virtlet starts the libvirt domain. libvirt invokes `vmwrapper` as the emulator,
  passing it the necessary command line arguments as well as environment variables
  set by Virtlet. `vmwrapper` uses these environment variable values
  to communicate with `tapmanager` over a Unix domain socket,
  retrieving a file descriptor for a tap device and/or the PCI address of an SR-IOV
  device set up by `tapmanager`. `tapmanager` uses its own simple protocol to
  communicate with `vmwrapper` because it needs to send file descriptors over
  the socket, which RPC libraries usually do not support; see e.g.
  [grpc/grpc#11417](https://github.com/grpc/grpc/issues/11417).
  `vmwrapper` then updates the command line arguments to include the network
  interface information and execs the actual emulator (`qemu`).
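
The reason the last step needs a custom protocol is that an open file descriptor can only cross a process boundary as ancillary data (`SCM_RIGHTS`) on a Unix domain socket. The Go sketch below illustrates that general mechanism using only the standard `syscall` package on Linux. It is not Virtlet's `tapmanager` protocol: the function names and the `tap0` payload are hypothetical, and a pipe stands in for the tap device.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"syscall"
)

// sendFD sends one file descriptor, together with a short payload, over a
// Unix domain socket using an SCM_RIGHTS control message.
func sendFD(sock, fd int, payload []byte) error {
	return syscall.Sendmsg(sock, payload, syscall.UnixRights(fd), nil, 0)
}

// recvFD receives a payload and exactly one file descriptor from the socket.
func recvFD(sock int) (int, []byte, error) {
	buf := make([]byte, 64)
	oob := make([]byte, syscall.CmsgSpace(4)) // room for one 32-bit descriptor
	n, oobn, _, _, err := syscall.Recvmsg(sock, buf, oob, 0)
	if err != nil {
		return -1, nil, err
	}
	msgs, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil {
		return -1, nil, err
	}
	if len(msgs) != 1 {
		return -1, nil, fmt.Errorf("expected 1 control message, got %d", len(msgs))
	}
	fds, err := syscall.ParseUnixRights(&msgs[0])
	if err != nil {
		return -1, nil, err
	}
	if len(fds) != 1 {
		return -1, nil, fmt.Errorf("expected 1 descriptor, got %d", len(fds))
	}
	return fds[0], buf[:n], nil
}

// roundTrip passes the read end of a pipe across a socket pair and then reads
// the message through the received descriptor, proving that it is usable.
func roundTrip(message string) (string, error) {
	pair, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return "", err
	}
	defer syscall.Close(pair[0])
	defer syscall.Close(pair[1])

	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()

	// "tap0" is a hypothetical payload naming the descriptor being passed.
	if err := sendFD(pair[0], int(r.Fd()), []byte("tap0")); err != nil {
		w.Close()
		return "", err
	}
	fd, _, err := recvFD(pair[1])
	if err != nil {
		w.Close()
		return "", err
	}
	received := os.NewFile(uintptr(fd), "received")
	defer received.Close()

	if _, err := w.WriteString(message); err != nil {
		w.Close()
		return "", err
	}
	w.Close() // closing the write end lets the reader see EOF
	data, err := io.ReadAll(received)
	return string(data), err
}

func main() {
	got, err := roundTrip("hello from the sender")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

In this sketch both ends live in one process for brevity; between two processes the receiver would connect to a listening Unix socket instead of using a socket pair.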

At this point the VM is running and accessible via the network, and the pod is
in the `Running` state, as is its only container.

### Deleting the pod

This sequence is initiated when the pod is deleted, either by means of `kubectl delete`
or a controller manager action due to deletion or downscaling of a higher-level object.

* `kubelet` notices the pod being deleted.
* `kubelet` invokes the `StopContainer` CRI call, which is forwarded
  to Virtlet based on the annotations of the containing pod sandbox.
* Virtlet stops the libvirt domain. libvirt sends a signal to `qemu`, which initiates
  the shutdown. If `qemu` doesn't quit within the time allowed by the pod's
  termination grace period, Virtlet forcibly terminates the domain,
  thus killing the `qemu` process.
* After all the containers in the pod (the single container in the case of
  a Virtlet VM pod) are stopped, `kubelet` invokes the `StopPodSandbox` CRI call.
* Virtlet asks its `tapmanager` to remove the pod from the network by means of
  the CNI `DEL` command.
* After `StopPodSandbox` returns, the pod sandbox is eventually garbage-collected
  by `kubelet` by means of the `RemovePodSandbox` CRI call.
* Upon `RemovePodSandbox`, Virtlet removes the pod metadata from its internal database.
