49 changes: 46 additions & 3 deletions README.md
@@ -385,15 +385,58 @@ Finally, you can run the following to cleanup your environment and delete the

## Anatomy of a DRA resource driver

A DRA resource driver consists of several key components that work together to manage custom resources within a Kubernetes cluster. This example driver illustrates a common pattern for these components:

1. **Kubelet Plugin**: This is a gRPC server that runs on every node where the resource is available, typically as a DaemonSet. It communicates with the kubelet over a Unix domain socket. Its primary responsibilities are:
   - **Resource Discovery**: Detecting the available resources on the node and reporting them to the Kubernetes API server by creating `ResourceSlice` objects.
   - **Resource Preparation**: When a pod is scheduled to a node, the kubelet calls the `NodePrepareResources` RPC. The plugin then performs any necessary setup for the allocated devices, such as configuring hardware or setting modes.
   - **CDI File Generation**: It creates Container Device Interface (CDI) specification files. These files tell the container runtime (like containerd or CRI-O) how to expose the device to the container (e.g., by mounting device nodes or setting environment variables).
   - **Resource Unpreparation**: When the pod terminates, the kubelet calls `NodeUnprepareResources`, and the plugin cleans up the resources.

2. **Validating Admission Webhook**: This is a central component, typically run as a Deployment, that intercepts requests to create or update `ResourceClaim` and `ResourceClaimTemplate` objects. It validates the driver-specific parameters to ensure they are correct before the objects are stored in etcd, providing early feedback to the user.

3. **Custom API (CRD) for Parameters**: The driver defines its own API for configuration, which is installed as a Custom Resource Definition (CRD). In this example, it's the `GpuConfig` CRD. This allows users to specify detailed, structured configuration for their resource requests within a `ResourceClaim`.

4. **Deployment Mechanism (Helm)**: The driver components (Kubelet Plugin DaemonSet, Webhook Deployment, CRD, RBAC rules, etc.) are packaged into a Helm chart for easy and repeatable installation onto a cluster.

## Code Organization

The repository is organized to separate these components clearly:

```
├── api/                              # Go types for the custom resource parameters (e.g., the GpuConfig CRD)
├── cmd/
│   ├── dra-example-kubeletplugin/    # Source code for the node-local Kubelet Plugin
│   └── dra-example-webhook/          # Source code for the validating admission webhook
├── demo/                             # Scripts and manifests for running a local demo
├── deployments/
│   └── helm/                         # The Helm chart for deploying the driver and the other Kubernetes objects it needs
├── pkg/                              # Shared Go packages (currently minimal)
├── hack/                             # Helper scripts for development (e.g., code generation)
└── test/                             # End-to-end tests
```

## Best Practices

When using this repository as a starting point for your own production driver, consider the following best practices:

* **Fork, Don't Reinvent**: Use this repository as a template. It provides a solid foundation for handling gRPC communication, checkpointing, and CDI integration.

* **Define a Clear API**: Create a well-defined, versioned API for your resource parameters (your equivalent of `GpuConfig`). Use the validating webhook to enforce the schema and provide users with immediate, clear feedback on invalid configurations.

* **Implement Real Device Logic**:
  * Replace the mock device discovery in `cmd/dra-example-kubeletplugin/discovery.go` with code that interacts with your actual hardware.
  * In `cmd/dra-example-kubeletplugin/state.go`, modify `applyConfig` to perform real hardware configuration instead of just setting environment variables.

* **Use CDI for Container Integration**: The Container Device Interface (CDI) is the standard, portable way to make devices available to containers. Use it to specify device nodes, environment variables, and mounts. Avoid runtime-specific workarounds.

* **Ensure Idempotency**: The kubelet may call `NodePrepareResources` or `NodeUnprepareResources` multiple times for the same claim. Your implementation of these functions must be idempotent, meaning they can be run multiple times without causing errors or unintended side effects.

* **Manage State with Checkpoints**: The kubelet does not keep track of driver-specific state for the plugin, so the plugin must persist it itself. As shown in `cmd/dra-example-kubeletplugin/state.go`, use a checkpoint file to record the prepared resources on the node. This allows your driver to recover its state if it restarts.

* **Robust Deployment**:
  * Use a Helm chart or a similar tool to manage the deployment of all your driver's components, including the CRD, DaemonSet, Deployment, and all necessary RBAC roles and bindings.
  * Ensure your Kubelet Plugin DaemonSet uses the correct tolerations to run on all applicable nodes.

## References
