add helm chart (#2)
Signed-off-by: Dmitry Shmulevich <[email protected]>
dmitsh authored Sep 26, 2024
1 parent 2b62522 commit 3094731
Showing 24 changed files with 802 additions and 5 deletions.
61 changes: 61 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,61 @@
# Contribute to the NVIDIA `topograph` Project

Want to contribute to the NVIDIA `topograph` project? Awesome!
We only require you to sign your work as described in the following section.

## Sign your work

The sign-off is a simple signature at the end of the description for the patch.
Your signature certifies that you wrote the patch or otherwise have the right
to pass it on as an open-source patch.

The rules are pretty simple, and sign-off means that you certify the DCO below
(from [developercertificate.org](http://developercertificate.org/)):

```
Developer Certificate of Origin
Version 1.1
Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
```

To sign off, you just add the following line to every git commit message:

    Signed-off-by: Joe Smith <joe.smith@email.com>

You must use your real name (sorry, no pseudonyms or anonymous contributions).

If you set your `user.name` and `user.email` using git config, you can sign
your commit automatically with `git commit -s`.
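
The sign-off flow described above can be tried out in a throwaway repository. This is a minimal sketch, not part of the project's tooling: the temporary repository, the file name, and the `joe.smith@example.com` address are placeholders chosen for illustration.

```shell
set -eu

# Create a scratch repository so the experiment does not touch real work.
repo="$(mktemp -d)"
git init -q "$repo"

# Placeholder identity; use your real name and email in practice.
git -C "$repo" config user.name "Joe Smith"
git -C "$repo" config user.email "joe.smith@example.com"

# Make a change and commit it with -s, which appends the Signed-off-by trailer.
echo "hello" > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -q -s -m "add file"

# Show the full commit message, including the trailer git added.
msg="$(git -C "$repo" log -1 --format=%B)"
printf '%s\n' "$msg"
```

The printed message should end with a `Signed-off-by:` line built from the configured `user.name` and `user.email`.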
2 changes: 1 addition & 1 deletion Makefile
@@ -14,7 +14,7 @@

LINTER_BIN ?= golangci-lint
DOCKER_BIN ?= docker
-TARGETS := topograph topology-state-observer toposim
+TARGETS := topograph node-observer toposim
CMD_DIR := ./cmd
OUTPUT_DIR := ./bin

6 changes: 3 additions & 3 deletions README.md
@@ -30,9 +30,9 @@ The Topology Generator is the central component that manages the overall network
- **Topology Gathering:** Instructs the CSP Connector to fetch the current network topology from the CSP.
- **User Cluster Update:** Translates network topology from the internal format into a format expected by the user cluster, such as SLURM or Kubernetes.

-### 4. Kubernetes State Observer
-The State Observer is used when the Topology Generator is deployed in a Kubernetes cluster. It monitors changes in the cluster nodes and the ConfigMap containing the topology configuration. If a node's status changes (e.g., a node goes down or comes up) or if the ConfigMap is deleted, the State Observer sends a request to the API Server to generate a new topology configuration.
-
+### 4. Kubernetes Node Observer
+The Node Observer is used when the Topology Generator is deployed in a Kubernetes cluster. It monitors changes in the cluster nodes.
+If a node's status changes (e.g., a node goes down or comes up), the Node Observer sends a request to the API Server to generate a new topology configuration.

## Supported Environments

23 changes: 23 additions & 0 deletions charts/node-observer/.helmignore
@@ -0,0 +1,23 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
24 changes: 24 additions & 0 deletions charts/node-observer/Chart.yaml
@@ -0,0 +1,24 @@
apiVersion: v2
name: node-observer
description: A Helm chart for Kubernetes

# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application

# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.1.0

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "1.16.0"
1 change: 1 addition & 0 deletions charts/node-observer/templates/NOTES.txt
@@ -0,0 +1 @@
Helm chart has been successfully installed.
62 changes: 62 additions & 0 deletions charts/node-observer/templates/_helpers.tpl
@@ -0,0 +1,62 @@
{{/*
Expand the name of the chart.
*/}}
{{- define "node-observer.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
If release name contains chart name it will be used as a full name.
*/}}
{{- define "node-observer.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}

{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "node-observer.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Common labels
*/}}
{{- define "node-observer.labels" -}}
helm.sh/chart: {{ include "node-observer.chart" . }}
{{ include "node-observer.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{/*
Selector labels
*/}}
{{- define "node-observer.selectorLabels" -}}
app.kubernetes.io/name: {{ include "node-observer.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

{{/*
Create the name of the service account to use
*/}}
{{- define "node-observer.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "node-observer.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}
17 changes: 17 additions & 0 deletions charts/node-observer/templates/configmap.yml
@@ -0,0 +1,17 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "node-observer.fullname" . }}
  labels:
    {{- include "node-observer.labels" . | nindent 4 }}
data:
  node-observer-config.yaml: |-
    topology_generator_url: "{{ .Values.topograph.url }}"
    topology_configmap:
      name: {{ .Values.topograph.configmap.name }}
      namespace: {{ .Values.topograph.configmap.namespace }}
      filename: {{ .Values.topograph.configmap.filename }}
    node_labels:
      {{- toYaml .Values.topograph.node_labels | nindent 6 }}
    provider: {{ .Values.topograph.provider }}
    engine: {{ .Values.topograph.engine }}
62 changes: 62 additions & 0 deletions charts/node-observer/templates/deployment.yaml
@@ -0,0 +1,62 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "node-observer.fullname" . }}
  labels:
    {{- include "node-observer.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      {{- include "node-observer.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      {{- with .Values.podAnnotations }}
      annotations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      labels:
        {{- include "node-observer.labels" . | nindent 8 }}
        {{- with .Values.podLabels }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "node-observer.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      containers:
        - name: {{ .Chart.Name }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          command:
            - /usr/local/bin/node-observer
          args:
            - -v={{ .Values.verbosity }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          volumeMounts:
            - name: config-volume
              mountPath: /etc/topograph
      volumes:
        - name: config-volume
          configMap:
            defaultMode: 420
            name: {{ include "node-observer.fullname" . }}
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
22 changes: 22 additions & 0 deletions charts/node-observer/templates/rbac.yaml
@@ -0,0 +1,22 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ include "node-observer.serviceAccountName" . }}
rules:
  - apiGroups: [""]
    resources: ["*"]
    verbs: [get,list,watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: {{ include "node-observer.serviceAccountName" . }}
subjects:
  - kind: ServiceAccount
    name: {{ include "node-observer.serviceAccountName" . }}
    namespace: {{.Release.Namespace}}
    apiGroup: ""
roleRef:
  kind: ClusterRole
  name: {{ include "node-observer.serviceAccountName" . }}
  apiGroup: ""
13 changes: 13 additions & 0 deletions charts/node-observer/templates/serviceaccount.yaml
@@ -0,0 +1,13 @@
{{- if .Values.serviceAccount.create -}}
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ include "node-observer.serviceAccountName" . }}
  labels:
    {{- include "node-observer.labels" . | nindent 4 }}
  {{- with .Values.serviceAccount.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
automountServiceAccountToken: {{ .Values.serviceAccount.automount }}
{{- end }}
80 changes: 80 additions & 0 deletions charts/node-observer/values.yaml
@@ -0,0 +1,80 @@
# Default values for node-observer.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

replicaCount: 1

image:
  repository: ghcr.io/nvidia/topograph
  pullPolicy: IfNotPresent
  # Overrides the image tag whose default is the chart appVersion.
  tag: "main"

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

serviceAccount:
  # Specifies whether a service account should be created
  create: true
  # Automatically mount a ServiceAccount's API credentials?
  automount: true
  # Annotations to add to the service account
  annotations: {}
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name: ""

verbosity: 3

topograph:
  url: "http://topograph.default.svc.cluster.local:49021/v1/generate"
  configmap:
    name: topology-config
    namespace: default
    filename: topology.conf
  node_labels:
    kubernetes.io/role: agent
  provider: test
  engine: k8s

podAnnotations: {}
podLabels: {}

podSecurityContext: {}
  # fsGroup: 2000

securityContext: {}
  # capabilities:
  #   drop:
  #   - ALL
  # readOnlyRootFilesystem: true
  # runAsNonRoot: true
  # runAsUser: 1000

resources: {}
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi

livenessProbe:
  httpGet:
    path: /
    port: http
readinessProbe:
  httpGet:
    path: /
    port: http

nodeSelector: {}

tolerations: []

affinity: {}
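
The defaults above can be overridden at render or install time with a separate values file. The following is a sketch under stated assumptions: the override values, the `topograph` namespace in the URL, and the release name `node-observer` are hypothetical; only the key names come from the `values.yaml` in this commit.

```shell
set -eu

# Hypothetical override file; the keys mirror charts/node-observer/values.yaml,
# the values are illustrative only.
overrides="$(mktemp)"
cat > "$overrides" <<'EOF'
verbosity: 4
topograph:
  url: "http://topograph.topograph.svc.cluster.local:49021/v1/generate"
  node_labels:
    kubernetes.io/role: agent
EOF

# Render the chart locally when helm and the chart directory are available;
# otherwise just print the override file.
if command -v helm >/dev/null 2>&1 && [ -d charts/node-observer ]; then
  helm template node-observer charts/node-observer -f "$overrides"
else
  cat "$overrides"
fi
```

Helm merges nested maps, so keys under `topograph` that the override file omits (such as `provider` and `engine`) keep their defaults. To install rather than render, `helm install` accepts the same release name, chart path, and `-f` argument.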