diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000..d60474d --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,61 @@ +# Contribute to the NVIDIA `topograph` Project + +Want to contribute to the NVIDIA `topograph` project? Awesome! +We only require you to sign your work as described in the following section. + +## Sign your work + +The sign-off is a simple signature at the end of the description for the patch. +Your signature certifies that you wrote the patch or otherwise have the right +to pass it on as an open-source patch. + +The rules are pretty simple, and sign-off means that you certify the DCO below +(from [developercertificate.org](http://developercertificate.org/)): + +``` +Developer Certificate of Origin +Version 1.1 + +Copyright (C) 2004, 2006 The Linux Foundation and its contributors. +1 Letterman Drive +Suite D4700 +San Francisco, CA, 94129 + +Everyone is permitted to copy and distribute verbatim copies of this +license document, but changing it is not allowed. + +Developer's Certificate of Origin 1.1 + +By making a contribution to this project, I certify that: + +(a) The contribution was created in whole or in part by me and I + have the right to submit it under the open source license + indicated in the file; or + +(b) The contribution is based upon previous work that, to the best + of my knowledge, is covered under an appropriate open source + license and I have the right under that license to submit that + work with modifications, whether created in whole or in part + by me, under the same open source license (unless I am + permitted to submit under a different license), as indicated + in the file; or + +(c) The contribution was provided directly to me by some other + person who certified (a), (b) or (c) and I have not modified + it. 
+ +(d) I understand and agree that this project and the contribution + are public and that a record of the contribution (including all + personal information I submit with it, including my sign-off) is + maintained indefinitely and may be redistributed consistent with + this project or the open source license(s) involved. +``` + +To sign off, you just add the following line to every git commit message: + + Signed-off-by: Joe Smith + +You must use your real name (sorry, no pseudonyms or anonymous contributions). + +If you set your `user.name` and `user.email` using git config, you can sign +your commit automatically with `git commit -s`. diff --git a/Makefile b/Makefile index 6782492..56180da 100644 --- a/Makefile +++ b/Makefile @@ -14,7 +14,7 @@ LINTER_BIN ?= golangci-lint DOCKER_BIN ?= docker -TARGETS := topograph topology-state-observer toposim +TARGETS := topograph node-observer toposim CMD_DIR := ./cmd OUTPUT_DIR := ./bin diff --git a/README.md b/README.md index 6252861..8e1ac47 100644 --- a/README.md +++ b/README.md @@ -30,9 +30,9 @@ The Topology Generator is the central component that manages the overall network - **Topology Gathering:** Instructs the CSP Connector to fetch the current network topology from the CSP. - **User Cluster Update:** Translates network topology from the internal format into a format expected by the user cluster, such as SLURM or Kubernetes. -### 4. Kubernetes State Observer -The State Observer is used when the Topology Generator is deployed in a Kubernetes cluster. It monitors changes in the cluster nodes and the ConfigMap containing the topology configuration. If a node's status changes (e.g., a node goes down or comes up) or if the ConfigMap is deleted, the State Observer sends a request to the API Server to generate a new topology configuration. - +### 4. Kubernetes Node Observer +The Node Observer is used when the Topology Generator is deployed in a Kubernetes cluster. It monitors changes in the cluster nodes. 
+If a node's status changes (e.g., a node goes down or comes up), the Node Observer sends a request to the API Server to generate a new topology configuration. ## Supported Environments diff --git a/charts/node-observer/.helmignore b/charts/node-observer/.helmignore new file mode 100644 index 0000000..0e8a0eb --- /dev/null +++ b/charts/node-observer/.helmignore @@ -0,0 +1,23 @@ +# Patterns to ignore when building packages. +# This supports shell glob matching, relative path matching, and +# negation (prefixed with !). Only one pattern per line. +.DS_Store +# Common VCS dirs +.git/ +.gitignore +.bzr/ +.bzrignore +.hg/ +.hgignore +.svn/ +# Common backup files +*.swp +*.bak +*.tmp +*.orig +*~ +# Various IDEs +.project +.idea/ +*.tmproj +.vscode/ diff --git a/charts/node-observer/Chart.yaml b/charts/node-observer/Chart.yaml new file mode 100644 index 0000000..f746cfd --- /dev/null +++ b/charts/node-observer/Chart.yaml @@ -0,0 +1,24 @@ +apiVersion: v2 +name: node-observer +description: A Helm chart for Kubernetes + +# A chart can be either an 'application' or a 'library' chart. +# +# Application charts are a collection of templates that can be packaged into versioned archives +# to be deployed. +# +# Library charts provide useful utilities or functions for the chart developer. They're included as +# a dependency of application charts to inject those utilities and functions into the rendering +# pipeline. Library charts do not define any templates and therefore cannot be deployed. +type: application + +# This is the chart version. This version number should be incremented each time you make changes +# to the chart and its templates, including the app version. +# Versions are expected to follow Semantic Versioning (https://semver.org/) +version: 0.1.0 + +# This is the version number of the application being deployed. This version number should be +# incremented each time you make changes to the application. Versions are not expected to +# follow Semantic Versioning.
They should reflect the version the application is using. +# It is recommended to use it with quotes. +appVersion: "1.16.0" diff --git a/charts/node-observer/templates/NOTES.txt b/charts/node-observer/templates/NOTES.txt new file mode 100644 index 0000000..13044b5 --- /dev/null +++ b/charts/node-observer/templates/NOTES.txt @@ -0,0 +1 @@ +Helm chart has been successfully installed. diff --git a/charts/node-observer/templates/_helpers.tpl b/charts/node-observer/templates/_helpers.tpl new file mode 100644 index 0000000..d098fda --- /dev/null +++ b/charts/node-observer/templates/_helpers.tpl @@ -0,0 +1,62 @@ +{{/* +Expand the name of the chart. +*/}} +{{- define "node-observer.name" -}} +{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Create a default fully qualified app name. +We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec). +If release name contains chart name it will be used as a full name. +*/}} +{{- define "node-observer.fullname" -}} +{{- if .Values.fullnameOverride }} +{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }} +{{- else }} +{{- $name := default .Chart.Name .Values.nameOverride }} +{{- if contains $name .Release.Name }} +{{- .Release.Name | trunc 63 | trimSuffix "-" }} +{{- else }} +{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }} +{{- end }} +{{- end }} +{{- end }} + +{{/* +Create chart name and version as used by the chart label. +*/}} +{{- define "node-observer.chart" -}} +{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Common labels +*/}} +{{- define "node-observer.labels" -}} +helm.sh/chart: {{ include "node-observer.chart" . }} +{{ include "node-observer.selectorLabels" . 
}} +{{- if .Chart.AppVersion }} +app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} +{{- end }} +app.kubernetes.io/managed-by: {{ .Release.Service }} +{{- end }} + +{{/* +Selector labels +*/}} +{{- define "node-observer.selectorLabels" -}} +app.kubernetes.io/name: {{ include "node-observer.name" . }} +app.kubernetes.io/instance: {{ .Release.Name }} +{{- end }} + +{{/* +Create the name of the service account to use +*/}} +{{- define "node-observer.serviceAccountName" -}} +{{- if .Values.serviceAccount.create }} +{{- default (include "node-observer.fullname" .) .Values.serviceAccount.name }} +{{- else }} +{{- default "default" .Values.serviceAccount.name }} +{{- end }} +{{- end }} diff --git a/charts/node-observer/templates/configmap.yml b/charts/node-observer/templates/configmap.yml new file mode 100644 index 0000000..f5b4796 --- /dev/null +++ b/charts/node-observer/templates/configmap.yml @@ -0,0 +1,17 @@ +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ include "node-observer.fullname" . }} + labels: + {{- include "node-observer.labels" . | nindent 4 }} +data: + node-observer-config.yaml: |- + topology_generator_url: "{{ .Values.topograph.url }}" + topology_configmap: + name: {{ .Values.topograph.configmap.name }} + namespace: {{ .Values.topograph.configmap.namespace }} + filename: {{ .Values.topograph.configmap.filename }} + node_labels: + {{- toYaml .Values.topograph.node_labels | nindent 6 }} + provider: {{ .Values.topograph.provider }} + engine: {{ .Values.topograph.engine }} diff --git a/charts/node-observer/templates/deployment.yaml b/charts/node-observer/templates/deployment.yaml new file mode 100644 index 0000000..b38944b --- /dev/null +++ b/charts/node-observer/templates/deployment.yaml @@ -0,0 +1,62 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: {{ include "node-observer.fullname" . }} + labels: + {{- include "node-observer.labels" . 
| nindent 4 }} +spec: + replicas: {{ .Values.replicaCount }} + selector: + matchLabels: + {{- include "node-observer.selectorLabels" . | nindent 6 }} + template: + metadata: + {{- with .Values.podAnnotations }} + annotations: + {{- toYaml . | nindent 8 }} + {{- end }} + labels: + {{- include "node-observer.labels" . | nindent 8 }} + {{- with .Values.podLabels }} + {{- toYaml . | nindent 8 }} + {{- end }} + spec: + {{- with .Values.imagePullSecrets }} + imagePullSecrets: + {{- toYaml . | nindent 8 }} + {{- end }} + serviceAccountName: {{ include "node-observer.serviceAccountName" . }} + securityContext: + {{- toYaml .Values.podSecurityContext | nindent 8 }} + containers: + - name: {{ .Chart.Name }} + securityContext: + {{- toYaml .Values.securityContext | nindent 12 }} + image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}" + imagePullPolicy: {{ .Values.image.pullPolicy }} + command: + - /usr/local/bin/node-observer + args: + - -v={{ .Values.verbosity }} + resources: + {{- toYaml .Values.resources | nindent 12 }} + volumeMounts: + - name: config-volume + mountPath: /etc/topograph + volumes: + - name: config-volume + configMap: + defaultMode: 420 + name: {{ include "node-observer.fullname" . }} + {{- with .Values.nodeSelector }} + nodeSelector: + {{- toYaml . | nindent 8 }} + {{- end }} + {{- with .Values.affinity }} + affinity: + {{- toYaml . | nindent 8 }} + {{- end }} + {{- with .Values.tolerations }} + tolerations: + {{- toYaml . | nindent 8 }} + {{- end }} diff --git a/charts/node-observer/templates/rbac.yaml b/charts/node-observer/templates/rbac.yaml new file mode 100644 index 0000000..7b397a5 --- /dev/null +++ b/charts/node-observer/templates/rbac.yaml @@ -0,0 +1,22 @@ +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: {{ include "node-observer.serviceAccountName" . 
}} +rules: +- apiGroups: [""] + resources: ["*"] + verbs: [get,list,watch] +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: {{ include "node-observer.serviceAccountName" . }} +subjects: +- kind: ServiceAccount + name: {{ include "node-observer.serviceAccountName" . }} + namespace: {{.Release.Namespace}} + apiGroup: "" +roleRef: + kind: ClusterRole + name: {{ include "node-observer.serviceAccountName" . }} + apiGroup: "" diff --git a/charts/node-observer/templates/serviceaccount.yaml b/charts/node-observer/templates/serviceaccount.yaml new file mode 100644 index 0000000..44e09ea --- /dev/null +++ b/charts/node-observer/templates/serviceaccount.yaml @@ -0,0 +1,13 @@ +{{- if .Values.serviceAccount.create -}} +apiVersion: v1 +kind: ServiceAccount +metadata: + name: {{ include "node-observer.serviceAccountName" . }} + labels: + {{- include "node-observer.labels" . | nindent 4 }} + {{- with .Values.serviceAccount.annotations }} + annotations: + {{- toYaml . | nindent 4 }} + {{- end }} +automountServiceAccountToken: {{ .Values.serviceAccount.automount }} +{{- end }} diff --git a/charts/node-observer/values.yaml b/charts/node-observer/values.yaml new file mode 100644 index 0000000..2278da2 --- /dev/null +++ b/charts/node-observer/values.yaml @@ -0,0 +1,80 @@ +# Default values for node-observer. +# This is a YAML-formatted file. +# Declare variables to be passed into your templates. + +replicaCount: 1 + +image: + repository: ghcr.io/nvidia/topograph + pullPolicy: IfNotPresent + # Overrides the image tag whose default is the chart appVersion. + tag: "main" + +imagePullSecrets: [] +nameOverride: "" +fullnameOverride: "" + +serviceAccount: + # Specifies whether a service account should be created + create: true + # Automatically mount a ServiceAccount's API credentials? + automount: true + # Annotations to add to the service account + annotations: {} + # The name of the service account to use.
+ # If not set and create is true, a name is generated using the fullname template + name: "" + +verbosity: 3 + +topograph: + url: "http://topograph.default.svc.cluster.local:49021/v1/generate" + configmap: + name: topology-config + namespace: default + filename: topology.conf + node_labels: + kubernetes.io/role: agent + provider: test + engine: k8s + +podAnnotations: {} +podLabels: {} + +podSecurityContext: {} + # fsGroup: 2000 + +securityContext: {} + # capabilities: + # drop: + # - ALL + # readOnlyRootFilesystem: true + # runAsNonRoot: true + # runAsUser: 1000 + +resources: {} + # We usually recommend not to specify default resources and to leave this as a conscious + # choice for the user. This also increases chances charts run on environments with little + # resources, such as Minikube. If you do want to specify resources, uncomment the following + # lines, adjust them as necessary, and remove the curly braces after 'resources:'. + # limits: + # cpu: 100m + # memory: 128Mi + # requests: + # cpu: 100m + # memory: 128Mi + +livenessProbe: + httpGet: + path: / + port: http +readinessProbe: + httpGet: + path: / + port: http + +nodeSelector: {} + +tolerations: [] + +affinity: {} diff --git a/charts/topograph/.helmignore b/charts/topograph/.helmignore new file mode 100644 index 0000000..0e8a0eb --- /dev/null +++ b/charts/topograph/.helmignore @@ -0,0 +1,23 @@ +# Patterns to ignore when building packages. +# This supports shell glob matching, relative path matching, and +# negation (prefixed with !). Only one pattern per line. 
+.DS_Store +# Common VCS dirs +.git/ +.gitignore +.bzr/ +.bzrignore +.hg/ +.hgignore +.svn/ +# Common backup files +*.swp +*.bak +*.tmp +*.orig +*~ +# Various IDEs +.project +.idea/ +*.tmproj +.vscode/ diff --git a/charts/topograph/Chart.yaml b/charts/topograph/Chart.yaml new file mode 100644 index 0000000..d858d1e --- /dev/null +++ b/charts/topograph/Chart.yaml @@ -0,0 +1,24 @@ +apiVersion: v2 +name: topograph +description: A Helm chart for Kubernetes + +# A chart can be either an 'application' or a 'library' chart. +# +# Application charts are a collection of templates that can be packaged into versioned archives +# to be deployed. +# +# Library charts provide useful utilities or functions for the chart developer. They're included as +# a dependency of application charts to inject those utilities and functions into the rendering +# pipeline. Library charts do not define any templates and therefore cannot be deployed. +type: application + +# This is the chart version. This version number should be incremented each time you make changes +# to the chart and its templates, including the app version. +# Versions are expected to follow Semantic Versioning (https://semver.org/) +version: 0.1.0 + +# This is the version number of the application being deployed. This version number should be +# incremented each time you make changes to the application. Versions are not expected to +# follow Semantic Versioning. They should reflect the version the application is using. +# It is recommended to use it with quotes. +appVersion: "1.16.0" diff --git a/charts/topograph/templates/NOTES.txt b/charts/topograph/templates/NOTES.txt new file mode 100644 index 0000000..2f09866 --- /dev/null +++ b/charts/topograph/templates/NOTES.txt @@ -0,0 +1,22 @@ +1. 
Get the application URL by running these commands: +{{- if .Values.ingress.enabled }} +{{- range $host := .Values.ingress.hosts }} + {{- range .paths }} + http{{ if $.Values.ingress.tls }}s{{ end }}://{{ $host.host }}{{ .path }} + {{- end }} +{{- end }} +{{- else if contains "NodePort" .Values.service.type }} + export NODE_PORT=$(kubectl get --namespace {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}" services {{ include "topograph.fullname" . }}) + export NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}") + echo http://$NODE_IP:$NODE_PORT +{{- else if contains "LoadBalancer" .Values.service.type }} + NOTE: It may take a few minutes for the LoadBalancer IP to be available. + You can watch its status by running 'kubectl get --namespace {{ .Release.Namespace }} svc -w {{ include "topograph.fullname" . }}' + export SERVICE_IP=$(kubectl get svc --namespace {{ .Release.Namespace }} {{ include "topograph.fullname" . }} --template "{{"{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}"}}") + echo http://$SERVICE_IP:{{ .Values.service.port }} +{{- else if contains "ClusterIP" .Values.service.type }} + export POD_NAME=$(kubectl get pods --namespace {{ .Release.Namespace }} -l "app.kubernetes.io/name={{ include "topograph.name" . }},app.kubernetes.io/instance={{ .Release.Name }}" -o jsonpath="{.items[0].metadata.name}") + export CONTAINER_PORT=$(kubectl get pod --namespace {{ .Release.Namespace }} $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}") + echo "Visit http://127.0.0.1:8080 to use your application" + kubectl --namespace {{ .Release.Namespace }} port-forward $POD_NAME 8080:$CONTAINER_PORT +{{- end }} diff --git a/charts/topograph/templates/_helpers.tpl b/charts/topograph/templates/_helpers.tpl new file mode 100644 index 0000000..556db37 --- /dev/null +++ b/charts/topograph/templates/_helpers.tpl @@ -0,0 +1,62 @@ +{{/* +Expand the name of the chart. 
+*/}} +{{- define "topograph.name" -}} +{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Create a default fully qualified app name. +We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec). +If release name contains chart name it will be used as a full name. +*/}} +{{- define "topograph.fullname" -}} +{{- if .Values.fullnameOverride }} +{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }} +{{- else }} +{{- $name := default .Chart.Name .Values.nameOverride }} +{{- if contains $name .Release.Name }} +{{- .Release.Name | trunc 63 | trimSuffix "-" }} +{{- else }} +{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }} +{{- end }} +{{- end }} +{{- end }} + +{{/* +Create chart name and version as used by the chart label. +*/}} +{{- define "topograph.chart" -}} +{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Common labels +*/}} +{{- define "topograph.labels" -}} +helm.sh/chart: {{ include "topograph.chart" . }} +{{ include "topograph.selectorLabels" . }} +{{- if .Chart.AppVersion }} +app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} +{{- end }} +app.kubernetes.io/managed-by: {{ .Release.Service }} +{{- end }} + +{{/* +Selector labels +*/}} +{{- define "topograph.selectorLabels" -}} +app.kubernetes.io/name: {{ include "topograph.name" . }} +app.kubernetes.io/instance: {{ .Release.Name }} +{{- end }} + +{{/* +Create the name of the service account to use +*/}} +{{- define "topograph.serviceAccountName" -}} +{{- if .Values.serviceAccount.create }} +{{- default (include "topograph.fullname" .) 
.Values.serviceAccount.name }} +{{- else }} +{{- default "default" .Values.serviceAccount.name }} +{{- end }} +{{- end }} diff --git a/charts/topograph/templates/configmap.yml b/charts/topograph/templates/configmap.yml new file mode 100644 index 0000000..c99c57f --- /dev/null +++ b/charts/topograph/templates/configmap.yml @@ -0,0 +1,15 @@ +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ include "topograph.fullname" . }} + labels: + {{- include "topograph.labels" . | nindent 4 }} +data: + topograph-config.yaml: |- + http: + port: {{ .Values.service.port }} + ssl: false + request_aggregation_delay: {{ .Values.service.request_aggregation_delay }} + {{- if .Values.service.credentials_secret }} + credentials_path: /etc/topograph/credentials/config.yaml + {{- end }} diff --git a/charts/topograph/templates/deployment.yaml b/charts/topograph/templates/deployment.yaml new file mode 100644 index 0000000..67dc561 --- /dev/null +++ b/charts/topograph/templates/deployment.yaml @@ -0,0 +1,80 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: {{ include "topograph.fullname" . }} + labels: + {{- include "topograph.labels" . | nindent 4 }} +spec: + replicas: {{ .Values.replicaCount }} + selector: + matchLabels: + {{- include "topograph.selectorLabels" . | nindent 6 }} + template: + metadata: + {{- with .Values.podAnnotations }} + annotations: + {{- toYaml . | nindent 8 }} + {{- end }} + labels: + {{- include "topograph.labels" . | nindent 8 }} + {{- with .Values.podLabels }} + {{- toYaml . | nindent 8 }} + {{- end }} + spec: + {{- with .Values.imagePullSecrets }} + imagePullSecrets: + {{- toYaml . | nindent 8 }} + {{- end }} + serviceAccountName: {{ include "topograph.serviceAccountName" . 
}} + securityContext: + {{- toYaml .Values.podSecurityContext | nindent 8 }} + containers: + - name: {{ .Chart.Name }} + securityContext: + {{- toYaml .Values.securityContext | nindent 12 }} + image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}" + imagePullPolicy: {{ .Values.image.pullPolicy }} + command: + - /usr/local/bin/topograph + args: + - -v={{ .Values.verbosity }} + ports: + - name: http + containerPort: {{ .Values.service.port }} + protocol: TCP + livenessProbe: + {{- toYaml .Values.livenessProbe | nindent 12 }} + readinessProbe: + {{- toYaml .Values.readinessProbe | nindent 12 }} + resources: + {{- toYaml .Values.resources | nindent 12 }} + volumeMounts: + - name: config-volume + mountPath: /etc/topograph + {{- if .Values.service.credentials_secret }} + - name: secret-volume + mountPath: /etc/topograph/credentials + readOnly: true + {{- end }} + volumes: + - name: config-volume + configMap: + defaultMode: 420 + name: {{ include "topograph.fullname" . }} + {{- if .Values.service.credentials_secret }} + - name: secret-volume + secret: + secretName: {{ .Values.service.credentials_secret }} + {{- end }} + {{- with .Values.nodeSelector }} + nodeSelector: + {{- toYaml . | nindent 8 }} + {{- end }} + {{- with .Values.affinity }} + affinity: + {{- toYaml . | nindent 8 }} + {{- end }} + {{- with .Values.tolerations }} + tolerations: + {{- toYaml . | nindent 8 }} + {{- end }} diff --git a/charts/topograph/templates/ingress.yaml b/charts/topograph/templates/ingress.yaml new file mode 100644 index 0000000..86236bb --- /dev/null +++ b/charts/topograph/templates/ingress.yaml @@ -0,0 +1,61 @@ +{{- if .Values.ingress.enabled -}} +{{- $fullName := include "topograph.fullname" . 
-}} +{{- $svcPort := .Values.service.port -}} +{{- if and .Values.ingress.className (not (semverCompare ">=1.18-0" .Capabilities.KubeVersion.GitVersion)) }} + {{- if not (hasKey .Values.ingress.annotations "kubernetes.io/ingress.class") }} + {{- $_ := set .Values.ingress.annotations "kubernetes.io/ingress.class" .Values.ingress.className}} + {{- end }} +{{- end }} +{{- if semverCompare ">=1.19-0" .Capabilities.KubeVersion.GitVersion -}} +apiVersion: networking.k8s.io/v1 +{{- else if semverCompare ">=1.14-0" .Capabilities.KubeVersion.GitVersion -}} +apiVersion: networking.k8s.io/v1beta1 +{{- else -}} +apiVersion: extensions/v1beta1 +{{- end }} +kind: Ingress +metadata: + name: {{ $fullName }} + labels: + {{- include "topograph.labels" . | nindent 4 }} + {{- with .Values.ingress.annotations }} + annotations: + {{- toYaml . | nindent 4 }} + {{- end }} +spec: + {{- if and .Values.ingress.className (semverCompare ">=1.18-0" .Capabilities.KubeVersion.GitVersion) }} + ingressClassName: {{ .Values.ingress.className }} + {{- end }} + {{- if .Values.ingress.tls }} + tls: + {{- range .Values.ingress.tls }} + - hosts: + {{- range .hosts }} + - {{ . 
| quote }} + {{- end }} + secretName: {{ .secretName }} + {{- end }} + {{- end }} + rules: + {{- range .Values.ingress.hosts }} + - host: {{ .host | quote }} + http: + paths: + {{- range .paths }} + - path: {{ .path }} + {{- if and .pathType (semverCompare ">=1.18-0" $.Capabilities.KubeVersion.GitVersion) }} + pathType: {{ .pathType }} + {{- end }} + backend: + {{- if semverCompare ">=1.19-0" $.Capabilities.KubeVersion.GitVersion }} + service: + name: {{ $fullName }} + port: + number: {{ $svcPort }} + {{- else }} + serviceName: {{ $fullName }} + servicePort: {{ $svcPort }} + {{- end }} + {{- end }} + {{- end }} +{{- end }} diff --git a/charts/topograph/templates/rbac.yaml b/charts/topograph/templates/rbac.yaml new file mode 100644 index 0000000..d09c4ed --- /dev/null +++ b/charts/topograph/templates/rbac.yaml @@ -0,0 +1,25 @@ +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: {{ include "topograph.serviceAccountName" . }} +rules: +- apiGroups: [""] + resources: ["configmaps"] + verbs: [get,list,watch,create,update] +- apiGroups: [""] + resources: ["nodes"] + verbs: [get,list,watch,update] +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: {{ include "topograph.serviceAccountName" . }} +subjects: +- kind: ServiceAccount + name: {{ include "topograph.serviceAccountName" . }} + namespace: {{.Release.Namespace}} + apiGroup: "" +roleRef: + kind: ClusterRole + name: {{ include "topograph.serviceAccountName" . }} + apiGroup: "" diff --git a/charts/topograph/templates/service.yaml b/charts/topograph/templates/service.yaml new file mode 100644 index 0000000..f739eeb --- /dev/null +++ b/charts/topograph/templates/service.yaml @@ -0,0 +1,15 @@ +apiVersion: v1 +kind: Service +metadata: + name: {{ include "topograph.fullname" . }} + labels: + {{- include "topograph.labels" . 
| nindent 4 }} +spec: + type: {{ .Values.service.type }} + ports: + - port: {{ .Values.service.port }} + targetPort: http + protocol: TCP + name: http + selector: + {{- include "topograph.selectorLabels" . | nindent 4 }} diff --git a/charts/topograph/templates/serviceaccount.yaml b/charts/topograph/templates/serviceaccount.yaml new file mode 100644 index 0000000..f8811e7 --- /dev/null +++ b/charts/topograph/templates/serviceaccount.yaml @@ -0,0 +1,13 @@ +{{- if .Values.serviceAccount.create -}} +apiVersion: v1 +kind: ServiceAccount +metadata: + name: {{ include "topograph.serviceAccountName" . }} + labels: + {{- include "topograph.labels" . | nindent 4 }} + {{- with .Values.serviceAccount.annotations }} + annotations: + {{- toYaml . | nindent 4 }} + {{- end }} +automountServiceAccountToken: {{ .Values.serviceAccount.automount }} +{{- end }} diff --git a/charts/topograph/values.yaml b/charts/topograph/values.yaml new file mode 100644 index 0000000..9eb7b14 --- /dev/null +++ b/charts/topograph/values.yaml @@ -0,0 +1,92 @@ +# Default values for topology-generator. +# This is a YAML-formatted file. +# Declare variables to be passed into your templates. + +replicaCount: 1 + +image: + repository: ghcr.io/nvidia/topograph + pullPolicy: IfNotPresent + # Overrides the image tag whose default is the chart appVersion. + tag: "main" + +imagePullSecrets: [] +nameOverride: "" +fullnameOverride: "" + +serviceAccount: + # Specifies whether a service account should be created + create: true + # Automatically mount a ServiceAccount's API credentials? + automount: true + # Annotations to add to the service account + annotations: {} + # The name of the service account to use. 
+ # If not set and create is true, a name is generated using the fullname template + name: "" + +verbosity: 3 + +podAnnotations: {} +podLabels: {} + +podSecurityContext: {} + # fsGroup: 2000 + +securityContext: {} + # capabilities: + # drop: + # - ALL + # readOnlyRootFilesystem: true + # runAsNonRoot: true + # runAsUser: 1000 + +service: + type: ClusterIP + port: 49021 + request_aggregation_delay: 15s + # Optional secret with CSP credentials + # credentials_secret: + +ingress: + enabled: false + className: "" + annotations: {} + # kubernetes.io/ingress.class: nginx + # kubernetes.io/tls-acme: "true" + hosts: + - host: chart-example.local + paths: + - path: / + pathType: ImplementationSpecific + tls: [] + # - secretName: chart-example-tls + # hosts: + # - chart-example.local + +resources: {} + # We usually recommend not to specify default resources and to leave this as a conscious + # choice for the user. This also increases chances charts run on environments with little + # resources, such as Minikube. If you do want to specify resources, uncomment the following + # lines, adjust them as necessary, and remove the curly braces after 'resources:'. 
+ # limits: + # cpu: 100m + # memory: 128Mi + # requests: + # cpu: 100m + # memory: 128Mi + +livenessProbe: + httpGet: + path: /healthz + port: http +readinessProbe: + httpGet: + path: /healthz + port: http + +nodeSelector: {} + +tolerations: [] + +affinity: {} diff --git a/cmd/topology-state-observer/main.go b/cmd/node-observer/main.go similarity index 95% rename from cmd/topology-state-observer/main.go rename to cmd/node-observer/main.go index 0442402..da506bf 100644 --- a/cmd/topology-state-observer/main.go +++ b/cmd/node-observer/main.go @@ -36,7 +36,7 @@ var GitTag string func main() { var c string var version bool - flag.StringVar(&c, "c", "/etc/topograph/state-observer-config.yaml", "config file") + flag.StringVar(&c, "c", "/etc/topograph/node-observer-config.yaml", "config file") flag.BoolVar(&version, "version", false, "show the version") klog.InitFlags(nil)