Update prometheus-slurm-exporter version #280

Merged: sjpb merged 1 commit into main from update/prom-slurm-exporter-0.21 on Jun 7, 2023

m-bull (Collaborator) commented May 31, 2023

prometheus-slurm-exporter 0.20 fails to parse GRES output from Slurm versions later than 19.05.0rc1, so the service panics and ends up in a crash loop.

Update versions to point to the new RPM (https://github.com/stackhpc/prometheus-slurm-exporter/releases/tag/0.21), built from the development branch of https://github.com/stackhpc/prometheus-slurm-exporter, which contains the relevant fixes.

@m-bull m-bull requested a review from a team as a code owner May 31, 2023 16:33
m-bull (Collaborator, Author) commented May 31, 2023

Version 0.20 logs this:

May 31 18:57:14 matta-slurm-control-0.novalocal prometheus-slurm-exporter[64809]: time="2023-05-31T18:57:14Z" level=info msg="Starting Server: 0.0.0.0:9341" source="main.go:59"
May 31 18:57:14 matta-slurm-control-0.novalocal prometheus-slurm-exporter[64809]: time="2023-05-31T18:57:14Z" level=info msg="GPUs Accounting: false" source="main.go:60"
May 31 18:57:18 matta-slurm-control-0.novalocal prometheus-slurm-exporter[64809]: panic: runtime error: index out of range [4] with length 4
May 31 18:57:18 matta-slurm-control-0.novalocal prometheus-slurm-exporter[64809]: goroutine 40 [running]:
May 31 18:57:18 matta-slurm-control-0.novalocal prometheus-slurm-exporter[64809]: main.ParseNodeMetrics(0xc000436000, 0xca, 0x600, 0x88f973)
May 31 18:57:18 matta-slurm-control-0.novalocal prometheus-slurm-exporter[64809]:         /github/home/rpmbuild/BUILD/prometheus-slurm-exporter-0.20/node.go:56 +0x6d6
May 31 18:57:18 matta-slurm-control-0.novalocal prometheus-slurm-exporter[64809]: main.NodeGetMetrics(0x1)
May 31 18:57:18 matta-slurm-control-0.novalocal prometheus-slurm-exporter[64809]:         /github/home/rpmbuild/BUILD/prometheus-slurm-exporter-0.20/node.go:40 +0x2a
May 31 18:57:18 matta-slurm-control-0.novalocal prometheus-slurm-exporter[64809]: main.(*NodeCollector).Collect(0xc00007fdd0, 0xc000070ea0)
May 31 18:57:18 matta-slurm-control-0.novalocal prometheus-slurm-exporter[64809]:         /github/home/rpmbuild/BUILD/prometheus-slurm-exporter-0.20/node.go:128 +0x37
May 31 18:57:18 matta-slurm-control-0.novalocal prometheus-slurm-exporter[64809]: github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
May 31 18:57:18 matta-slurm-control-0.novalocal prometheus-slurm-exporter[64809]:         /github/home/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/registry.go:443 +0x12b
May 31 18:57:18 matta-slurm-control-0.novalocal prometheus-slurm-exporter[64809]: created by github.com/prometheus/client_golang/prometheus.(*Registry).Gather
May 31 18:57:18 matta-slurm-control-0.novalocal prometheus-slurm-exporter[64809]:         /github/home/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/registry.go:535 +0xe4d
May 31 18:57:18 matta-slurm-control-0.novalocal systemd[1]: prometheus-slurm-exporter.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
May 31 18:57:18 matta-slurm-control-0.novalocal systemd[1]: prometheus-slurm-exporter.service: Failed with result 'exit-code'.

This is fixed in 0.21.
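
For context, the panic in the log is the Go runtime reporting an access to index 4 of a slice that holds only 4 elements, which is what happens when a parser assumes more fields per line than the command output provides. The sketch below is not the exporter's actual code; it only illustrates, under stated assumptions, how checking the field count before indexing turns that panic into a recoverable error. The helper `parseFields`, the comma separator, and the sample lines are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseFields is a hypothetical helper (not taken from the exporter).
// It splits a delimited line and returns an error, instead of letting
// the caller panic, when fewer than want fields are present.
func parseFields(line, sep string, want int) ([]string, error) {
	fields := strings.Split(line, sep)
	if len(fields) < want {
		return nil, fmt.Errorf("expected at least %d fields, got %d in %q",
			want, len(fields), line)
	}
	return fields, nil
}

func main() {
	// Made-up sample lines; these are not real Slurm output.
	lines := []string{
		"node1,64000,32000,idle,16", // five fields: safe to read fields[4]
		"node2,64000,32000,idle",    // four fields: fields[4] would panic
	}
	for _, line := range lines {
		fields, err := parseFields(line, ",", 5)
		if err != nil {
			// Skip the malformed line rather than crashing the process.
			fmt.Println("skipping line:", err)
			continue
		}
		fmt.Println("fifth field:", fields[4])
	}
}
```

With a check like this, a change in output format would show up as a logged error for the affected lines rather than a process exit that systemd reports as a failure.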

JohnGarbutt (Member) left a comment

nice.

sd109 (Member) left a comment

LGTM

sjpb (Collaborator) commented Jun 6, 2023

@m-bull does this also fix #187?

m-bull (Collaborator, Author) commented Jun 6, 2023

Yeah. The discussion is in vpenso/prometheus-slurm-exporter#67, and the fix comes from the development branch.

@sjpb sjpb merged commit 584f4b2 into main Jun 7, 2023
@sjpb sjpb deleted the update/prom-slurm-exporter-0.21 branch June 7, 2023 10:25