
[v24.3.4] Redpanda won't form cluster #24985

Open
jsonbrooks opened this issue Jan 30, 2025 · 9 comments
Labels
kind/bug Something isn't working

Comments


jsonbrooks commented Jan 30, 2025

Version & Environment

Redpanda version (from rpk version): v24.3.4. I've also tried v23.3.

Running on Ubuntu 22.04.3 LTS. Manual installation. No Docker.

What went wrong?

I have 3 bare-metal nodes that can ping each other. I ran all tuning steps and the following config commands, one set per node, as described in the docs for manual installation.

sudo rpk redpanda config bootstrap --self 10.0.25.98 --ips 10.0.25.98,10.0.25.99,10.0.25.103 && \
sudo rpk redpanda config set redpanda.empty_seed_starts_cluster false

sudo rpk redpanda config bootstrap --self 10.0.25.99 --ips 10.0.25.98,10.0.25.99,10.0.25.103 && \
sudo rpk redpanda config set redpanda.empty_seed_starts_cluster false

sudo rpk redpanda config bootstrap --self 10.0.25.103 --ips 10.0.25.98,10.0.25.99,10.0.25.103 && \
sudo rpk redpanda config set redpanda.empty_seed_starts_cluster false

I checked the redpanda.yaml config files and they all looked exactly like what I would expect from the docs. empty_seed_starts_cluster is false, the seeds are correct and exactly the same on each node, and the self IP is correct everywhere else. The Redpanda systemd service is definitely pointing at these configs.
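
For reference, the relevant section of redpanda.yaml on the 10.0.25.98 node looked roughly like this after the bootstrap command (abbreviated, default ports; the other two nodes differ only in the self addresses):

redpanda:
  data_directory: /var/lib/redpanda/data
  empty_seed_starts_cluster: false
  seed_servers:
    - host:
        address: 10.0.25.98
        port: 33145
    - host:
        address: 10.0.25.99
        port: 33145
    - host:
        address: 10.0.25.103
        port: 33145
  rpc_server:
    address: 10.0.25.98
    port: 33145
  kafka_api:
    - address: 10.0.25.98
      port: 9092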

When I ran systemctl start redpanda on all of the nodes, Redpanda started up successfully, but rpk cluster info shows only one node on each instance: itself. There are no errors in the logs, and I see no mention of trying to reach out to the other seed nodes. I do see the correct configs printed in the logs on startup.

What should have happened instead?

It should form a cluster.

JIRA Link: CORE-8967

jsonbrooks added the kind/bug label on Jan 30, 2025

dotnwat commented Jan 30, 2025

it sounds like three one-node clusters may have been formed. did you by chance try to set up the cluster once, and then repeat it later without wiping out all data/configs?

sharing start-up logs for the nodes or maybe the rpk cluster information (pointed at each of the three nodes separately) could help confirm the three one-node clusters scenario.
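
something along these lines, pointing rpk at each broker in turn, should show whether each node thinks it is its own cluster (depending on your rpk version the flag is either --brokers or -X brokers=<addr>):

rpk cluster info --brokers 10.0.25.98:9092
rpk cluster info --brokers 10.0.25.99:9092
rpk cluster info --brokers 10.0.25.103:9092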

@jsonbrooks (Author)

I'm not sure if I've done what you say, but here are the results of rpk cluster info on all 3 nodes separately:

It does seem like it's formed 3 one-node clusters, even though it's configured otherwise. How do I get it to stop?

CLUSTER
=======
redpanda.a92b503d-afd8-42f0-9f6e-8f15e121cbd7

BROKERS
=======
ID    HOST        PORT
0*    10.0.25.98  9092

CLUSTER
=======
redpanda.155e6dca-0a05-43da-bca3-6fcc4b351655

BROKERS
=======
ID    HOST        PORT
0*    10.0.25.99  9092

CLUSTER
=======
redpanda.32285561-2360-4786-997a-152810b95130

BROKERS
=======
ID    HOST         PORT
0*    10.0.25.103  9092

@jsonbrooks (Author)

I manually wiped the data directories and that did it.
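
For anyone else who ends up here, what I did on each node was roughly the following (this assumes the default data directory; check the data_directory value in your redpanda.yaml before deleting anything):

sudo systemctl stop redpanda
sudo rm -rf /var/lib/redpanda/data/*
sudo systemctl start redpanda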

Just a bit of feedback: I couldn't find any reference to resets or states like this anywhere in the docs, or even on Google. Even after you told me what was happening I don't really get what happened or how to fix it without a hard manual wipe.

My guess is it just started up when I installed it, before I had configured it? Anyway, as a result I've been stuck for days. Maybe add something about this to the docs?


dotnwat commented Jan 31, 2025

Even after you told me what was happening I don't really get what happened or how to fix it without a hard manual wipe.

It's hard to say how it happened. I can't remember the last time I received a report like this, but it is certainly possible to create the situation if you try hard enough. Glad it is working for you now. If you happen to reproduce it, we'd be interested in that flow especially if there is a gap in the docs that can lead someone down a bad path.

Anyway as a result I've been stuck for days. Maybe add something about this to the docs?

I'll pass this along! Thanks!

@patrickangeles

hey @jsonbrooks ... which docs did you follow?


jsonbrooks commented Feb 3, 2025


dotnwat commented Feb 4, 2025

@jsonbrooks just to be clear: if you follow the instructions on that page the problem doesn't reproduce, but you had followed them before and the issue appeared? For example, was there some fragile part of the instructions, or something along those lines? I guess I'm trying to understand if there is a way for us to reproduce the issue.


jsonbrooks commented Feb 4, 2025

I followed the instructions and the issue appeared. I'm also currently using the environment, so I'm afraid I can't go delete everything to try and reproduce the issue at the moment.

My best guess is that I ran it, or it ran itself, right after installation, before the tuning steps or configuring the cluster IPs, and that caused it to establish one-node clusters, which don't seem to change with new configurations. Is it supposed to keep the current cluster seeds etc. if you configure new ones without deleting the data dir, by the way, or was that part a bug?


dotnwat commented Feb 5, 2025

Is it supposed to keep the current cluster seeds etc. if you configure new ones without deleting the data dir, by the way, or was that part a bug?

Yeah, I believe so; it comes from that initial bootstrap command. Thanks again for all the info. Let us know if you run into any more issues.
