
Trying to run a single server in Kubernetes #95


Closed
barryw opened this issue Aug 8, 2018 · 4 comments


barryw commented Aug 8, 2018

Summary

I'm trying to deploy Cronicle to Kubernetes, and I have it mostly working, but if the Cronicle pod gets recreated, Cronicle comes up in a perpetual "Waiting for master" state. I've set my maingrp regex to match the hostname of any new pod, but it doesn't help. The pod also gets a new IP address, so maybe that's the problem.

I'm using a persistent volume so that a new pod still retains the Cronicle data.

Steps to reproduce the problem

I'm using a private Docker image built specifically for Kubernetes deployment, but the gist of the problem is this: I should be able to recreate the Cronicle container using the persistent data and have it declare itself the new master without intervention.

Your Setup

I'm running a custom Cronicle Docker image that I built myself, tailor-made for Kubernetes. In this environment, the Cronicle pod can get recreated (e.g. after a failed Kubernetes worker node), and when it does, it comes back with a different hostname (which is based on the pod name) and a different IP address. I need a configuration that allows it to come up and declare itself the new master without intervention. The underlying config and data are stored on a persistent volume, so Cronicle keeps its config and data even after the pod is recreated.

Operating system and version?

I'm using the node:6.11-alpine base Docker image.

Node.js version?

6.11

Cronicle software version?

Latest from master branch

Are you using a multi-server setup, or just a single server?

For now I just want to use a single master deployed in Kubernetes. It might even be nice to have a switch or config setting that does away with the whole master election and lets me run as a single server.

Are you using the filesystem as back-end storage, or S3/Couchbase?

Data and configs are stored persistently on Kubernetes persistent volumes (filesystem storage).

Can you reproduce the crash consistently?

Absolutely. I can bring up a new Cronicle with an empty data directory and everything works great. If I delete the Cronicle pod, Kubernetes dutifully brings up a new one, and Cronicle notices that it has already been configured (the persistent disk includes information about the current master). It then determines that the current host is not the master and waits for the master to contact it. My maingrp regex is ".+", which should catch any host. I've also tried "^(cybric-local-cronicle.+)$" (pod names look like this: cybric-local-cronicle-5ddd6fbf66-cbhsl) without success.
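A quick way to confirm the mismatch is to compare the pod's current hostname against the hostname Cronicle recorded at setup time (a sketch; the .items[0].hostname path follows the jq one-liner at the end of this thread):

# Current pod hostname vs. the hostname stored in Cronicle's server record:
hostname -s
/opt/cronicle/bin/storage-cli.js get global/servers/0 | jq -r '.items[0].hostname'
# If the two differ, the "waiting for master" state is the expected symptom.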

Log Excerpts

I don't see anything unusual in the logs, except that it says that it's not a master and will wait for the current one:

[1533676052.705][2018-08-07 21:07:32][cybric-local-cronicle-5ddd6fbf66-pmhbk][Cronicle][debug][2][Server IP: 172.17.0.9, Daemon PID: 40][]
[1533676052.715][2018-08-07 21:07:32][cybric-local-cronicle-5ddd6fbf66-pmhbk][Cronicle][debug][3][Starting component: Storage][]
[1533676052.724][2018-08-07 21:07:32][cybric-local-cronicle-5ddd6fbf66-pmhbk][Cronicle][debug][3][Starting component: WebServer][]
[1533676052.739][2018-08-07 21:07:32][cybric-local-cronicle-5ddd6fbf66-pmhbk][Cronicle][debug][3][Starting component: API][]
[1533676052.742][2018-08-07 21:07:32][cybric-local-cronicle-5ddd6fbf66-pmhbk][Cronicle][debug][3][Starting component: User][]
[1533676052.744][2018-08-07 21:07:32][cybric-local-cronicle-5ddd6fbf66-pmhbk][Cronicle][debug][3][Starting component: Cronicle][]
[1533676052.745][2018-08-07 21:07:32][cybric-local-cronicle-5ddd6fbf66-pmhbk][Cronicle][debug][3][Cronicle engine starting up][["/usr/local/bin/node","/opt/cronicle/lib/main.js","--debug","--echo","--master"]]
[1533676052.757][2018-08-07 21:07:32][cybric-local-cronicle-5ddd6fbf66-pmhbk][Cronicle][debug][4][Using broadcast IP: 172.17.255.255][]
[1533676052.758][2018-08-07 21:07:32][cybric-local-cronicle-5ddd6fbf66-pmhbk][Cronicle][debug][4][Starting UDP server on port: 3014][]
[1533676052.783][2018-08-07 21:07:32][cybric-local-cronicle-5ddd6fbf66-pmhbk][Cronicle][debug][4][Server not found in cluster -- waiting for a master server to contact us][]
[1533676052.785][2018-08-07 21:07:32][cybric-local-cronicle-5ddd6fbf66-pmhbk][Cronicle][debug][2][Startup complete, entering main loop][]
[1533676061.722][2018-08-07 21:07:41][cybric-local-cronicle-5ddd6fbf66-pmhbk][Cronicle][debug][5][New socket.io client connected: Yj2vw_X-ABG1KHVBAAAA (IP: ::ffff:172.17.0.4)][]
[1533676061.754][2018-08-07 21:07:41][cybric-local-cronicle-5ddd6fbf66-pmhbk][Cronicle][debug][4][Socket client Yj2vw_X-ABG1KHVBAAAA has authenticated via user session (IP: ::ffff:172.17.0.4)][]
[1533676074.729][2018-08-07 21:07:54][cybric-local-cronicle-5ddd6fbf66-pmhbk][Cronicle][debug][5][Socket.io client disconnected: Yj2vw_X-ABG1KHVBAAAA (IP: ::ffff:172.17.0.4)][]

Dockerfile

FROM node:6.11-alpine

WORKDIR /opt/cronicle/
RUN apk update && apk add git curl wget perl bash perl-pathtools
# Update tar in Alpine to fix "tar: unrecognized option: strip-components" issue:
RUN apk --update add tar procps jq
RUN curl -s https://raw.githubusercontent.com/jhuckaby/Cronicle/master/bin/install.js | node
ADD setup_and_start_in_debug.sh /opt/cronicle/setup_and_start_in_debug.sh

EXPOSE 3012

CMD ["sh", "setup_and_start_in_debug.sh"]
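For a quick local smoke test of an image like this outside Kubernetes (a sketch; the tag and placeholder values are made up, and conf/ is assumed to hold the config.json and setup.json described below):

docker build -t cronicle-k8s .
docker run -p 3012:3012 \
  -v "$PWD/conf:/config" -e CONFIG_DIR=/config \
  -e ADMIN_PASSWORD=changeme -e API_KEY=changeme \
  cronicle-k8s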

setup_and_start_in_debug.sh

#!/bin/bash

cp "$CONFIG_DIR/config.json" /opt/cronicle/conf/config.json
cp "$CONFIG_DIR/setup.json" /opt/cronicle/conf/setup.json

/opt/cronicle/bin/control.sh setup

# Set the admin password
/opt/cronicle/bin/control.sh admin admin "$ADMIN_PASSWORD"

# Set the API key
sed -i -e "s/__API_KEY__/${API_KEY}/" /opt/cronicle/conf/setup.json

/opt/cronicle/bin/debug.sh --master

$CONFIG_DIR is mounted from a ConfigMap, which is just a way of storing files and configuration in Kubernetes. The data directory is mounted as a persistent volume at /opt/cronicle/data_mount.

API_KEY and ADMIN_PASSWORD are stored as Kubernetes secrets and passed in at pod creation time so that we can default them to known values. We use a custom setup.json in the ConfigMap that contains a placeholder for our API user.
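For reference, a ConfigMap and Secret along these lines could be created with kubectl (a sketch; the resource names are made up for illustration):

# Hypothetical equivalents of the ConfigMap and Secrets described above:
kubectl create configmap cronicle-config --from-file=config.json --from-file=setup.json
kubectl create secret generic cronicle-secrets \
  --from-literal=API_KEY=changeme --from-literal=ADMIN_PASSWORD=changeme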

Thanks!
Barry

jhuckaby (Owner) commented Aug 8, 2018

Hey Barry,

This issue seems almost identical to another one you submitted last year: Issue #36 Stuck in "Waiting for master server..."

I think my response is still the same. I've never used Kubernetes, but I suspect your "pod" (server?) hostname is changing, which really confuses Cronicle. As explained in #36, Cronicle is extremely sensitive and frankly entirely dependent on the server hostname being static (never changing after initial install). If your hostname is going to change, you have to do one of two things:

(1) Follow the instructions in #36 (which have since been promoted to a Troubleshooting Wiki) to change your server hostname, possibly on boot. You can now do it programmatically, by the way, by exporting, manipulating, and resubmitting the server data record. For example:

# Export server data (JSON)
/opt/cronicle/bin/storage-cli.js get global/servers/0 > SERVERS.JSON

# Manipulate SERVERS.JSON using whatever tools you want and fix the server hostname(s)
# Use sed, awk, or something like jq: https://stedolan.github.io/jq/

# Submit new server data JSON back to Cronicle:
cat SERVERS.JSON | /opt/cronicle/bin/storage-cli.js put global/servers/0
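A concrete sketch of that "manipulate" step with jq (assuming the record keeps its hostnames under .items[].hostname, which matches the one-liner at the end of this thread):

# Fix the first server entry's hostname to match the current machine:
jq --arg host "$(hostname -s)" '.items[0].hostname = $host' SERVERS.JSON > SERVERS.FIXED.JSON
cat SERVERS.FIXED.JSON | /opt/cronicle/bin/storage-cli.js put global/servers/0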

(2) See Issue #4 (Add docker support) which has lots of information about making Cronicle work with Docker. I assume Kubernetes would be similar. It sounds like several people have succeeded in making this work. One thing I talked about in #4 is forcing the server to become master, and ignoring the server hostname data. This is detailed in Comment 268882549.

Good luck.


barryw commented Aug 9, 2018

Thanks, @jhuckaby, that did the trick.

@asharma2040

Hi @barryw,
May I know how you solved the issue you were facing? I'm facing the same issue and am looking for an automatic way to handle it whenever AWS EKS decides to redeploy the cluster.


ugoviti commented Dec 24, 2022

Hi,

In case it helps somebody else, here are two quick and dirty one-liners:

    ./bin/storage-cli.js get global/servers/0 | jq --arg hostname "$(hostname -s)" --arg ip "$(hostname -i)" '.items[0].hostname=$hostname | .items[0].ip=$ip' | ./bin/storage-cli.js put global/servers/0
    ./bin/storage-cli.js get global/server_groups/0 | jq --arg hostname "^($(hostname -s))$" '.items[0].regexp=$hostname' | ./bin/storage-cli.js put global/server_groups/0
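In a pod whose hostname changes on every recreation, these would presumably need to run at container startup, before Cronicle launches. For example, as a sketch appended to a startup script like the one above (untested, assuming jq is installed in the image):

    # Hypothetical addition to setup_and_start_in_debug.sh: repair the stored
    # hostname/IP to match the new pod, then start Cronicle as master.
    cd /opt/cronicle
    ./bin/storage-cli.js get global/servers/0 \
      | jq --arg hostname "$(hostname -s)" --arg ip "$(hostname -i)" \
           '.items[0].hostname=$hostname | .items[0].ip=$ip' \
      | ./bin/storage-cli.js put global/servers/0
    ./bin/debug.sh --master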
