Quick installation manual for a docker-compose cluster including Apache Kafka managed by Apache Zookeeper, as well as an Apache Spark image together with a Jupyter notebook.
- docker & docker-compose
- git
- a tar/gzip utility, e.g.:
- 7zip [for Windows]
- tar [for Linux]
Get the latest Kafka version from the official Download Page and extract it, e.g. on Linux:
wget http://ftp.halifax.rwth-aachen.de/apache/kafka/2.1.0/kafka_2.11-2.1.0.tgz
tar -xvzf kafka_2.11-2.1.0.tgz
Fix the kafka_dir parameter in the docker-compose.yml to the directory name of the extracted Kafka binaries, in case it is different. Caution: the extraction directory for the Kafka files should lie within the repository folder, since Docker can only copy files which are within the directory of docker-compose.yml. Next, build the docker containers using compose:
docker-compose build
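If the build fails or you are unsure whether the names match, you can compare the extracted directory with the current value of kafka_dir, for example (assuming the repository root as working directory):
# name of the extracted Kafka directory
ls -d kafka_*
# current value of the kafka_dir parameter
grep -n kafka_dir docker-compose.yml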
Finally run the container cluster by calling
docker-compose up
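If you prefer not to block your terminal, the cluster can also be started in the background and checked afterwards (standard docker-compose options, nothing specific to this setup):
# start detached and show the state of the services
docker-compose up -d
docker-compose ps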
From the logs Docker keeps during runtime you can find the login token necessary for the Jupyter login. Execute
docker logs $(docker ps -f name=notebook -q)
from PowerShell or Bash. Among the log entries the following line should be printed somewhere:
"Copy/paste this URL into your browser when you connect for the first time, to login with a token:"
Below that, the login token is presented together with the URL, like:
http://(bce2f8d711eb or 127.0.0.1):8888/?token=cddf0aaf8cefe270ff318973c0dd382c957917fa7ea8e5bd
Thus, depending on the IP of your docker machine (usually it's localhost) you can reach the notebook at:
http://[docker machine IP]:8888/?token=cddf0aaf8cefe270ff318973c0dd382c957917fa7ea8e5bd
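To avoid scrolling through the whole log, you can also filter directly for the token line (plain shell tooling, not specific to this setup; Jupyter writes its startup messages to stderr):
docker logs $(docker ps -f name=notebook -q) 2>&1 | grep token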
Executed from the Kafka main directory:
For Linux
# create new topic
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
# list topics
bin/kafka-topics.sh --list --zookeeper localhost:2181
# produce ...
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
# consume ...
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
For Windows
# create new topic
bin\windows\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
# list topics
bin\windows\kafka-topics.bat --list --zookeeper localhost:2181
# produce ...
bin\windows\kafka-console-producer.bat --broker-list localhost:9092 --topic test
# consume ...
bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic test --from-beginning
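As a quick smoke test, assuming the cluster is up and port 9092 is reachable from the host, you can pipe a single message through the topic created above and read it back (Linux, from the Kafka main directory):
# send one message ...
echo "hello kafka" | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
# ... and consume exactly one message from the beginning
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning --max-messages 1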
Further help can be found at the Kafka Quickstart Tutorial as well as on the Kafka Documentation Website. Next you'll find some common problems and how to solve them.
- Eduroam Wifi Problems (docker does not build)
- Weird error messages
- docker-compose build fails
- No entry found for connection
In some cases the cluster fails to start because there seems to be an issue with the firewall. It might then be necessary to add a DNS entry to the docker-compose.yml file, which is described here. The DNS server address can be found using ipconfig /all on Windows or e.g. cat /etc/resolv.conf on Linux.
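What such an entry could look like is sketched below; this is only an illustration, since the service(s) that actually need it and the exact layout depend on the given docker-compose.yml, and 192.168.178.1 is a placeholder for the DNS server address found above:
services:
  zookeeper:
    dns:
      - 192.168.178.1   # placeholder - use your own DNS server address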
In special cases (particularly on Windows) there are strange errors, often related to network connections, which happen sometimes e.g. when the computer comes back from hibernation. This means that the docker daemon might be in an inconsistent state; usually a restart of the daemon solves the problem.
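How to restart the daemon depends on the platform; on a systemd-based Linux distribution it is typically:
# restart the Docker daemon (Linux with systemd)
sudo systemctl restart docker
On Windows and macOS, restarting Docker Desktop via its tray/menu icon has the same effect.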
The build using docker-compose fails with the error message "ERROR: Service 'zookeeper' failed to build: COPY failed: stat ...". In that case the downloaded Kafka binaries are in the wrong directory. The extraction directory for the Kafka files should lie within the repository folder, since Docker can only copy files which are within the directory of docker-compose.yml.
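Moving the extracted directory next to docker-compose.yml fixes this; the source path below is just an example:
# move the extracted Kafka binaries into the repository folder (paths are examples)
mv ~/Downloads/kafka_2.11-2.1.0 /path/to/this/repository/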
It may occur, when running a consumer or a producer, that you receive the error "No entry found for connection ". In that case you need to add an entry into the hosts file for kafka and zookeeper referencing 127.0.0.1. Details on how to do that can be found at the Java Producer/Consumer Example.
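A minimal sketch of such hosts entries on Linux/macOS (on Windows, edit C:\Windows\System32\drivers\etc\hosts in an elevated editor instead); the hostnames kafka and zookeeper are the ones the error above refers to:
# map the container hostnames to the local machine
echo "127.0.0.1 kafka zookeeper" | sudo tee -a /etc/hosts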