Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added docker standalone support. #12

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Conversation

monksy
Copy link

@monksy monksy commented Apr 14, 2020

This adds in support so that you can stand up this connector and have it up and working from a local build. Also, this gives support to customize the subreddit, comments subreddit, and broker via environment variable.

Please note: the blank lines in connect-standalone.properties are neccessary because the environment variables are added to the property files and it's overriding the defaults from what you had.

@Guberlo
Copy link

Guberlo commented Apr 17, 2020

Thank you very much, this helps a lot!
I'd like to ask you how did you created the topics.
When I try to run it, after that I've started two different containers with ZK and Kafka server (everything working), it works but after 2 minutes I get a TimeoutException because no node was assigned.

Should I create a new container like this in order to create a topic?

docker run -e KAFKA_ACTION=create-topic -e KAKFA_SERVER=10.0.100.23 -e KAFKA_TOPIC=$1 --network tap --ip 10.0.100.24 --name kafkaTopic -it tap:kafka

Can you help me?

This is the error that I get:

[2020-04-17 19:23:10,563] INFO Kafka startTimeMs: 1587151390562 (org.apache.kafka.common.utils.AppInfoParser:119)
[2020-04-17 19:25:10,566] INFO [AdminClient clientId=adminclient-1] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager:238)
org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call.
[2020-04-17 19:25:10,575] INFO [AdminClient clientId=adminclient-1] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager:238)
org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call.
[2020-04-17 19:25:10,578] ERROR Stopping due to error (org.apache.kafka.connect.cli.ConnectStandalone:130)
org.apache.kafka.connect.errors.ConnectException: Failed to connect to and describe Kafka cluster. Check worker's broker connection and security properties.
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:64)
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:45)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:83)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:58)
... 2 more
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.

Thank you again for your work.

@monksy
Copy link
Author

monksy commented Apr 18, 2020

So what i did is that I stood up KA+ZK with docker compose and defined the host IP:

Those containers also create the topics at start.

version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092:9092"
    
    environment:
      KAFKA_CREATE_TOPICS: "reddit-posts:3:1,reddit-comments:3:1:compact"
      KAFKA_ADVERTISED_HOST_NAME: 192.168.1.244
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

@Guberlo
Copy link

Guberlo commented Apr 20, 2020

Perfect, since I'd like to start my kafka server on this ip 10.0.100.23, I've created this .yaml file.

version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092:9092"
    
    environment:
      KAFKA_CREATE_TOPICS: "reddit-posts:3:1,reddit-comments:3:1:compact"
      KAFKA_ADVERTISED_HOST_NAME: 10.0.100.23
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

Saved it and typed docker-compose up in that folder.
Both Kafka and ZK seems to work fine. Then I moved to the repository folder created the env file as follows:

SUBREDDIT=all
COMMENTS_SUBREDDIT=all
KAFKA_SERVER=localhost:9092

and, after building the image as said, typed the docker run command.
The container starts but, after 2 minutes, I get the same error as before.

Is there something that I am missing/doing wrong?

@monksy
Copy link
Author

monksy commented Apr 20, 2020 via email

@Guberlo
Copy link

Guberlo commented Apr 20, 2020

Everything is working now locally with this configuration:

version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092:9092"

    environment:
      KAFKA_CREATE_TOPICS: "reddit-posts:3:1,reddit-comments:3:1:compact"
      KAFKA_ADVERTISED_HOST_NAME: 192.168.1.23
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://192.168.1.23:9092
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

Thanks for your help.
The last thing I'd like to ask you is where is the output stored, or better where should I see it?
(Sorry, I'm new to kafka and ZK)

---Edit---

I found the location. Everthing is working now!

@C0urante C0urante self-requested a review April 27, 2020 03:12
@C0urante
Copy link
Owner

@monksy my sincerest apologies, turns out I forgot to watch my own repo and thus didn't get notified when you filed this PR and the several feature requests! I'll be reviewing this shortly and will get back to you.

@monksy
Copy link
Author

monksy commented Apr 27, 2020

Awesome, I appreciate it!

Copy link
Owner

@C0urante C0urante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, @monksy! I've left some comments about the implementation, but also wanted to take a step back and ask some questions about the goals here.

What's the intended use case? If we just want to make it as easy as possible to get the connector up and running in any environment, we could create a docker-compose file that not only brings up the connector, but also ZooKeeper and Kafka.

If the goal is to get the connector running against a pre-existing Kafka broker, I think we might be able to simplify the deployment process a little further, and also add a richer level of configurability. The environment variable-based approach is good, but with only a pre-selected collection of hard-coded properties that the user can configure, seems a little restrictive. We might consider using something like the built-in templating logic that Confluent provides with their Docker images: create a config file template (such as this one) for the connector config, then use dub to render the template (like this) before starting the connector. We could either bake default environment variables into the image itself, or use the templating logic to set some (disclaimer: not sure if this is possible, haven't personally worked with dub before).

plugin.path=target/components/packages/

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, apologies for any headaches the lack of a newline may have caused!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really just an issue for the extra appending. This won't be needed with the replace script.

@@ -41,8 +41,8 @@
<properties>
<jraw.version>1.1.0</jraw.version>
<junit.version>4.12</junit.version>
<kafka.version>2.0.0</kafka.version>
<kafka.connect.maven.plugin.version>0.11.1</kafka.connect.maven.plugin.version>
<kafka.version>2.4.1</kafka.version>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, why is this needed? The only dependency that this property is used for is org.apache.kafka:connect-api, which AFAIK hasn't changed in between 2.0.0 and 2.4.1.

Copy link
Author

@monksy monksy Apr 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought there may be changes between 2.0 and 2.4. I just wanted to get it on the (then latest)

# We're going to override the subreddit, comments, and broker if we have an environment variable set

[[ -v SUBREDDIT ]] && echo "posts.subreddits=$SUBREDDIT" >> /kafka-connect-reddit/kafka-connect-reddit-source.properties
[[ -v COMMENTS_SUBREDDIT ]] && echo "comments.subreddits=$COMMENTS_SUBREDDIT" >> /kafka-connect-reddit/kafka-connect-reddit-source.properties
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same suggestion here RE: replacing with a sed script

The environment file that was used looks like this: (To configure the subreddit, comment location, and where your broker can be reached)

```bash
SUBREDDIT=chicagohelicopters
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use POSTS_SUBREDDITS here (with the POST_ prefix, and pluralized)? Seems better to be explicit than implicit, and it'd align with the precedents set elsewhere in the connector for the properties that dictate the posts/comments subreddits.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point.. I didn't consider that.

```bash
SUBREDDIT=chicagohelicopters
COMMENTS_SUBREDDIT=chicagohelicopters
KAFKA_SERVER=<kafka hostname>:9092
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: kind of seems like 9092 is a hard coded port here, maybe change it to:

Suggested change
KAFKA_SERVER=<kafka hostname>:9092
KAFKA_SERVER=localhost:9092

KAFKA_SERVER=<kafka hostname>:9092
```

Note: Change the kafka host name. If the variable in the file or docker run arguements is not present then it will default to the following values:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: spelling, wording

Suggested change
Note: Change the kafka host name. If the variable in the file or docker run arguements is not present then it will default to the following values:
Note: Make sure to change the Kafka host name. Variables not specified in the env file or docker run arguments will default to the following values:

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, might be worth pointing out that people can use the docker.for.mac.localhost hostname if they're using Mac and have a Kafka broker running locally.

monksy and others added 3 commits April 26, 2020 23:56
Co-Authored-By: Chris Egerton <[email protected]>
Co-Authored-By: Chris Egerton <[email protected]>
Co-Authored-By: Chris Egerton <[email protected]>
@monksy
Copy link
Author

monksy commented Apr 27, 2020

Right now I'm not using the Confluent platform with my kafka setup. (I'm not sure what's community and what's not open source)

I was looking to have this stood up as a self-contained docker container in isolation that'll just run this with whatever broker you have available. I don't want to bind it to a docker compose cluster. I would leave that to the person using it. As far as the images that Confluent provides, I'm not well versed at what they provide. I'm open to suggestions.

Co-Authored-By: Chris Egerton <[email protected]>
@C0urante
Copy link
Owner

Okay, that helps clarify things. I'm not very familiar either with Confluent's Docker utilities; I think we might be able to just get away with some copy+paste+tweak here with a template file and an invocation of dub (see the links I posted before for examples). I can probably put together something over the weekend if you don't want to bother; LMK

@monksy
Copy link
Author

monksy commented Apr 29, 2020

I'm willing to work with you this weekend. I'm not sure how much time I will have between now and Saturday.. But if you want to get together to work through this, I would be up for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants