Skip to content
This repository has been archived by the owner on Nov 15, 2020. It is now read-only.

Latest commit

 

History

History
230 lines (157 loc) · 9.38 KB

README.md

File metadata and controls

230 lines (157 loc) · 9.38 KB

Kinesis Output Plugin

Build Status Gem info

This is a plugin for Logstash. It will send log records to a Kinesis stream, using the Kinesis Producer Library (KPL).

This version is intended for use with Logstash 5.x. For plugin versions compatible with older versions of Logstash:

Configuration

Minimum required configuration to get this plugin chugging along:

output {
  kinesis {
    stream_name => "logs-stream"
    region => "ap-southeast-2"
  }
}

This plugin accepts a wide range of configuration options, most of which come from the underlying KPL library itself. View the full list of KPL configuration options here.

Please note that configuration options are snake_cased instead of camelCased. So, where KinesisProducerConfiguration offers a setMetricsLevel option, this plugin accepts a metrics_level option.

Dynamic stream name

You can dictate the name of the stream to send a record to, based on data in the record itself.

output {
  kinesis {
    stream_name => "%{myfield}-%{myotherfield}"
  }
}

Metrics

The underlying KPL library defaults to sending CloudWatch metrics to give insight into what it's actually doing at runtime. It's highly recommended you ensure these metrics are flowing through, and use them to monitor the health of your log shipping.

If for some reason you want to switch them off, you can easily do so:

output {
  kinesis {
    # ...

    metrics_level => "none"
  }
}

If you choose to keep metrics enabled, ensure the AWS credentials you provide to this plugin are able to write to Kinesis and write to CloudWatch.

Authentication

By default, this plugin will use the AWS SDK DefaultAWSCredentialsProviderChain to obtain credentials for communication with the Kinesis stream (and CloudWatch, if metrics are enabled). The following places will be checked for credentials:

  • AWS_ACCESS_KEY_ID / AWS_SECRET_KEY environment variables available to the Logstash prociess
  • ~/.aws/credentials credentials file
  • Instance profile (if Logstash is running in an EC2 instance)

If you want to provide credentials directly in the config file, you can do so:

output {
  kinesis {
    # ...

    access_key => "AKIAIDFAKECREDENTIAL"
    secret_key => "KX0ofakeLcredentialsGrightJherepOlolPkQk"

    # You can provide specific credentials for CloudWatch metrics:
    metrics_access_key => "AKIAIDFAKECREDENTIAL"
    metrics_secret_key => "KX0ofakeLcredentialsGrightJherepOlolPkQk"
  }
}

If access_key and secret_key are provided, they will be used for communicating with Kinesis and CloudWatch. If metrics_access_key and metrics_secret_key are provided, they will be used for communication with CloudWatch. If only the metrics credentials were provided, Kinesis would use the default credentials provider (explained above) and CloudWatch would use the specific credentials. Confused? Good!

Using STS

You can also configure this plugin to use AWS STS to "assume" a role that has access to Kinesis and CloudWatch. If you use this in combination with EC2 instance profiles (which the defaults credentials provider explained above uses) then you can actually configure your Logstash to write to Kinesis and CloudWatch without any hardcoded credentials.

output {
  kinesis {
    # ...

    role_arn => "arn:aws:iam::123456789:role/my-kinesis-producer-role"

    # You can also provide a specific role to assume for CloudWatch metrics:
    metrics_role_arn => "arn:aws:iam::123456789:role/my-metrics-role"
  }
}

You can combine role_arn / metrics_role_arn with the explicit AWS credentials config explained earlier, too.

All this stuff can be mixed too - if you wanted to use hardcoded credentials for Kinesis, but then assume a role via STS for accessing CloudWatch, you can do that. Vice versa would work too - assume a role for accessing Kinesis and then providing hardcoded credentials for CloudWatch. Make things as arbitrarily complicated for yourself as you like ;)

Building a partition key

Kinesis demands a partition key be provided for each record. By default, this plugin will provide a very boring partition key of -. However, you can configure it to compute a partition key from fields in your log events.

output {
  kinesis {
    # ...
    event_partition_keys => ["[field1]", "[field2]"]
  }
}

Randomised partition keys

If you don't care about the ordering of your logs in the Kinesis stream, you might want to use a random partition key. This way, your log stream will be more or less uniformly spread across all available shards in the Kinesis stream.

output {
  kinesis {
    randomized_partition_key => true
  }
}

Record Aggregation

The Amazon KPL library can aggregate your records when writing to the Kinesis stream. This behaviour is configured to be enabled by default.

If you are using an older version of the Amazon KCL library to consume your records, or not using KCL at all, your consumer application(s) will probably not behave correctly. See the matrix on this page for more info, and read more about de-aggregating records here.

If you wish to simply disable record aggregation, that's easy:

output {
  kinesis {
    aggregation_enabled => false
  }
}

Backpressure

This plugin will enforce backpressure if the records passing through Logstash's pipeline are not making it up to Kinesis fast enough. When this happens, Logstash will stop accepting records for input or filtering, and a warning will be emitted in the Logstash logs.

By default, the threshold for blocking is 1000 pending records. If you want to throw more memory / CPU cycles at buffering lots of stuff before it makes it to Kinesis, you can control the high-watermark:

output {
  kinesis {
    max_pending_records => 10000
  }
}

Logging configuration

The underlying KPL uses SLF4J for logging and binding for Log4j (used by Logstash) is included in the plugin package. Thus the logging levels can be controlled with the log4j2.properties file provided by Logstash.

As the KPL might be too noisy with INFO level, you might want to dial it down by following configuration in log4j2.properties:

...
logger.kinesis.name = com.amazonaws.services.kinesis
logger.kinesis.level = WARN
logger.kinesis.additivity = false
logger.kinesis.appenderRef.console.ref = console
...

Known Issues

Alpine Linux is not supported

This Logstash plugin uses the KPL daemon under the covers. This daemon is linked against, and specifically requires, glibc. See awslabs/amazon-kinesis-producer#86.

Noisy shutdown

During shutdown of Logstash, you might get noisy warnings like this:

[pool-1-thread-6] WARN com.amazonaws.services.kinesis.producer.Daemon - Exception during updateCredentials
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at com.amazonaws.services.kinesis.producer.Daemon$5.run(Daemon.java:316)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

This is caused by awslabs/amazon-kinesis-producer#10.

Noisy warnings about Error during socket read

While your Logstash instance is running, you may occasionally get a warning on stderr that looks like this:

[2015-10-20 06:31:08.441640] [0x00007f36c9402700] [error] [io_service_socket.h:229] Error during socket read: End of file; 0 bytes read so far (kinesis.us-west-1.amazonaws.com:443)

This is being tracked in awslabs/amazon-kinesis-producer#17. This log message seems to just be noise - your logs should still be delivering to Kinesis fine (but of course, you should independently verify this!).

Contributions

Are more than welcome. See CONTRIBUTING.md

License

Apache License 2.0