Releases: openzipkin/zipkin
Zipkin 1.15
Zipkin 1.15 completes the transition to support 128-bit trace IDs, notably considering high-resolution IDs when querying and grouping traces.
Regular zipkin usage is unimpacted, as this is all behind the scenes. However, the details below will be interesting to some, and are particularly of note during any transition from 64- to 128-bit trace IDs.
128-bit trace IDs
Zipkin supports 64 and 128-bit trace identifiers, typically serialized as 16 or 32 character hex strings. By default, spans reported to zipkin with the same trace ID will be considered in the same trace. For example, 463ac35c9f6413ad48485a3953bb6124 is a 128-bit trace ID, while 48485a3953bb6124 is a 64-bit one.
Note: Span (or parent) IDs within a trace are 64-bit regardless of the length or value of their trace ID.
Migrating from 64 to 128-bit trace IDs
Unless you only issue 128-bit traces once all applications support them, the process of updating applications from 64 to 128-bit trace IDs results in a mixed state. This mixed state is mitigated by the setting STRICT_TRACE_ID=false, explained below. Once a migration is complete, remove the setting STRICT_TRACE_ID=false or set it to true.
Here are a few trace IDs to help illustrate what happens under this setting.
- Trace ID A: 463ac35c9f6413ad48485a3953bb6124
- Trace ID B: 48485a3953bb6124
- Trace ID C: 463ac35c9f6413adf1a48a8cff464e0e
- Trace ID D: 463ac35c9f6413ad
In a 64-bit environment, trace IDs will look like B or D above. When an
application upgrades to 128-bit instrumentation and decides to create a
128-bit trace, its trace IDs will look like A or C above.
Applications that aren't yet 128-bit capable typically retain only the
right-most 16 characters of the trace ID. When this happens, the same
trace could be reported as trace ID A or trace ID B.
By default, Zipkin will think these are different trace IDs, as they are
different strings. During a transition from 64- to 128-bit trace IDs, spans
would appear split across two IDs. For example, a trace might start as trace
ID A, but the next hop might truncate it to trace ID B. This would render
the system unusable for applications performing upgrades.
One way to address this problem is to not use 128-bit trace IDs until all applications support them. This prevents a mixed scenario at the cost of coordination. Another way is to set STRICT_TRACE_ID=false.
When STRICT_TRACE_ID=false, only the right-most 16 characters of a 32-character trace ID are considered when grouping or retrieving traces. This setting should only be applied when transitioning from 64 to 128-bit trace IDs and removed once the transition is complete.
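To illustrate the effect (this is a sketch, not the server's implementation), grouping by the right-most 16 characters makes trace IDs A and B above resolve to the same trace:
// Sketch only: models how STRICT_TRACE_ID=false groups traces; this method is hypothetical.
static String groupingKey(String hexTraceId) {
  // keep only the right-most 16 hex characters (the low 64 bits)
  return hexTraceId.length() <= 16 ? hexTraceId : hexTraceId.substring(hexTraceId.length() - 16);
}
// groupingKey("463ac35c9f6413ad48485a3953bb6124") -> "48485a3953bb6124" (trace ID A)
// groupingKey("48485a3953bb6124")                 -> "48485a3953bb6124" (trace ID B)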
See openzipkin/b3-propagation#6 for the status
of known open source libraries on 128-bit trace identifiers.
Cassandra
There's no impact to the cassandra (Cassandra 2.x) schema. The experimental cassandra3 schema has changed and needs to be recreated.
Elasticsearch
When STRICT_TRACE_ID=false, the indexing template will be less efficient as it tokenizes trace IDs. Don't set STRICT_TRACE_ID=false unless you really need to.
MySQL
There are no schema changes since the last version, but you'll likely want to add indexes in consideration of 128-bit trace IDs.
ALTER TABLE zipkin_spans ADD INDEX(`trace_id_high`, `trace_id`, `id`);
ALTER TABLE zipkin_spans ADD INDEX(`trace_id_high`, `trace_id`);
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id_high`, `trace_id`, `span_id`);
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id_high`, `trace_id`);
Java Api
The STRICT_TRACE_ID variable above corresponds to zipkin.storage.StorageComponent.Builder.strictTraceId. Those using storage components directly will want to set this to false under similar circumstances to those described above.
We've added methods to SpanStore in support of high-resolution gets. Traces with 64-bit ids are retrieved by simply passing 0 as traceIdHigh.
@Nullable
List<Span> getTrace(long traceIdHigh, long traceIdLow);
@Nullable
List<Span> getRawTrace(long traceIdHigh, long traceIdLow);
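For example, a caller holding a hex trace ID can split it into high and low halves before calling getTrace. This is a minimal sketch; the hexToTraceId helper is hypothetical, not part of the library, and uses Java 8's Long.parseUnsignedLong (older JDKs can use BigInteger):
// Hypothetical helper: splits a 16- or 32-character lower-hex trace ID into
// the (traceIdHigh, traceIdLow) pair expected by the new SpanStore methods.
static long[] hexToTraceId(String hex) {
  long high = hex.length() == 32 ? Long.parseUnsignedLong(hex.substring(0, 16), 16) : 0L;
  long low = Long.parseUnsignedLong(hex.length() == 32 ? hex.substring(16) : hex, 16);
  return new long[] {high, low};
}

long[] id = hexToTraceId("463ac35c9f6413ad48485a3953bb6124");
List<Span> trace = spanStore.getTrace(id[0], id[1]); // for 64-bit IDs, traceIdHigh is 0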
Zipkin 1.14
Zipkin 1.14 introduces support for 128-bit trace identifiers.
Most zipkin sites store traces for a limited amount of time (like 2 days) and also trace a small percentage of operations (via sampling). For these reasons and also those of simplicity, 64-bit trace identifiers have been the norm since zipkin started over 4 years ago.
Starting with Zipkin 1.14, 128-bit trace identifiers are also supported. This can be useful in sites that have very large traffic volume, persist traces forever, or are re-using externally generated 128-bit IDs as trace IDs. You can also use 128-bit trace ids to interop with other 128-bit systems such as Google Stackdriver Trace. Note: span IDs within a trace are still 64-bit.
When 128-bit trace ids are propagated, they will be twice as long as before. For example, the X-B3-TraceId header will hold a 32-character value like 163ac35c9f6413ad48485a3953bb6124. Prior to Zipkin 1.14, we updated all major tracing libraries to silently truncate long trace ids to 64-bit. With the example noted, its 64-bit counterpart would be 48485a3953bb6124. For the foreseeable future, you will be able to look up a trace by either its 128-bit or 64-bit ID. This allows you to upgrade your instrumentation and environment in steps.
Should you want to use 128-bit tracing today, you'll need to update to latest Zipkin, and if using MySQL, issue the following DDL update:
ALTER TABLE zipkin_spans ADD `trace_id_high` BIGINT NOT NULL DEFAULT 0;
ALTER TABLE zipkin_annotations ADD `trace_id_high` BIGINT NOT NULL DEFAULT 0;
ALTER TABLE zipkin_spans
DROP INDEX trace_id,
ADD UNIQUE KEY(`trace_id_high`, `trace_id`, `id`) COMMENT 'ignore insert on duplicate';
ALTER TABLE zipkin_annotations
DROP INDEX trace_id,
ADD UNIQUE KEY(`trace_id_high`, `trace_id`, `span_id`, `a_key`, `a_timestamp`) COMMENT 'Ignore insert on duplicate';
Next, you'll need to use a library that supports generating 128-bit ids. The first two to support this are zipkin-go-opentracing v0.2 and Brave (java) v3.5. The supporting change in thrift is a new trace_id_high field.
If you have any further questions on this feature, reach out to us on gitter: https://gitter.im/openzipkin/zipkin
Zipkin 1.13
Zipkin 1.13 most notably refines our Elasticsearch code. It is now easier for us to tune as self-tracing is built-in.
For example, let's say I created a domain in Amazon's Elasticsearch service named 'zipkin'. As I'm doing testing, I'll run our Docker image and share my AWS credentials with it.
$ docker run -d -p 9411:9411 \
-e SELF_TRACING_ENABLED=true \
-e STORAGE_TYPE=elasticsearch -e ES_AWS_DOMAIN=zipkin \
-v $HOME/.aws:/root/.aws:ro \
openzipkin/zipkin
Once zipkin starts up, SELF_TRACING_ENABLED=true indicates that it should trace each api request. As I click in the UI, more traces appear under the service zipkin-server. Here's one which shows the overall latency of a request (from my laptop to amazon), for a zipkin trace search.
With tools like this, we can use Zipkin to improve zipkin.
The Elasticsearch experience was created by @anuraaga and extended to Amazon by @sethp-jive. The tracing functionality is thanks to our Brave OkHttp interceptor initially written by @tburch. Watch for more news as we head towards Elasticsearch 5 compatibility.
Zipkin 1.11
Zipkin 1.11 allows you to see instrumented clients in the dependency view. It also fixes a search collision problem.
Before, the dependency view (ex http://your_host:9411/dependency) presented a server-centric diagram. This worked well enough as traces usually start at the first server. Especially with new projects like zipkin-js, client-originated traces are becoming more common. For example, the trace could start in your web browser instead of on a server. Zipkin's dependency linker is now trained to look for client send annotations in the root span, and if present, add them to the far-left of the dependency graph. Thanks to @rogeralsing for reporting.
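Conceptually, the change means a client-originated root span can contribute a link from the client's service to the server's service. The sketch below is an illustration of that idea, not the actual dependency linker code, and the Link type is hypothetical:
// Illustration only: if the root span has a client send ("cs") annotation from one
// service and a server receive ("sr") from another, count a call between them so the
// client (ex. a web browser) appears at the far left of the graph.
static Link linkFromClientRoot(Span root) {
  String client = null, server = null;
  for (Annotation a : root.annotations) {
    if (a.endpoint == null) continue;
    if ("cs".equals(a.value)) client = a.endpoint.serviceName;
    else if ("sr".equals(a.value)) server = a.endpoint.serviceName;
  }
  if (client == null || server == null || client.equals(server)) return null;
  return new Link(client, server, 1); // ex. web-browser -> api
}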
We also fixed a search bug where a query like http.method=GET matched against any service in a trace as opposed to the service specified in the UI. This affected all storage types except cassandra and is now fixed.
Note: While seemingly simple, this smoked out a latent problem in our Elasticsearch indexing template. Please re-index at your earliest convenience, or drop the index and let Zipkin recreate it.
Zipkin 1.10
Zipkin 1.10 addresses a couple long-term problems relating to span timestamp and duration.
Firstly, we no longer attempt to support duration queries on the "cassandra" storage type. Cassandra 2.x doesn't support SASI indexing, and trying to work around that resulted in a feature most couldn't use. @michaelsembwever from The Last Pickle has a more sustainable solution in mind that uses Cassandra 3.8+. Please look for announcements on the experimental cassandra3 storage type.
Next is something that applies to all storage types. When trace instrumentation doesn't record Span.timestamp and duration, the Zipkin server tries to guess by looking at annotations. Previously, when we guessed wrong, the trace would render strangely. We now guess much more conservatively so as to avoid this; a sketch of the kind of derivation involved follows the list below.
Here's the impact:
- Span duration is no longer derived by collectors, as it is often wrong. Duration queries won't work unless traces reported to zipkin include duration.
- Span timestamp is derived only when needed, usually to support indexing.
- Span timestamp and duration are still backfilled at query time, as otherwise the UI wouldn't work.
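As a rough illustration of what "guessing conservatively" means (this is not the server's exact algorithm, just a sketch against the v1 Span/Annotation model), timestamp and duration are only derived when unambiguous client-side annotations exist:
// Sketch: derive timestamp/duration from "cs" (client send) and "cr" (client receive)
// annotations only when both are present; otherwise leave the span untouched.
static Span applyTimestampAndDuration(Span span) {
  if (span.timestamp != null && span.duration != null) return span; // already recorded
  Long cs = null, cr = null;
  for (Annotation a : span.annotations) {
    if ("cs".equals(a.value)) cs = a.timestamp;
    else if ("cr".equals(a.value)) cr = a.timestamp;
  }
  if (cs == null || cr == null) return span; // ambiguous: don't guess
  return span.toBuilder().timestamp(cs).duration(cr - cs).build();
}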
Note: The Span.timestamp and duration fields were added a year ago, but many tracers still don't record them. We hope our documentation on how to record timestamp and duration will help ease the task of updating them. If you use a tracer that doesn't yet record Span.timestamp and duration, please raise an issue or PR to the corresponding repository so that it is eventually fixed.
Zipkin 1.8
Zipkin 1.8 is a library change focused on encoding performance. If you are instrumenting apps and use Zipkin's Codec, you'll want to upgrade.
Span encoding has been completely rewritten in order to bring common-case overhead into the microsecond-or-less range.
Zipkin 1.7 Codec.writeSpan() vs libthrift (pace car)
CodecBenchmarks.writeClientSpan_json_zipkin avgt 15 17.131 ± 0.446 us/op
CodecBenchmarks.writeClientSpan_thrift_libthrift avgt 15 1.952 ± 0.043 us/op
CodecBenchmarks.writeClientSpan_thrift_zipkin avgt 15 0.996 ± 0.021 us/op
CodecBenchmarks.writeLocalSpan_json_zipkin avgt 15 10.124 ± 0.177 us/op
CodecBenchmarks.writeLocalSpan_thrift_libthrift avgt 15 1.168 ± 0.016 us/op
CodecBenchmarks.writeLocalSpan_thrift_zipkin avgt 15 0.593 ± 0.010 us/op
CodecBenchmarks.writeRpcSpan_json_zipkin avgt 15 43.495 ± 1.086 us/op
CodecBenchmarks.writeRpcSpan_thrift_libthrift avgt 15 4.878 ± 0.046 us/op
CodecBenchmarks.writeRpcSpan_thrift_zipkin avgt 15 2.666 ± 0.018 us/op
CodecBenchmarks.writeRpcV6Span_json_zipkin avgt 15 49.759 ± 0.867 us/op
CodecBenchmarks.writeRpcV6Span_thrift_libthrift avgt 15 5.390 ± 0.073 us/op
CodecBenchmarks.writeRpcV6Span_thrift_zipkin avgt 15 3.147 ± 0.026 us/op
Zipkin 1.8 Codec.writeSpan() vs libthrift (pace car)
CodecBenchmarks.writeClientSpan_json_zipkin avgt 15 1.445 ± 0.036 us/op
CodecBenchmarks.writeClientSpan_thrift_libthrift avgt 15 1.951 ± 0.014 us/op
CodecBenchmarks.writeClientSpan_thrift_zipkin avgt 15 0.433 ± 0.011 us/op
CodecBenchmarks.writeLocalSpan_json_zipkin avgt 15 0.813 ± 0.010 us/op
CodecBenchmarks.writeLocalSpan_thrift_libthrift avgt 15 1.191 ± 0.016 us/op
CodecBenchmarks.writeLocalSpan_thrift_zipkin avgt 15 0.268 ± 0.004 us/op
CodecBenchmarks.writeRpcSpan_json_zipkin avgt 15 3.606 ± 0.068 us/op
CodecBenchmarks.writeRpcSpan_thrift_libthrift avgt 15 5.134 ± 0.081 us/op
CodecBenchmarks.writeRpcSpan_thrift_zipkin avgt 15 1.384 ± 0.078 us/op
CodecBenchmarks.writeRpcV6Span_json_zipkin avgt 15 3.912 ± 0.115 us/op
CodecBenchmarks.writeRpcV6Span_thrift_libthrift avgt 15 5.488 ± 0.098 us/op
CodecBenchmarks.writeRpcV6Span_thrift_zipkin avgt 15 1.323 ± 0.014 us/op
Why encoding speed matters
Applications that report to Zipkin typically record timing information and metadata on the calling thread. After the operation completes, this is encoded into a Span and scheduled to go out of process, usually via http or Kafka. When the encoding overhead is measurable, it can confuse timing information, particularly when operations are in single-digit or less milliseconds.
For example, if a local operation takes 400us and your encoding overhead is 40us, there will be a 10% gap between the end of one span and the start of the next. This will notably skew the duration of the parent, particularly if there are a lot of spans like this. When encoding overhead is in single-digit microseconds or less, this problem is far less noticeable.
Zipkin 1.7
Zipkin 1.7 has a lot to offer, thanks to users for telling us what they'd like.
@dragontree101 wanted to be able to know which version of zipkin his server was running. @shakuzen landed the /info endpoint, which prints out something like this:
{
  "zipkin": {
    "version": "1.7.0"
  }
}
@mikewrighton wants to run zipkin-ui from a different host than zipkin-server. @hyleung spiked a new variable you can use to control cross-origin policy. For example, you can export ZIPKIN_QUERY_ALLOWED_ORIGINS=http://foo.bar.com, if you are the lucky owner of foo.bar.com!
@dan-tr uses Zipkin with Elasticsearch, but found our microsecond timestamps didn't work out-of-the-box with Kibana. He suggested we add a field timestamp_millis, and we did, because it was a smart idea!
@ivansenic works on an APM called inspectIT. He rightly noted there's still a ton of Java 6 VMs out there that need to be traceable by Java agents. Now, zipkin.jar is an agent-friendly, 152k jar full of Java 6 bytecode (still with no dependencies!).
We're occasionally asked where javadocs are published. Thanks to @abesto's automation expertise, historical javadocs can now be found at http://zipkin.io/zipkin/
Finally, we're looking for incremental and compatible ways to improve zipkin's model, particularly for asynchronous activity (like tracing Kafka). If you are interested in steering us, please comment on..
Thanks for keeping with us,
OpenZipkin
Zipkin 1.6
Zipkin 1.6 server has been updated to use Spring Boot 1.4.
We've also corrected default values around the UI, which should lead to better search performance. Most notably, startTs defaults to 1 hour back instead of 7 days back. #1212
- Note: You can reset the lookback value to whatever you like. For example, you might set JAVA_OPTS="-Dzipkin.ui.default-lookback=86400000" for 1 day. Settings like this are documented in the README.
Zipkin 1.5
Zipkin 1.5 is all about the dependency view in the UI.
Many of you may have seen the dependency tab, but never any data in it. This would be the case if you were running Cassandra or Elasticsearch.
What you should have seen is a diagram showing the relative amount of calls between services, something like this (except with your services present!):
Zipkin 1.5 includes support to populate the data under this screen for all storage options (mysql, cassandra and elasticsearch).
The job that produces this data is called zipkin-dependencies. Zipkin Dependencies aggregates links between services into a daily bucket. This means you should run it daily, like a batch job (even though underneath it is Spark). In fact, our Docker image includes a cron setup to do that for you!
For example, here's a run against a small cassandra DB using spark standalone (default):
$ STORAGE_TYPE=cassandra CASSANDRA_CONTACT_POINTS=192.168.99.100 java -jar zipkin-dependencies.jar
Running Dependencies job for 2016-07-23: 1469232000000000 ≤ Span.timestamp 1469318399999999
11:05:09.653 [main] WARN o.a.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
11:05:09.706 [main] WARN org.apache.spark.util.Utils - Your hostname, acole resolves to a loopback address: 127.0.0.1; using 192.168.1.10 instead (on interface en0)
11:05:09.706 [main] WARN org.apache.spark.util.Utils - Set SPARK_LOCAL_IP if you need to bind to another address
11:05:11.078 [main] WARN com.datastax.driver.core.NettyUtil - Found Netty's native epoll transport, but not running on linux-based operating system. Using NIO instead.
Saved with day=2016-07-23
Dependencies: [{"parent":"brave-resteasy-example","child":"brave-resteasy-example","callCount":1}, {"parent":"zipkin-server","child":"cassandra","callCount":14}]
Upgrading
If you are using cassandra or elasticsearch, you should upgrade to zipkin 1.5, but there's no schema-related change required.
If you are using mysql, you'll need to add a new table for this to work. Here's a copy/paste of the DDL for your convenience.
CREATE TABLE IF NOT EXISTS zipkin_dependencies (
`day` DATE NOT NULL,
`parent` VARCHAR(255) NOT NULL,
`child` VARCHAR(255) NOT NULL,
`call_count` BIGINT
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED;
ALTER TABLE zipkin_dependencies ADD UNIQUE KEY(`day`, `parent`, `child`);
Credits
The spark job was originally written by @yurishkuro, based on a hadoop job written by @eirslett years ago. In other words, the job itself isn't new; rather, the accessibility of it is. Before, it only worked with cassandra and wasn't published to Maven Central or integrated with Docker. Now, it should be easy for anyone to include this functionality in their deployment.
Zipkin 1.4
Zipkin 1.4 most notably includes the ability to store and show IPv6 addresses associated with services.
Endpoint.ipv6
Zipkin span data can now include an ipv6 address of an Endpoint, binary encoded in thrift or text-encoded in json. If using MySQL, you need to add a column to store this. No action is needed in Cassandra or Elasticsearch. See #1178
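For Java instrumentation, recording the address on an Endpoint looks roughly like the sketch below. The builder method names here are assumptions based on the feature description in #1178, so check the library's javadoc:
import java.net.Inet6Address;
import java.net.InetAddress;
import zipkin.Endpoint;

// Rough sketch: attach the raw 16-byte IPv6 address to an Endpoint.
// Builder method names are assumptions; consult the zipkin javadoc.
static Endpoint webEndpoint() throws Exception {
  InetAddress address = InetAddress.getByName("2001:db8::c001");
  Endpoint.Builder builder = Endpoint.builder().serviceName("web");
  if (address instanceof Inet6Address) {
    builder.ipv6(address.getAddress()); // binary-encoded in thrift, text-encoded in json
  }
  return builder.build();
}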
Operational Improvements
- Adds SCRIBE_ENABLED: set to false to disable scribe
- Adds SELF_TRACING_SAMPLE_RATE: set to a low value like 0.001 to safely self-trace production