CockroachDB is a new distributed database engine developed at Cockroach Labs.
Jepsen is a testing framework for networked databases, developed by Kyle 'Aphyr' Kingsbury to exercise and validate the claims to consistency made by database developers or their documentation.
The tests run concurrent operations to some shared data from different nodes in a CockroachDB cluster and checks that the operations preserve the consistency properties defined in each test. During the tests, various combinations of nemeses can be added to interfere with the database operations and exercise the database's consistency protocols.
lein run test --test sets --nemesis subcritical-skews
See lein run test --help
for full options, and full.sh
for an example. all
is a valid value for a test or nemesis, and expands to all known tests or nemeses, respectively.
The following tests are implemented:
register
concurrent atomic updates to a shared register;
sets
concurrent unique appends to a shared table;
monotonic
concurrent ordered appends with carried dependency;
bank
concurrent transfers between rows of a shared table;
bank-multitable
concurrent transfers between rows of different tables.
g2
concurrent read-write cycles over a common predicate.
sequential
sequential consistency over known groups of keys.
comments
verifies that non-concurrent inserts are visible in order.
Nemeses:
none
no nemesis
small-skews
clock skews of ~100 ms
subcritical-skews
clock skews of ~200 ms
critical-skews
clock skews of ~250 ms
big-skews
clock skews of ~500 ms
huge-skews
clock skews of ~5 s
strobe-skews
strobes the clock at a high frequency between 0 and 200 ms ahead
parts
random network partitions
majority-ring
random network partition where each node sees a majority of other nodes
start-stop-2
db processes on 2 nodes are stopped and restarted with SIGSTOP/SIGCONT
start-kill-2
db processes on 2 nodes are stopped with SIGKILL and restarted from scratch
split
periodically splits the keyrange at a randomly selected key
slow
delays network packets
Jepsen will test every combination of nemesis
and nemesis2
, except where
both nemeses would be identical, or both would introduce clock skew.
One table contains a single row.
Jepsen sends concurrently to different nodes either read, write or atomic compare-and-swap operations.
Each node may report ok, the operation is known to have succeeded; fail, the operation is known to have failed; and unknown otherwise (e.g. the connection dropped before the answer could be read).
At the end, a linearizability checker validates that the trace of reads as observed from each client is compatible with a linearizable history of across all nodes.
One shared table of values.
Jepsen sends appends of unique values via different nodes over time.
Each node may report ok, the operation is known to have succeeded; fail, the operation is known to have failed; and unknown otherwise (e.g. the connection dropped before the answer could be read).
At the end, a uniqueness checker validates that no value was added two or more times; that all known-ok additions are indeed present in the table; and that all known-fail additions are indeed not present in the table.
Several tables of (value, timestamp) pairs.
Jepsen sends atomic transactions that append the last known max value + 1 and the current db's now(), concurrently to different nodes over time.
Each node may report ok, the operation is known to have succeeded; fail, the operation is known to have failed; and unknown otherwise (e.g. the connection dropped before the answer could be read).
At the end, a monotonic checker validates that no value was added two or more times; that all known-ok additions are indeed present in the table; that all max values are in the same order as the now() timestamps.
One shared table contains multiple bank accounts, one per row (or one per table in the "bank-multitable" variant).
Jepsen sends concurrently read and transfer operations via different nodes to/between randomly selected accounts.
Each node may report ok, the operation is known to have succeeded; fail, the operation is known to have failed; and unknown otherwise (e.g. the connection dropped before the answer could be read).
At the end, the checker validates that the sum of the remaining balances of all accounts is the same as the initial sum.
Cockroach does not offer strict serializability. However, as a consequence of its implementation of hybrid logical clocks, all transactions on a particular node should observe a strong real-time order. So long as CockroachDB clients are sticky (e.g. bound to the same server), we expect those clients should observe sequential consistency as well: the effective order of transactions should be consistent with the order on every client.
To verify this, we have a single client perform a sequence of independent transactions, inserting k1, k2, ..., kn into different tables. Concurrently, a different client attempts to read each of kn, ..., k2, k1 in turn. Because all inserts occur from the same process, they must also be visible to any single process in that order. This implies that once a process observes kn, any subsequent read must see kn-1, and by induction, all smaller keys.
Transactions select a predicate over two tables, then insert to one or the other if no rows are present. Serializability implies that at most one transaction may commit per predicate.
This test demonstrates a known strict serializability violation in Cockroach and is intended to fail. It performs a sequence of concurrent inserts to a table, and selects all rows from that table periodically. We look for cases where one record was inserted strictly before another (e.g. they are not concurrent), and the latter value is visible to a read without the former.
Overview: One computer will run the Jepsen framework, and will send requests to other computers running the CockroachDB database. After a while, the trace of accesses is analyzed and checked for inconsistencies. If the database does its job properly, a Jepsen checker will report that no inconsistencies were found; otherwise it will indicate at which point the database started to perform invalid operations. Optionally, some performance metrics are also reported at the end.
How to get there:
-
ensure Sun/Oracle's JDK 8 is installed on your Jepsen host, and install leiningen__.
.. __: http://leiningen.org/
-
configure a 5-node CockroachDB cluster, for example using the configuration in CockroachDB's
cloud/aws
subdirectory... note:: As of Jan 27th 2016, some additional tweaking may be required on top of the default configuration to get the database up and running. If in doubt, ping us on Gitter__ or our
issue tracker
__... __: https://gitter.im/cockroachdb/cockroach .. __: https://github.com/cockroachdb/cockroach/issues
.. note:: If you cannot use AWS or this pre-packaged configuration, you can set up your cluster manually as well. The Jepsen test code assumes Debian Jessie on all nodes, CockroachDB available to run from a user account called
ubuntu
, and a SSH server on each node reachable from the Jepsen host. -
Ensure that all nodes can resolve each other's hostnames
-
tweak your SSH configuration on both your cluster nodes and Jepsen host so that you can log in password-less to the
ubuntu
account on each node from the Jepsen host. (suggestion: create passwordless key pairs, populateauthorized_keys
where needed, and runssh-agent
/ssh-add
on the Jespen host).Note: the "ubuntu" account should be able to run sudo without a password.
(You can tweak the name of the user account in
src/jepsen/cockroach.clj
) -
If you are using SSL (the default), you need to:
-
ensure that your CockroachDB nodes all have their certificates set up to run in secure mode.
-
copy the client and CA certificates and client key to your jepsen master host.
-
generate a Java-ready encoding of the client key using the following command::
openssl pkcs8 -topk8 -inform PEM -outform DER
-in .../node.client.key -out .../node.client.pk8 -nocrypt -
indicate the path to the client and CA certs and key in the configuration variables in
src/jepsen/cockroach.clj
.
To disable SSL instead, set
insecure
to false insrc/jepsen/cockroach.clj
. -
-
run
lein run test ...
from thecockroachdb
test directory. This will run the Jepsen tests and exercise the database. -
Wait for the results of the tests. There will be multiple reports, one per test. Each report ends with detailed data structure containing the test's results, including either
:valid? true
or:valid? false
depending on whether inconsistencies were found. -
Optionally, after the tests complete, collect additional outputs in the subdirectory
cockroachdb/store/latest/
.
You can use Jepsen's built in web server by running lein run serve
, or...
A small utility is provided to navigate the results using a web browser.
To use this, point a HTTP server to cockroachdb/store
and
enable CGI handling for cockroachdb/stores/cgi-bin
; you can
also achieve this simply by running the Python embedded web server with::
cd cockroachdb/store && python3 -m http.server --cgi 8080
Then navigate to /cgi-bin/display.py
in the web browser to start
the interface.