
Voldemort thin client


Why Thin Client?

The current (thick) client used in the client-side routing mode in Voldemort has a number of issues:

  • The cluster topology and store metadata have to be synchronized on the client side. This makes bootstrapping (the process by which the client obtains this metadata from a random server node) an expensive operation. In addition, it introduces the burden of fetching the latest changes, or bouncing the client, whenever this metadata changes.
  • Connection management is complicated, with operational and performance implications.
  • The existing client handles all the replication (including cross-zone replication), which is inefficient.
  • The client design is complicated, with multiple levels of nesting.
  • There are far too many client-side configuration parameters (some of them confusing), which adds operational overhead.

Although some of these issues can be fixed, doing so requires rolling out a new release to all the clients. For instance, we recently implemented an auto-bootstrapper mechanism which tracks changes to cluster.xml on the cluster and automatically re-bootstraps the client, which helps with cluster maintenance. However, to reap the benefits, every client using the cluster must pick up this new release, which is a very costly operation when a large number of clients are involved.

Architectural details

Isolating the thin client

The different layers of the thick client are broken down into two key pieces:

  1. Thin client
  2. Coordinator

1) Thin client:

The thin client is intended to be a RESTful library that can be incorporated into any application that wants to use Voldemort. Currently we are targeting a thin Java client only, but the beauty of this separation is that building a RESTful client in any other language becomes very easy. The different layers in this component are (see the sketch after this list):

  1. Monitoring: For measuring the different metrics over time (e.g., JMX MBeans)
  2. Inconsistency resolver: The default (timestamp-based) or a user-defined resolution mechanism
  3. Serialization / de-serialization: To send / receive data on the wire
  4. Compression / de-compression
  5. HTTP client: To handle the request to and the response from the coordinator (HTTP based)
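As a concrete illustration, a GET through such a client might look roughly like the sketch below. The endpoint shape (/stores/{storeName}/{key}), the class name, and the error handling are assumptions for illustration, not the actual Voldemort REST API; the serialization, compression, and inconsistency-resolution layers are elided.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative thin-client GET: build a URL for the key, issue an HTTP
// request to the coordinator, and read back the raw value bytes.
public class ThinClientSketch {

    private final String coordinatorBase; // e.g. "http://coordinator-vip:8080" (hypothetical)

    public ThinClientSketch(String coordinatorBase) {
        this.coordinatorBase = coordinatorBase;
    }

    public byte[] get(String storeName, String key) throws Exception {
        URL url = new URL(coordinatorBase + "/stores/" + storeName + "/" + key);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try {
            if (conn.getResponseCode() == 404) {
                return null; // key not present
            }
            // De-serialization / decompression would sit on top of this in the real layers.
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Note how small this surface is compared to the thick client: no cluster metadata, no bootstrapping, no connection pooling per storage node. That is what makes porting it to other languages easy.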

2) Coordinator:

The coordinator is a logical entity which accepts Voldemort requests and handles the main routing logic. It may be deployed as a set of machines in front of the cluster, or as part of each storage node. The coordinators are symmetric, in the sense that each coordinator can accept a request from any client intended for any store. The different layers in this component are:

  1. HTTP handler: To handle requests from and responses to the thin client
  2. Stat tracking store: Similar to the monitoring layer
  3. Routed store: To handle the actual Voldemort routing

Essentially, the coordinator maintains a set of thick clients for the different stores owned by this cluster. After receiving an HTTP request, it determines the target store name and hands the request off to the corresponding thick client.
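A minimal sketch of that dispatch step, assuming the same illustrative URL shape as above and a hypothetical StoreClient interface standing in for the per-store thick client:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative coordinator dispatch: one pre-built thick ("fat") client per
// store, looked up by the store name parsed out of the HTTP request path.
public class CoordinatorDispatchSketch {

    // Stand-in for the thick-client handle the coordinator keeps per store.
    interface StoreClient {
        byte[] get(String key);
    }

    private final Map<String, StoreClient> fatClients = new ConcurrentHashMap<>();

    // Called once per store owned by this cluster, e.g. at coordinator startup.
    public void registerStore(String storeName, StoreClient client) {
        fatClients.put(storeName, client);
    }

    // Path is assumed to look like /stores/{storeName}/{key}.
    public byte[] handleGet(String path) {
        String[] parts = path.split("/"); // ["", "stores", storeName, key]
        StoreClient client = fatClients.get(parts[2]);
        if (client == null) {
            throw new IllegalArgumentException("Unknown store: " + parts[2]);
        }
        return client.get(parts[3]); // the thick client does the actual routing
    }
}
```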

Routing between the thin client and the coordinator

Routing requests from the thin client to the coordinator has two main challenges:

i) Load balancing, and ii) topology-transparent routing.

i) Load balancing:

This is the easier problem to solve. Requests from different clients should ideally be spread across the available set of coordinators in a round-robin manner. Any of the available (preferably software) load balancers would do.
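For illustration, round-robin selection over a fixed list of coordinators boils down to something like the following sketch (a stand-in for a real load balancer, with hypothetical names):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Minimal round-robin selection over a fixed list of coordinator addresses.
// A real deployment would put a hardware or software load balancer here instead.
public class RoundRobinSketch {

    private final List<String> coordinators;
    private final AtomicLong counter = new AtomicLong();

    public RoundRobinSketch(List<String> coordinators) {
        this.coordinators = coordinators;
    }

    public String next() {
        // floorMod keeps the index non-negative even if the counter overflows.
        int idx = (int) Math.floorMod(counter.getAndIncrement(), coordinators.size());
        return coordinators.get(idx);
    }
}
```

The fixed list is exactly the limitation the next subsection is about: this scheme only works while the coordinator membership is static.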

ii) Topology-transparent routing

The challenge here is that the available set of coordinators might keep changing. We might add more coordinators (to the original set) in the future to handle increased traffic (related to cluster expansion as well). The thin clients should be able to detect such changes transparently, without the application knowing about them.

At LinkedIn, there is a proprietary solution that resolves this open issue. Unfortunately, that solution is not open sourced (yet). We will keep this page updated. In the meantime, the general assumption is that the load balancers would need to be re-initialized whenever the coordinator membership changes.
