Skip to content

Latest commit

 

History

History
37 lines (30 loc) · 3.57 KB

File metadata and controls

37 lines (30 loc) · 3.57 KB

Partitioning/Sharding

  • For very large datasets, or very high query throughput, replication is not sufficient - we need to break the data up into partitions, also known as sharding.
  • Instead of one shard for writes, we partition/shard the database based on a partition key.
  • This would increase query throughput and overall system write throughput.

Note - This partitioning is not related to network partition (in CAP Theorem).

Key Terminologies

Terminology Description
Partition Key Partitioning would be done based on a partition key.
- Hence we need to carefully choose this key to distribute the data evenly b/w partitions.
Hash Function Hash function helps to determine the partition for a given key.
- MD5 as a hash function used in Casandra, MongoDB.
Secondary Indexes Read more
Consistent Hashing This handles data sharding with dynamic number of servers.
Unique-ID-Generator Since NoSQL dbs don't generate primary key automatically, we would have generate unique ID on the application side.

Kafka Cluster

Other Clusters & Examples

References