Partitioning/Sharding

For very large datasets, or very high query throughput, replication is not sufficient - we need to break the data up into partitions, also known as sharding.
Instead of one shard for writes, we partition/shard the database based on a partition key.
This would increase query throughput and overall system write throughput.

Note - This partitioning is not related to network partition (in CAP Theorem).

Key Terminologies

Terminology	Description
Partition Key	Partitioning would be done based on a partition key. - Hence we need to carefully choose this key to distribute the data evenly b/w partitions.
Hash Function	Hash function helps to determine the partition for a given key. - MD5 as a hash function used in Casandra, MongoDB.
Secondary Indexes	Read more
Consistent Hashing	This handles data sharding with dynamic number of servers.
Unique-ID-Generator	Since NoSQL dbs don't generate primary key automatically, we would have generate unique ID on the application side.