- For very large datasets or very high query throughput, replication alone is not sufficient: we need to break the data up into partitions, a technique also known as sharding.
- Instead of a single node handling all writes, we partition (shard) the database based on a partition key, so that each partition holds a subset of the data.
- This increases query throughput and overall write throughput, because different partitions can be served by different nodes in parallel.
Note - This kind of partitioning is unrelated to network partitions (as in the CAP theorem).
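
As a rough illustration of partitioning by key, the sketch below hashes a partition key and reduces it modulo the number of partitions. The function name `partition_for`, the `user-<n>` keys, and the partition count of 4 are assumptions made for this example, not part of any specific database.

```python
import hashlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key to a partition index (hypothetical helper)."""
    # MD5 is used here only for its stable, well-spread output, not for security.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce modulo N.
    return int.from_bytes(digest[:8], "big") % num_partitions


# Hypothetical workload: user IDs serve as the partition key.
keys = [f"user-{i}" for i in range(10_000)]
counts = Counter(partition_for(k, 4) for k in keys)

for partition in sorted(counts):
    print(f"partition {partition}: {counts[partition]} keys")
```

With a well-chosen key the counts should come out roughly equal. Note that plain modulo hashing remaps most keys whenever the number of partitions changes, which is the problem consistent hashing addresses.
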
Terminology | Description |
---|---|
Partition Key | Partitioning is done based on a partition key. - Hence we need to choose this key carefully so that the data is distributed evenly between partitions. |
Hash Function | A hash function determines the partition for a given key. - For example, MD5 has been used as the hash function in Cassandra and MongoDB. |
Secondary Indexes | Read more |
Consistent Hashing | This handles data sharding with a dynamic number of servers, so that adding or removing a server remaps only a small fraction of the keys. |
Unique-ID-Generator | Since many NoSQL databases don't generate primary keys automatically, we would have to generate unique IDs on the application side. |
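
To make the consistent hashing entry more concrete, here is a minimal sketch of a hash ring with virtual nodes. The class name `ConsistentHashRing`, the choice of 100 virtual nodes per server, and the server names are all assumptions for illustration; production systems differ in many details.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit position on the ring, derived from MD5."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical minimal ring; each server owns `vnodes` points on it."""

    def __init__(self, vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name

    def add_server(self, server: str) -> None:
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove_server(self, server: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def server_for(self, key: str) -> str:
        if not self._points:
            raise ValueError("ring has no servers")
        # Walk clockwise to the first point after the key, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = ConsistentHashRing()
for name in ("server-a", "server-b", "server-c"):
    ring.add_server(name)

keys = [f"user-{i}" for i in range(1000)]
before = {k: ring.server_for(k) for k in keys}

ring.add_server("server-d")
after = {k: ring.server_for(k) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys moved after adding server-d")
```

In this sketch, every key that changes owner moves to the newly added server; keys never shuffle between the existing servers, which is the property that makes the scheme suitable for a changing number of servers.
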