Skip to content

Joeavaib/LSMTree_Basic

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LSM-Tree Basic

A high-performance, production-ready, Log-Structured Merge-Tree (LSM-Tree) Key-Value Store implemented in modern C++ (C++17).

This project was built from the ground up to provide a robust foundation for a persistent, embedded key-value store, drawing architectural inspiration from RocksDB and LevelDB.

🚀 Features

  • Lock-Free In-Memory Store (MemTable): Utilizes a highly concurrent SkipList for fast reads and lock-free point-in-time snapshots.
  • Custom Arena Allocator: Drastically reduces memory fragmentation and the overhead of system allocations (malloc/new) during MemTable operations.
  • Write-Ahead Logging (WAL): Ensures 100% crash consistency and data durability. All writes are appended to a WAL before being acknowledged.
  • SSTable Format (SST-P v1.0):
    • Prefix Compression: Reduces the storage footprint of data blocks by sharing common key prefixes.
    • Bloom Filters: Minimizes disk I/O by probabilistically avoiding reads for non-existent keys (optimized to ~1% false-positive rate).
    • Block-based Indexing: Enables fast binary-search lookups within tables.
  • Hardware-Accelerated Checksums: Protects all blocks and logs against silent data corruption using CRC32C.
  • Multi-Version Concurrency Control (MVCC): Assigns sequence numbers to every operation (Put and Delete) to allow consistent reads and background compactions without blocking foreground writes.

🏗️ Architecture

The storage engine follows the classic LSM-Tree pipeline:

  1. Write Path (Put/Delete):

    • The operation is first appended to the Write-Ahead Log (WAL) on disk.
    • It is then inserted into the MemTable (In-Memory SkipList).
    • When the MemTable reaches its capacity (e.g., 4MB), it becomes an Immutable MemTable.
    • A background thread flushes the Immutable MemTable to disk as an SSTable at Level 0 (L0).
  2. Read Path (Get):

    • The active MemTable is searched first.
    • If not found, the Immutable MemTable(s) are searched.
    • If not found, the search queries the SSTables on disk.
    • Disk reads are highly optimized: The Bloom Filter is checked first. If the filter indicates the key might exist, the Index Block is read and binary-searched to find the exact Data Block. Finally, the data block is read and decompressed.

🛠️ Build Instructions

Prerequisites

  • A modern C++17 compatible compiler (GCC 9+, Clang 10+, MSVC 2019+).
  • CMake (3.20+).

Building the Project

# Clone the repository
git clone https://github.com/Joeavaib/LSMTree_Basic.git
cd LSMTree_Basic

# Configure and build
mkdir build
cd build
cmake ..
make -j$(nproc)

Running Tests

The project includes an integration test suite.

cd build
./db_test

💻 Usage Example

Here is a minimal example demonstrating how to embed and use the database:

#include <iostream>
#include <cassert>
#include "lsmdb/db.h"

int main() {
    lsmdb::DB* db;
    lsmdb::Options options;
    options.create_if_missing = true;
    options.write_buffer_size = 4 * 1024 * 1024; // 4MB MemTable

    // 1. Open the database
    lsmdb::Status s = lsmdb::DB::Open(options, "/tmp/test_db", &db);
    assert(s.ok());

    lsmdb::WriteOptions write_options;
    lsmdb::ReadOptions read_options;

    // 2. Put a key-value pair
    s = db->Put(write_options, "user:1001", "Alice");
    assert(s.ok());

    // 3. Get the value back
    std::string value;
    s = db->Get(read_options, "user:1001", &value);
    if (s.ok()) {
        std::cout << "Found: " << value << std::endl;
    }

    // 4. Delete the key
    s = db->Delete(write_options, "user:1001");
    assert(s.ok());

    // 5. Verify deletion
    s = db->Get(read_options, "user:1001", &value);
    assert(s.IsNotFound());

    // 6. Clean up
    delete db;
    return 0;
}

📅 Roadmap (Next Steps)

  • Leveled Compaction (L0 -> L1+): Implementing a merging iterator and size-based compaction picker to actively merge and push SSTables down the level hierarchy to optimize read amplification.
  • MANIFEST Logging: Persisting VersionEdit records to a MANIFEST file to allow the database to reconstruct its VersionSet (the layout of all SSTables across levels) upon restart.
  • Block Cache: An LRU cache to keep hot Data and Index blocks in memory, eliminating disk reads for frequently accessed keys.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors