{"id":27790826,"url":"https://github.com/umit/lsm-tree-storage","last_synced_at":"2025-04-30T18:45:55.608Z","repository":{"id":290055654,"uuid":"973235025","full_name":"umit/lsm-tree-storage","owner":"umit","description":"A high-performance, Log-Structured Merge-Tree (LSM-tree) based storage system for key-value databases. This implementation leverages Java 21's new features like the Arena API and MemoryLayout for efficient memory management.","archived":false,"fork":false,"pushed_at":"2025-04-26T15:05:30.000Z","size":153,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-30T18:45:51.788Z","etag":null,"topics":["database","distributed-systems","java","keyvalue","log-structured-merge-tree","lsm","lsm-tree"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/umit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-26T14:58:27.000Z","updated_at":"2025-04-26T15:22:58.000Z","dependencies_parsed_at":"2025-04-26T16:20:47.627Z","dependency_job_id":"9ca5953d-60b5-4900-9254-420e4ad415a9","html_url":"https://github.com/umit/lsm-tree-storage","commit_stats":null,"previous_names":["umit/lsm-tree-storage"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/umit%2Flsm-tree-storage","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/umit%2Flsm-tree-storage/tags","re
leases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/umit%2Flsm-tree-storage/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/umit%2Flsm-tree-storage/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/umit","download_url":"https://codeload.github.com/umit/lsm-tree-storage/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251765254,"owners_count":21640160,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","distributed-systems","java","keyvalue","log-structured-merge-tree","lsm","lsm-tree"],"created_at":"2025-04-30T18:45:53.229Z","updated_at":"2025-04-30T18:45:55.586Z","avatar_url":"https://github.com/umit.png","language":"Java","readme":"# LSM-Tree Storage\n\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![Java Version](https://img.shields.io/badge/Java-21-orange.svg)](https://openjdk.java.net/projects/jdk/21/)\n\nA high-performance, Log-Structured Merge-Tree (LSM-tree) based storage system for key-value databases. 
This implementation leverages Java 21's new features like the Arena API and MemoryLayout for efficient memory management.\n\n## Table of Contents\n\n- [Overview](#overview)\n- [Features](#features)\n- [Requirements](#requirements)\n- [Installation](#installation)\n- [Usage](#usage)\n- [Architecture](#architecture)\n- [Performance](#performance)\n- [Contributing](#contributing)\n- [License](#license)\n- [Acknowledgements](#acknowledgements)\n\n## Overview\n\nLSM-Tree Storage is a Java implementation of the Log-Structured Merge-Tree, a data structure designed for high write throughput while maintaining good read performance. It's particularly well-suited for write-heavy workloads and time-series data.\n\nThe implementation follows the classic LSM-tree architecture with:\n- In-memory MemTable for recent writes\n- Immutable SSTables on disk for persistent storage\n- Write-Ahead Log (WAL) for durability and crash recovery\n- Background compaction to merge SSTables and optimize storage\n\n## Features\n\n- **High Write Throughput**: Optimized for write-heavy workloads by batching writes in memory.\n- **Efficient Range Queries**: Data is stored in sorted order for efficient range scans.\n- **Automatic Compaction**: Background process to merge SSTables and reclaim space.\n- **TTL Support**: Entries can expire after a specified time.\n- **Tombstone Markers**: Special markers for deleted entries.\n- **Bloom Filters**: Efficient negative lookups to avoid unnecessary disk reads.\n- **Memory Efficiency**: Uses Java 21's Arena API for efficient memory management.\n- **Structured Data Access**: Uses Java 21's MemoryLayout for efficient off-heap storage.\n- **Configurable Compaction Strategies**: Choose between threshold-based and size-tiered compaction.\n- **Thread-Safe**: Concurrent read and write operations with appropriate locking.\n\n## Requirements\n\n- Java 21 or higher (on Java 21 the Arena and MemoryLayout APIs are preview features and need `--enable-preview`; they are final as of Java 22)\n- Maven 3.6 or higher\n\n## Installation\n\n### Maven\n\nAdd the following dependency to your 
`pom.xml`:\n\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003ecom.umitunal\u003c/groupId\u003e\n    \u003cartifactId\u003elsm-tree-storage\u003c/artifactId\u003e\n    \u003cversion\u003e1.0-SNAPSHOT\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n### Building from Source\n\nClone the repository and build with Maven:\n\n```bash\ngit clone https://github.com/umit/lsm-tree-storage.git\ncd lsm-tree-storage\nmvn clean install\n```\n\n## Usage\n\n### Basic Operations\n\n```java\n// Create an LSMStore with default configuration\nStorage storage = new LSMStore();\n\n// Or with custom configuration\nStorage storage = new LSMStore(\n    10 * 1024 * 1024,  // 10MB MemTable size\n    \"./data\",          // Data directory\n    4                  // Compact after 4 SSTables\n);\n\n// Put a key-value pair\nstorage.put(\"key\".getBytes(), \"value\".getBytes());\n\n// Get a value by key\nbyte[] value = storage.get(\"key\".getBytes());\n\n// Delete a key\nstorage.delete(\"key\".getBytes());\n\n// Check if a key exists\nboolean exists = storage.containsKey(\"key\".getBytes());\n\n// Get all keys\nList\u003cbyte[]\u003e keys = storage.listKeys();\n\n// Get the number of entries\nint size = storage.size();\n\n// Clear all entries\nstorage.clear();\n\n// Shutdown the storage\nstorage.shutdown();\n```\n\n### Range Queries\n\n```java\n// Get all key-value pairs in a range\nbyte[] startKey = \"a\".getBytes();\nbyte[] endKey = \"z\".getBytes();\nMap\u003cbyte[], byte[]\u003e range = storage.getRange(startKey, endKey);\n\n// Or use an iterator for more efficient processing\ntry (KeyValueIterator iterator = storage.getIterator(startKey, endKey)) {\n    while (iterator.hasNext()) {\n        Map.Entry\u003cbyte[], byte[]\u003e entry = iterator.next();\n        byte[] key = entry.getKey();\n        byte[] value = entry.getValue();\n        // Process the key-value pair\n    }\n}\n```\n\n### TTL (Time-To-Live)\n\n```java\n// Put a key-value pair with a TTL of 60 
seconds\nstorage.put(\"key\".getBytes(), \"value\".getBytes(), 60);\n```\n\n### Advanced Configuration\n\n```java\n// Create an LSMStore with a specific compaction strategy\nStorage storage = new LSMStore(\n    10 * 1024 * 1024,                    // 10MB MemTable size\n    \"./data\",                            // Data directory\n    4,                                   // Compact after 4 SSTables\n    CompactionStrategyType.SIZE_TIERED   // Use size-tiered compaction\n);\n\n// Or with a fully custom configuration\nLSMStoreConfig config = new LSMStoreConfig(\n    10 * 1024 * 1024,                    // 10MB MemTable size\n    \"./data\",                            // Data directory\n    4,                                   // Compact after 4 SSTables\n    30,                                  // 30 minutes compaction interval\n    1,                                   // 1 minute cleanup interval\n    10,                                  // 10 seconds flush interval\n    CompactionStrategyType.THRESHOLD     // Use threshold-based compaction\n);\nStorage storage = new LSMStore(config);\n```\n\n## Architecture\n\nThe LSM-tree implementation is organized into the following packages:\n\n1. **core**: Contains the main `LSMStore` class that implements the `Storage` interface and coordinates all operations.\n2. **memtable**: Contains the `MemTable` class that represents the in-memory component of the storage system.\n3. **sstable**: Contains the `SSTable` class that represents the on-disk component of the storage system.\n4. 
**wal**: Contains the `WriteAheadLog` class that provides durability and crash recovery.\n\nSee the README.md files in each package for more details on the specific components:\n\n- [core/README.md](src/main/java/com/umitunal/lsm/core/README.md): Details on the `LSMStore` class and overall coordination.\n- [memtable/README.md](src/main/java/com/umitunal/lsm/memtable/README.md): Details on the `MemTable` class and in-memory storage.\n- [sstable/README.md](src/main/java/com/umitunal/lsm/sstable/README.md): Details on the `SSTable` class and on-disk storage.\n- [wal/README.md](src/main/java/com/umitunal/lsm/wal/README.md): Details on the `WriteAheadLog` class and durability.\n\n## Performance\n\nThe LSM-Tree Storage is designed for high write throughput. In benchmarks, it can achieve:\n\n- Write throughput: Up to 1 million operations per second on modern hardware\n- Read throughput: Up to 500,000 operations per second for point lookups\n- Range query performance: Efficient for small to medium-sized ranges\n\nActual performance will vary based on hardware, configuration, and workload characteristics.\n\n## Contributing\n\nContributions are welcome! 
Please read our [Contributing Guide](CONTRIBUTING.md) for details on our code of conduct and the process for submitting pull requests.\n\n## License\n\nThis project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.\n\n## Acknowledgements\n\n- The LSM-tree data structure was first described in the paper [\"The Log-Structured Merge-Tree (LSM-Tree)\"](https://www.cs.umb.edu/~poneil/lsmtree.pdf) by Patrick O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil.\n- This implementation draws inspiration from various open-source LSM-tree implementations, including LevelDB, RocksDB, and Cassandra.\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fumit%2Flsm-tree-storage","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fumit%2Flsm-tree-storage","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fumit%2Flsm-tree-storage/lists"}