{"id":16647617,"url":"https://github.com/sachaarbonel/reefdb","last_synced_at":"2025-04-05T05:04:00.723Z","repository":{"id":160107314,"uuid":"634902624","full_name":"sachaarbonel/reefdb","owner":"sachaarbonel","description":"ReefDB is a minimalistic, in-memory and on-disk database management system written in Rust, implementing basic SQL query capabilities and full-text search.","archived":false,"fork":false,"pushed_at":"2025-01-24T23:04:24.000Z","size":406,"stargazers_count":86,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-29T04:04:43.644Z","etag":null,"topics":["database","disk","memory","rust","search","sql"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sachaarbonel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-01T14:03:28.000Z","updated_at":"2025-03-10T22:45:33.000Z","dependencies_parsed_at":"2025-02-16T15:13:28.120Z","dependency_job_id":"b0a53690-622b-423d-a97a-2cf9fd24f60b","html_url":"https://github.com/sachaarbonel/reefdb","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sachaarbonel%2Freefdb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sachaarbonel%2Freefdb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sachaarbonel%2Freefdb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/sachaarbonel%2Freefdb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sachaarbonel","download_url":"https://codeload.github.com/sachaarbonel/reefdb/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247289424,"owners_count":20914464,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","disk","memory","rust","search","sql"],"created_at":"2024-10-12T08:45:18.378Z","updated_at":"2025-04-05T05:04:00.708Z","avatar_url":"https://github.com/sachaarbonel.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ReefDB\n\n![ReefDB logo](https://user-images.githubusercontent.com/18029834/236632891-643c5a0a-8e26-4e88-9bc2-db69125b295f.png)\n\nReefDB is a minimalistic, in-memory and on-disk database management system written in Rust, implementing basic SQL query capabilities and full-text search.\n\n## Usage\n\nTo use ReefDB, choose between in-memory storage (`InMemoryReefDB`) and on-disk storage (`OnDiskReefDB`). 
\n\n### Basic Example\n\n```rust\nuse reefdb::InMemoryReefDB;\n\nfn main() {\n    let mut db = InMemoryReefDB::new();\n\n    // Create a table with various data types\n    db.query(\"CREATE TABLE records (\n        id INTEGER PRIMARY KEY,\n        name TEXT,\n        active BOOLEAN,\n        score FLOAT,\n        birth_date DATE,\n        last_login TIMESTAMP,\n        description TSVECTOR\n    )\");\n\n    // Insert data with different types\n    db.query(\"INSERT INTO records VALUES (\n        1,\n        'Alice',\n        TRUE,\n        95.5,\n        '2000-01-01',\n        '2024-03-14 12:34:56',\n        'Software engineer with expertise in databases'\n    )\");\n\n    // Query with type-specific operations\n    db.query(\"SELECT * FROM records WHERE score \u003e 90.0\");\n    db.query(\"SELECT * FROM records WHERE birth_date \u003e '1999-12-31'\");\n    db.query(\"SELECT * FROM records WHERE active = TRUE\");\n    db.query(\"SELECT * FROM records WHERE to_tsvector(description) @@ to_tsquery('database')\");\n}\n```\n\n### On-Disk Storage\n\n```rust\nuse reefdb::OnDiskReefDB;\n\nfn main() {\n    let mut db = OnDiskReefDB::new(\"db.reef\".to_string(), \"index.bin\".to_string());\n    // Use the same SQL queries as with InMemoryReefDB\n}\n```\n\n## Features\n\n### Core Database Features\n- ✅ In-Memory and On-Disk storage modes\n- ✅ Basic SQL statements (CREATE, INSERT, SELECT, UPDATE, DELETE)\n- ✅ ALTER TABLE with ADD/DROP/RENAME column support\n- ✅ DROP TABLE functionality\n- ✅ INNER JOIN support\n- ✅ Primary key constraints\n- ✅ Basic error handling system\n- ✅ Rich data type support (INTEGER, TEXT, BOOLEAN, FLOAT, DATE, TIMESTAMP, TSVECTOR, NULL)\n\n### Data Types\n- ✅ INTEGER: Whole number values\n- ✅ TEXT: String values with support for escaped quotes\n- ✅ BOOLEAN: TRUE/FALSE values\n- ✅ FLOAT: Decimal number values\n- ✅ DATE: Date values in 'YYYY-MM-DD' format\n- ✅ TIMESTAMP: Datetime values in 'YYYY-MM-DD HH:MM:SS' format\n- ✅ NULL: Null values\n- ✅ TSVECTOR: 
Full-text search optimized text type\n\n### Full-Text Search\n- ✅ TSVECTOR data type\n- ✅ Inverted index implementation\n- ✅ Basic tokenization\n- ✅ Memory and disk-based index storage\n- ✅ @@ operator for text search\n\n### Transaction Support\n- ✅ Basic transaction structure\n- ✅ Transaction isolation levels (ReadUncommitted, ReadCommitted, RepeatableRead, Serializable)\n- ✅ Write-Ahead Logging (WAL)\n- ✅ Transaction manager with locking mechanism\n- ✅ Full ACID compliance\n- ✅ Deadlock detection\n- ✅ MVCC implementation\n- ✅ Savepoints\n- ✅ Autocommit\n\n### Indexing\n- ✅ B-Tree index implementation\n- ✅ GIN index implementation\n- ✅ CREATE INDEX and DROP INDEX support\n- ✅ Index persistence for on-disk storage\n- ✅ Basic query optimization with indexes\n\n## Dependencies\n\n- [nom](https://github.com/Geal/nom) for SQL parsing\n- [serde](https://github.com/serde-rs/serde) for serialization\n- [bincode](https://github.com/bincode-org/bincode) for encoding\n\n## Future Improvements\n\n### Critical for Production (Highest Priority)\n\n#### Query Analysis \u0026 Optimization\n- [ ] Query Analyzer Framework\n  - [ ] Cost-based query planning\n  - [ ] Statistics collection and management\n  - [ ] Index usage analysis\n  - [ ] Join order optimization\n  - [ ] Query rewriting\n- [ ] Query Plan Visualization\n  - [ ] Visual execution plan representation\n  - [ ] Cost breakdown analysis\n  - [ ] Performance bottleneck identification\n- [ ] Statistics Management\n  - [ ] Table statistics (row counts, size)\n  - [ ] Column statistics (cardinality, distribution)\n  - [ ] Index statistics (size, depth, usage)\n  - [ ] Automatic statistics updates\n\n#### Query Processing Essentials\n- [ ] Basic aggregate functions (COUNT, SUM)\n- [x] ORDER BY implementation\n- [ ] LIMIT and OFFSET support\n- [ ] LEFT JOIN support\n- [ ] Query timeout mechanism\n\n#### Core Performance Features\n- [x] Memory-mapped storage\n  - [x] Memory-mapped file handling\n  - [x] Basic persistence\n  - [x] 
Concurrent access support\n  - [ ] Page-level operations\n  - [ ] Buffer management\n  - [ ] Crash recovery\n  - [ ] Dynamic file resizing\n  - [ ] Memory-mapped index support\n- [ ] Index compression\n- [ ] Parallel query execution\n\n#### Monitoring \u0026 Diagnostics Essentials\n- [ ] Query Performance Metrics\n  - [ ] Execution time tracking\n  - [ ] Resource usage monitoring\n  - [ ] Query plan effectiveness\n  - [ ] Index usage statistics\n- [ ] Transaction monitoring\n- [ ] Error logging and tracing\n\n### High Priority\n\n#### Index Improvements\n- [ ] Multi-column indexes\n- [ ] Hash indexes for equality comparisons\n- [ ] Bitmap indexes for low-cardinality columns\n- [ ] Incremental indexing\n- [ ] Index maintenance optimization\n  - [ ] Background index rebuilding\n  - [ ] Index fragmentation analysis\n  - [ ] Automatic index suggestions\n\n#### Additional JOIN Support\n- [ ] RIGHT JOIN\n- [ ] OUTER JOIN\n- [ ] CROSS JOIN\n- [ ] FULL JOIN\n- [ ] NATURAL JOIN\n- [ ] SELF JOIN\n\n#### Advanced Query Processing\n- [ ] Additional aggregate functions (AVG, MIN, MAX)\n- [ ] GROUP BY and HAVING clauses\n- [ ] Window functions\n- [ ] Common Table Expressions (CTEs)\n- [ ] Subquery optimization\n\n#### Full-text Search Enhancements\n- [ ] Advanced Index Types\n  - [x] BM25 scoring with configurable parameters\n  - [x] TF-IDF with normalization options\n  - [ ] Custom scoring functions\n  - [ ] Position-aware indexing\n  - [ ] Field norms support\n\n- [ ] Query Features\n  - [ ] Fuzzy matching with configurable distance\n  - [ ] Regular expression support\n  - [ ] Range queries\n  - [ ] Boolean queries with minimum match\n  - [ ] Phrase queries with slop\n  - [ ] Query rewriting and optimization\n  - [ ] Query expansion\n  - [ ] Prefix matching (e.g., `web:*`)\n  - [ ] Complex boolean expressions with parentheses\n  - [ ] Result ranking with `ts_rank`\n  - [ ] Text highlighting with `ts_headline`\n\n- [ ] Faceted Search\n  - [ ] Hierarchical facets\n  - [ ] 
Dynamic facet counting\n  - [ ] Custom facet ordering\n  - [ ] Multi-value facets\n\n- [ ] Enhanced Scoring \u0026 Ranking\n  - [ ] Configurable scoring algorithms\n  - [ ] Score explanation\n  - [ ] Custom boosting factors\n  - [ ] Field-weight customization\n  - [ ] Position-based scoring\n\n- [ ] Search Quality\n  - [ ] Highlighting with snippets\n  - [ ] Relevance tuning tools\n  - [ ] Search quality metrics\n\n#### Vector Search Capabilities\n- [ ] Vector Data Types and Operations\n  - [ ] VECTOR(dimensions) data type\n  - [ ] Vector similarity operators (\u003c-\u003e, \u003c=\u003e, \u003c#\u003e)\n  - [ ] Configurable distance metrics (L2, Cosine, Dot Product)\n  - [ ] Vector normalization options\n\n- [ ] Dimension-Optimized Indexes\n  - [ ] KD-Tree for low dimensions (≤ 8)\n  - [ ] HNSW for medium dimensions (≤ 100)\n  - [ ] Brute Force with SIMD for high dimensions\n  - [ ] Index selection based on dimensionality\n\n- [ ] Advanced Vector Search Features\n  - [ ] Approximate Nearest Neighbors (ANN)\n  - [ ] Hybrid search (combine with text/filters)\n  - [ ] Batch vector operations\n  - [ ] Vector quantization\n  - [ ] Dynamic index rebuilding\n  - [ ] Multi-vector queries\n\n- [ ] Vector Search Optimizations\n  - [ ] SIMD acceleration\n  - [ ] Parallel search\n  - [ ] Memory-mapped vectors\n  - [ ] Vector compression\n  - [ ] Incremental index updates\n  - [ ] Cache-friendly layouts\n\n### Medium Priority\n\n#### Query Plan Management\n- [ ] Plan caching\n- [ ] Adaptive query execution\n- [ ] Runtime statistics collection\n- [ ] Dynamic plan adjustment\n- [ ] Materialized view suggestions\n\n#### Constraint System\n- [ ] UNIQUE constraints\n- [ ] CHECK constraints\n- [ ] NOT NULL constraints\n- [ ] DEFAULT values\n- [ ] Enhanced FOREIGN KEY support with ON DELETE/UPDATE actions\n\n#### Advanced Features\n- [ ] Views\n- [ ] Stored procedures\n- [ ] User-defined functions\n- [ ] Triggers\n- [ ] Materialized views\n\n#### CJK (Chinese, Japanese, Korean) 
Support\n- [ ] Character-based tokenization\n- [ ] N-gram tokenization\n- [ ] Dictionary-based word segmentation\n- [ ] Language-specific stop words\n- [ ] Unicode normalization\n- [ ] Ideograph handling\n- [ ] Reading/pronunciation support\n  - [ ] Pinyin for Chinese\n  - [ ] Hiragana/Katakana for Japanese\n  - [ ] Hangul/Hanja for Korean\n- [ ] Mixed script handling\n- [ ] CJK-specific scoring adjustments\n- [ ] Compound word processing\n- [ ] Character variant normalization\n\n### Lower Priority\n\n#### Data Types\n- [x] DATE type\n- [ ] TIME type\n- [ ] DECIMAL/NUMERIC types\n- [x] BOOLEAN type\n- [ ] BLOB/BINARY types\n- [ ] Array types\n- [ ] JSON type\n- [ ] User-defined types\n\n#### Security Features\n- [ ] User authentication\n- [ ] Role-based authorization\n- [ ] Row-level security\n- [ ] Column-level security\n- [ ] Audit logging\n- [ ] SSL/TLS support\n\n#### Distributed Features\n- [ ] Replication using raft-rs\n- [ ] Master-slave configuration\n- [ ] Sharding support\n- [ ] Distributed transactions\n- [ ] Failover support\n\n#### Developer Experience\n- [ ] Command-line interface\n- [ ] Web-based admin interface\n- [ ] Query visualization\n- [ ] Performance monitoring dashboard\n- [ ] Schema visualization\n- [ ] Comprehensive documentation\n\n## License\n\nThis project is licensed under the MIT License. See [LICENSE](LICENSE) for more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsachaarbonel%2Freefdb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsachaarbonel%2Freefdb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsachaarbonel%2Freefdb/lists"}