https://github.com/rohittcodes/vibe-coded-db
- Host: GitHub
- URL: https://github.com/rohittcodes/vibe-coded-db
- Owner: rohittcodes
- Created: 2025-06-05T11:10:58.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-07T11:12:56.000Z (about 1 year ago)
- Last Synced: 2025-07-30T22:26:38.360Z (11 months ago)
- Language: Rust
- Size: 168 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Time Series Database (TSDB) with DataCrate Support
A time series database written in Rust, built around DataCrates: logical groupings of time series tables, each with its own retention policy. The database is optimized for time-based data operations and includes a SQL-like query language, time and tag indexing, and automatic data expiration.
## Features
### Core TSDB Features
- **DataCrate Organization**: Logical grouping of time series tables with individual retention policies
- **Time Series Engine**: Optimized storage and indexing for time-stamped data
- **SQL-like Query Language**: Flexible querying with time range filtering and aggregation
- **Schema Management**: Define fields and tags for structured time series data
- **Efficient Indexing**: Separate indices for time-based and tag-based queries
- **Retention Policies**: Automatic data expiration at both DataCrate and table levels
### DataCrate Model
- **DataCrate Concept**: Organize time series data into logical collections (similar to databases)
- **Multiple Tables**: Each DataCrate can contain multiple tables with different schemas
- **Command Structure**: Dedicated command sets for crate, table, and data operations
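
The crate/table relationship and the two retention levels can be pictured with a small data model. The following is a minimal sketch, not the project's actual types: the names `DataCrate`, `TableSchema`, `FieldType`, `effective_retention_days`, and `is_expired` are hypothetical, and the rule that a table-level retention overrides the crate-level default is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical field types; the README only shows `float` in its examples.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FieldType {
    Float,
    Integer,
    Text,
}

/// Hypothetical table definition: typed fields, tag keys, optional retention override.
#[derive(Debug, Clone)]
pub struct TableSchema {
    pub fields: HashMap<String, FieldType>,
    pub tags: Vec<String>,
    pub retention_days: Option<u32>,
}

/// Hypothetical DataCrate: a named collection of tables with a default retention.
#[derive(Debug, Default)]
pub struct DataCrate {
    pub name: String,
    pub description: String,
    pub retention_days: Option<u32>,
    pub tables: HashMap<String, TableSchema>,
}

impl DataCrate {
    /// Assumed rule: a table-level retention wins, otherwise the crate default applies.
    /// Returns `None` if the table is unknown or no retention is configured.
    pub fn effective_retention_days(&self, table: &str) -> Option<u32> {
        let schema = self.tables.get(table)?;
        schema.retention_days.or(self.retention_days)
    }
}

/// True if a point at `timestamp` (Unix seconds) is older than the retention window at `now`.
pub fn is_expired(timestamp: i64, now: i64, retention_days: u32) -> bool {
    now - timestamp > i64::from(retention_days) * 86_400
}
```

Under these assumptions, a `cpu_usage` table with no retention of its own inside a crate created with `90` days would expire points after 90 days.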
### Query Language
- **Time Range Queries**: Efficient filtering by timestamp ranges
- **Tag-based Filtering**: Filter by metadata tags (host, region, etc.)
- **Field Selection**: Choose specific fields to retrieve
- **Aggregation Functions**: COUNT, SUM, AVG, MIN, MAX with time grouping
- **Sorting and Limiting**: ORDER BY timestamp with result limits
- **Complex WHERE Clauses**: Multiple conditions with AND/OR logic
### Performance & Storage
- **Compression**: Automatic data compression for storage efficiency
- **Time-based Partitioning**: Efficient data organization by time periods
- **Concurrent Access**: Thread-safe operations for multiple clients
- **Memory Management**: Configurable caching with high hit ratios
- **Batch Operations**: Optimized bulk insert and query operations
## Running the Application
There are multiple ways to run the application:
### Using the run.cmd Script (Windows)
Simply run the `run.cmd` script and select an option:
```
run.cmd
```
### Running the TSDB Demo (Windows)
To run a demonstration of the time series database functionality:
```
run_tsdb_demo.cmd
```
### Running the DataCrate TSDB Demo (Windows)
To run a demonstration of the datacrate-based time series database functionality:
```
run_datacrate_demo.cmd
```
### Using Cargo Directly
To run the server:
```
cargo run --bin my-database-app
```
To run the client:
```
cargo run --bin client
```
Make sure the server is running before connecting with the client.
## Database Commands
Once the client is running, you can use the following commands:
### DataCrate Commands
| Command | Description | Example |
|---------|-------------|---------|
| `TSDB_CREATE_CRATE <name> "<description>" [retention_days]` | Create a new DataCrate | `TSDB_CREATE_CRATE metrics "System metrics" 90` |
| `TSDB_LIST_CRATES` | List all DataCrates | `TSDB_LIST_CRATES` |
| `TSDB_GET_CRATE <name>` | Get DataCrate information | `TSDB_GET_CRATE metrics` |
| `TSDB_DELETE_CRATE <name>` | Delete a DataCrate | `TSDB_DELETE_CRATE metrics` |
### Table Commands
| Command | Description | Example |
|---------|-------------|---------|
| `TSDB_CREATE_TABLE <crate> <table> <field>:<type> [<tag>, ...]` | Create a table | `TSDB_CREATE_TABLE metrics cpu_usage value:float host,region` |
| `TSDB_LIST_TABLES <crate>` | List tables in a crate | `TSDB_LIST_TABLES metrics` |
| `TSDB_GET_SCHEMA <crate> <table>` | Get table schema | `TSDB_GET_SCHEMA metrics cpu_usage` |
| `TSDB_DELETE_TABLE <crate> <table>` | Delete a table | `TSDB_DELETE_TABLE metrics cpu_usage` |
### Data Commands
| Command | Description | Example |
|---------|-------------|---------|
| `TSDB_INSERT <crate> <table> <timestamp> <field>=<value> [<tag>=<value>, ...]` | Insert time series data | `TSDB_INSERT metrics cpu_usage 1645123200 value=85.2 host=server1,region=us-east` |
| `TSDB_QUERY <crate> <table> [<fields>] [START <timestamp>] [END <timestamp>] [WHERE <condition>]` | Query time series data | `TSDB_QUERY metrics cpu_usage START 1645123200 END 1645126800 WHERE host=server1` |
| `TSDB_DELETE <crate> <table> START <timestamp> END <timestamp> [WHERE <condition>]` | Delete time series data | `TSDB_DELETE metrics cpu_usage START 1645123200 END 1645126800` |
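
To make the shape of a data command concrete, here is a minimal sketch of how a `TSDB_INSERT` line like the example in the table could be split into its parts. It is not the project's parser: `InsertCommand`, `parse_pairs`, and `parse_insert` are hypothetical names, and the sketch assumes whitespace-separated arguments, comma-separated `key=value` pairs, and numeric field values.

```rust
use std::collections::HashMap;

/// Hypothetical parsed form of a `TSDB_INSERT` command.
#[derive(Debug, PartialEq)]
pub struct InsertCommand {
    pub crate_name: String,
    pub table: String,
    pub timestamp: i64,
    pub fields: HashMap<String, f64>,
    pub tags: HashMap<String, String>,
}

/// Splits "k=v,k2=v2" into pairs; returns `None` if any pair lacks an `=`.
fn parse_pairs(s: &str) -> Option<Vec<(&str, &str)>> {
    s.split(',').map(|pair| pair.split_once('=')).collect()
}

/// Parses `TSDB_INSERT <crate> <table> <timestamp> <fields> [<tags>]`.
/// Assumption: field values are floating-point numbers.
pub fn parse_insert(line: &str) -> Result<InsertCommand, String> {
    let mut parts = line.split_whitespace();
    if parts.next() != Some("TSDB_INSERT") {
        return Err("expected TSDB_INSERT".into());
    }
    let crate_name = parts.next().ok_or("missing crate name")?.to_string();
    let table = parts.next().ok_or("missing table name")?.to_string();
    let timestamp = parts
        .next()
        .ok_or("missing timestamp")?
        .parse::<i64>()
        .map_err(|e| format!("bad timestamp: {e}"))?;

    let field_part = parts.next().ok_or("missing fields")?;
    let mut fields = HashMap::new();
    for (k, v) in parse_pairs(field_part).ok_or("malformed field list")? {
        let value: f64 = v.parse().map_err(|_| format!("field {k} is not numeric"))?;
        fields.insert(k.to_string(), value);
    }

    // Tags are optional in the command syntax.
    let mut tags = HashMap::new();
    if let Some(tag_part) = parts.next() {
        for (k, v) in parse_pairs(tag_part).ok_or("malformed tag list")? {
            tags.insert(k.to_string(), v.to_string());
        }
    }

    Ok(InsertCommand { crate_name, table, timestamp, fields, tags })
}
```

Returning a `Result` with a message per missing argument keeps client-side errors readable; a real parser would likely also handle quoted values and non-numeric field types.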
### SQL-like Queries
You can also use SQL-like syntax for complex queries:
```sql
-- Most recent 100 samples for one host
SELECT value, timestamp FROM cpu_usage
WHERE host = 'server1' AND timestamp >= 1645123200
ORDER BY timestamp DESC LIMIT 100;

-- Average and sample count per hour (3600-second buckets)
SELECT AVG(value) AS avg_cpu, COUNT(*) AS samples
FROM cpu_usage
WHERE timestamp >= 1645123200
GROUP BY timestamp / 3600;
```
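
The second query groups samples into hourly buckets by dividing the timestamp by 3600. As a rough illustration of that grouping, and not the project's executor, the sketch below computes the same average and count per bucket in plain Rust. `Bucket` and `average_by_bucket` are hypothetical names, and timestamps are assumed to be Unix seconds.

```rust
use std::collections::BTreeMap;

/// One aggregated bucket: its start time, average value, and sample count.
#[derive(Debug, PartialEq)]
pub struct Bucket {
    pub start: i64,
    pub avg: f64,
    pub samples: usize,
}

/// Groups `(timestamp, value)` points into fixed-width buckets and averages each one.
/// Buckets are returned in ascending time order; empty buckets are not emitted.
pub fn average_by_bucket(points: &[(i64, f64)], bucket_secs: i64) -> Vec<Bucket> {
    assert!(bucket_secs > 0, "bucket_secs must be positive");

    // Bucket key -> (running sum, sample count). BTreeMap keeps keys sorted.
    let mut acc: BTreeMap<i64, (f64, usize)> = BTreeMap::new();
    for &(ts, value) in points {
        // Same idea as `timestamp / 3600`; div_euclid also floors negative timestamps.
        let key = ts.div_euclid(bucket_secs);
        let entry = acc.entry(key).or_insert((0.0, 0));
        entry.0 += value;
        entry.1 += 1;
    }

    acc.into_iter()
        .map(|(key, (sum, n))| Bucket {
            start: key * bucket_secs,
            avg: sum / n as f64,
            samples: n,
        })
        .collect()
}
```

For example, points at `1645123200` and `1645123260` fall into the bucket starting at `1645120800`, while a point at `1645126800` lands in the next bucket, starting at `1645124400`.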
### Legacy Time Series Commands (for backward compatibility)
| Command | Description | Example |
|---------|-------------|---------|
| `TS_INSERT <measurement> <timestamp> <field>=<value> [<tag>=<value>, ...]` | Insert time series data | `TS_INSERT cpu_usage 1645123200 value=85.2 host=server1` |
| `TS_QUERY <measurement> [START <timestamp>] [END <timestamp>] [WHERE <condition>]` | Query time series data | `TS_QUERY cpu_usage START 1645123200 WHERE host=server1` |
| `SELECT <fields> FROM <measurement> WHERE <condition>` | Query data from a measurement | `SELECT value, unit FROM temperature_reading WHERE value > 20` |
| `DELETE FROM <measurement> WHERE <condition>` | Delete data from a measurement | `DELETE FROM temperature_reading WHERE value < 0` |
| `CREATE RETENTION POLICY <name> ON <measurement> DURATION <duration> REPLICATION <n>` | Create a retention policy | `CREATE RETENTION POLICY one_week ON temperature_reading DURATION 7d REPLICATION 1` |
| `SHOW MEASUREMENTS` | List all measurements | `SHOW MEASUREMENTS` |
| `SHOW TAG KEYS FROM <measurement>` | Show tag keys for a measurement | `SHOW TAG KEYS FROM temperature_reading` |
| `SHOW FIELD KEYS FROM <measurement>` | Show field keys for a measurement | `SHOW FIELD KEYS FROM temperature_reading` |
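
Retention policies take a duration literal such as `7d`. The sketch below shows one way such a literal could be converted to seconds. It is an illustration only: `parse_duration_secs` is a hypothetical helper, and the supported unit suffixes (`s`, `m`, `h`, `d`, `w`) are an assumption, since the README only shows `d` and `h`.

```rust
/// Converts a duration literal like "7d" or "6h" into seconds.
/// Returns `None` for an empty string, a missing number, an unknown unit, or overflow.
/// Assumed units: s (seconds), m (minutes), h (hours), d (days), w (weeks).
pub fn parse_duration_secs(literal: &str) -> Option<u64> {
    let literal = literal.trim();
    let unit = literal.chars().last()?;
    // Everything before the final character must be the numeric part.
    let digits = &literal[..literal.len() - unit.len_utf8()];
    let amount: u64 = digits.parse().ok()?;
    let multiplier: u64 = match unit {
        's' => 1,
        'm' => 60,
        'h' => 3_600,
        'd' => 86_400,
        'w' => 604_800,
        _ => return None,
    };
    amount.checked_mul(multiplier)
}
```

With these assumptions, `parse_duration_secs("7d")` yields `Some(604800)`, which could then be compared against the age of each point when expiring data.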
## Project Structure
```
my-database-app
├── src
│   ├── main.rs                    # Entry point and server implementation
│   ├── database.rs                # Core database operations
│   ├── compression.rs             # Data compression utilities
│   ├── storage
│   │   ├── disk.rs                # Persistent storage with WAL
│   │   └── memory.rs              # In-memory storage
│   ├── query
│   │   ├── parser.rs              # SQL and ML command parser
│   │   └── executor.rs            # Query execution engine
│   ├── tsdb                       # Time Series Database
│   │   ├── types.rs               # TSDB data structures
│   │   ├── index.rs               # Time and tag indexing
│   │   ├── query.rs               # SQL-like query parser and executor
│   │   └── storage.rs             # TSDB storage engine
│   └── ml
│       ├── predictor.rs           # ML model management
│       ├── feature_engineering.rs # Automated feature engineering
│       └── time_series.rs         # Time series analysis
├── tests
│   ├── integration_tests.rs       # Integration tests
│   └── tsdb_tests.rs              # TSDB specific tests
├── Cargo.toml                     # Project configuration
└── README.md                      # Documentation
```
### Key Components
#### Storage Layer
- `disk.rs`: Implements persistent storage with Write-Ahead Logging
- `memory.rs`: Provides high-performance in-memory storage
- `compression.rs`: Handles data compression using multiple algorithms
#### Time Series Database
- `types.rs`: Core TSDB data structures and schemas
- `index.rs`: Efficient time-based and tag-based indexing
- `query.rs`: SQL-like query language for time series data
- `storage.rs`: Specialized storage engine for time series data
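
To illustrate what separate time-based and tag-based indices can look like, here is a minimal sketch under stated assumptions. It does not reflect the contents of `index.rs`: `SeriesIndex` and its methods are hypothetical, points are assumed to be addressed by a numeric id, and timestamps are assumed to be Unix seconds.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Hypothetical index over points that are stored elsewhere and addressed by id.
#[derive(Default)]
pub struct SeriesIndex {
    /// Timestamp -> point ids. A BTreeMap keeps keys sorted, so range scans are cheap.
    by_time: BTreeMap<i64, Vec<u64>>,
    /// (tag key, tag value) -> point ids carrying that tag.
    by_tag: HashMap<(String, String), HashSet<u64>>,
}

impl SeriesIndex {
    /// Registers a point in both indices.
    pub fn insert(&mut self, id: u64, timestamp: i64, tags: &[(&str, &str)]) {
        self.by_time.entry(timestamp).or_default().push(id);
        for (key, value) in tags {
            self.by_tag
                .entry((key.to_string(), value.to_string()))
                .or_default()
                .insert(id);
        }
    }

    /// Returns ids with `start <= timestamp <= end` in time order,
    /// optionally restricted to points carrying one tag pair.
    pub fn query(&self, start: i64, end: i64, tag: Option<(&str, &str)>) -> Vec<u64> {
        if start > end {
            return Vec::new(); // BTreeMap::range panics on an inverted range
        }
        // None: no tag filter. Some(None): the tag pair was never indexed.
        let allowed = tag.map(|(k, v)| self.by_tag.get(&(k.to_string(), v.to_string())));
        self.by_time
            .range(start..=end)
            .flat_map(|(_, ids)| ids.iter().copied())
            .filter(|id| match &allowed {
                None => true,
                Some(Some(set)) => set.contains(id),
                Some(None) => false,
            })
            .collect()
    }
}
```

The time index drives the scan and the tag index acts as a filter, which matches a query such as `START 1645123200 END 1645126800 WHERE host=server1`. A production index would likely pick the more selective side first.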
#### Query Processing
- `parser.rs`: Parses both traditional SQL and ML-specific commands
- `executor.rs`: Executes queries and manages transaction flow
#### Machine Learning
- `predictor.rs`: Manages ML models, training, and predictions
- `feature_engineering.rs`: Automates feature selection and engineering
- `time_series.rs`: Handles time series analysis and forecasting
## Usage Examples
### Basic Database Operations
```
-- Insert data
INSERT 1 {"temperature": 25.5, "humidity": 60}
-- Query data
GET 1
-- List all records
LIST
-- Count records
COUNT
-- Search records by content
SEARCH temperature
-- Delete data
DELETE 1
```
### Time Series Database Operations
```
-- Create a measurement
TS.CREATE "{\"name\":\"weather\",\"fields\":{\"temperature\":0.0,\"humidity\":0.0,\"pressure\":0.0},\"retention_policy\":{\"Duration\":2592000}}"
-- Insert time series data
TS.INSERT weather "{\"timestamp\":\"2025-06-06T12:00:00Z\",\"fields\":{\"temperature\":72.5,\"humidity\":45.0,\"pressure\":1013.2},\"tags\":{\"location\":\"NYC\",\"station\":\"central\"}}"
-- Simple query
TS.QUERY "SELECT temperature, humidity FROM weather WHERE TIME >= '2025-06-01T00:00:00Z' AND TIME <= '2025-06-07T00:00:00Z'"
-- Query with tag filter
TS.QUERY "SELECT temperature, humidity FROM weather WHERE location = 'NYC'"
-- Aggregation query
TS.QUERY "SELECT AVG(temperature), MAX(humidity), MIN(pressure) FROM weather GROUP BY TIME(6h)"
-- List all measurements
TS.LIST
```
For more detailed documentation, see [TSDB.md](docs/TSDB.md).
### Machine Learning Operations
```sql
-- Create and train a predictor
CREATE PREDICTOR weather_forecast
FROM weather_data
PREDICT temperature
USING features (humidity, pressure, wind_speed)
WITH time_series_settings (
    time_column='timestamp',
    horizon=24,
    window_size=168
);

-- Make predictions
SELECT temperature, confidence, explanation
FROM weather_forecast
WHERE humidity = 65 AND pressure = 1013;

-- Get feature importance
EXPLAIN weather_forecast
WITH feature_importance = true;
```
### Time Series Analysis
```sql
-- Create a time series predictor
CREATE PREDICTOR stock_forecast
FROM stock_data
PREDICT price
USING features (volume, open, close, high, low)
WITH time_series_settings (
    time_column='date',
    horizon=7,
    window_size=30
);

-- Get forecast with confidence intervals
SELECT price, confidence_interval, seasonal_components
FROM stock_forecast
NEXT 7 DAYS;
```
## Setup Instructions
1. Install Rust (if not already installed):
```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
2. Clone and build the project:
```bash
git clone <repository-url>
cd my-database-app
cargo build --release
```
3. Run the database server:
```bash
cargo run --release
```
The server will start on `localhost:8080` by default.
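
Once the server is listening, a client can send it commands over TCP. The README does not document the wire protocol, so the following is only a sketch under a loud assumption: commands and responses are plain text, one per line, terminated by `\n`. `send_command` is a hypothetical helper, not part of the project's client.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::net::{TcpStream, ToSocketAddrs};

/// Sends one command line to the server and returns the first response line.
/// Assumption: the protocol is newline-delimited text (not confirmed by the README).
pub fn send_command<A: ToSocketAddrs>(addr: A, command: &str) -> io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;

    // Write the command followed by a newline terminator.
    stream.write_all(command.as_bytes())?;
    stream.write_all(b"\n")?;
    stream.flush()?;

    // Read a single line of response and strip the trailing newline.
    let mut reader = BufReader::new(stream);
    let mut response = String::new();
    reader.read_line(&mut response)?;
    Ok(response.trim_end().to_string())
}
```

With the default address, a call could look like `send_command("localhost:8080", "TSDB_LIST_CRATES")`. If the server uses a different framing, such as length prefixes or multi-line replies, the read side would need to change accordingly.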
## Usage
With the server running, connect using the client and issue any of the commands listed under Database Commands. The source files contain further documentation for each command.
## Contributing
Contributions are welcome! Please open an issue or submit a pull request for any enhancements or bug fixes.