https://github.com/rohittcodes/vibe-coded-db

# Time Series Database (TSDB) with DataCrate Support

A high-performance time series database written in Rust that organizes time series data into DataCrates. It is optimized for time-based data operations and provides built-in SQL-like querying and data management.

## Features

### Core TSDB Features
- **DataCrate Organization**: Logical grouping of time series tables (similar to databases) with individual retention policies
- **Multiple Tables**: Each DataCrate can contain multiple tables with different schemas
- **Time Series Engine**: Optimized storage and indexing for time-stamped data
- **SQL-like Query Language**: Flexible querying with time range filtering and aggregation
- **Schema Management**: Define fields and tags for structured time series data
- **Efficient Indexing**: Separate indices for time-based and tag-based queries
- **Retention Policies**: Automatic data expiration at both the DataCrate and table levels
- **Command Structure**: Dedicated command sets for crate, table, and data operations
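
The retention rules above (a DataCrate-level default that an individual table can override) can be sketched as follows. This is a minimal illustration, not the project's `tsdb/types.rs`: the `DataCrate` and `Table` structs, the `effective_retention` and `is_expired` helpers, and the `debug_traces` table are hypothetical. It assumes table retention takes precedence over crate retention, and that retention is counted in whole days against Unix timestamps in seconds.

```rust
use std::collections::HashMap;

const SECS_PER_DAY: i64 = 86_400;

/// Hypothetical table entry: `None` means "inherit the crate's retention".
struct Table {
    retention_days: Option<u32>,
}

/// Hypothetical DataCrate: a named group of tables with a default retention.
struct DataCrate {
    name: String,
    description: String,
    retention_days: Option<u32>, // None = keep data forever
    tables: HashMap<String, Table>,
}

impl DataCrate {
    fn new(name: &str, description: &str, retention_days: Option<u32>) -> Self {
        DataCrate {
            name: name.to_string(),
            description: description.to_string(),
            retention_days,
            tables: HashMap::new(),
        }
    }

    fn add_table(&mut self, name: &str, retention_days: Option<u32>) {
        self.tables.insert(name.to_string(), Table { retention_days });
    }

    /// Table-level retention wins; otherwise fall back to the crate default.
    /// Returns None for an unknown table or when nothing expires.
    fn effective_retention(&self, table: &str) -> Option<u32> {
        let t = self.tables.get(table)?;
        t.retention_days.or(self.retention_days)
    }

    /// A point expires once it is older than the effective retention window.
    fn is_expired(&self, table: &str, point_ts: i64, now_ts: i64) -> bool {
        match self.effective_retention(table) {
            Some(days) => now_ts - point_ts > i64::from(days) * SECS_PER_DAY,
            None => false,
        }
    }
}

fn main() {
    // Mirrors `TSDB_CREATE_CRATE metrics "System metrics" 90`.
    let mut metrics = DataCrate::new("metrics", "System metrics", Some(90));
    metrics.add_table("cpu_usage", None); // inherits the 90-day default
    metrics.add_table("debug_traces", Some(7)); // hypothetical 7-day override
    println!(
        "{} ({}): cpu_usage keeps {:?} days, debug_traces keeps {:?} days",
        metrics.name,
        metrics.description,
        metrics.effective_retention("cpu_usage"),
        metrics.effective_retention("debug_traces"),
    );
}
```

Resolving retention per table at read or compaction time keeps the crate default in one place, so changing it affects every table that has not opted out.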

### Query Language
- **Time Range Queries**: Efficient filtering by timestamp ranges
- **Tag-based Filtering**: Filter by metadata tags (host, region, etc.)
- **Field Selection**: Choose specific fields to retrieve
- **Aggregation Functions**: COUNT, SUM, AVG, MIN, MAX with time grouping
- **Sorting and Limiting**: ORDER BY timestamp with result limits
- **Complex WHERE Clauses**: Multiple conditions with AND/OR logic
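
As a rough sketch of how time-range filtering, tag filtering, `ORDER BY timestamp DESC`, and a result limit compose, consider the following. The `Point` struct and `query` function are hypothetical; the sketch assumes one numeric field per point, inclusive START/END bounds, and a linear scan rather than the indexed lookup the real engine presumably performs.

```rust
use std::collections::HashMap;

/// Hypothetical point representation: one numeric field plus string tags.
#[derive(Debug, Clone, PartialEq)]
struct Point {
    timestamp: i64,
    value: f64,
    tags: HashMap<String, String>,
}

/// Keep points with start <= timestamp <= end whose tags match every
/// required key/value pair, newest first, truncated to `limit`.
fn query<'a>(
    points: &'a [Point],
    start: i64,
    end: i64,
    tag_filter: &[(&str, &str)],
    limit: usize,
) -> Vec<&'a Point> {
    let mut hits: Vec<&Point> = points
        .iter()
        .filter(|p| p.timestamp >= start && p.timestamp <= end)
        .filter(|p| {
            tag_filter
                .iter()
                .all(|(k, v)| p.tags.get(*k).map(String::as_str) == Some(*v))
        })
        .collect();
    // ORDER BY timestamp DESC
    hits.sort_by(|a, b| b.timestamp.cmp(&a.timestamp));
    // LIMIT
    hits.truncate(limit);
    hits
}

/// Convenience constructor for a point tagged with `host`.
fn point(ts: i64, value: f64, host: &str) -> Point {
    let mut tags = HashMap::new();
    tags.insert("host".to_string(), host.to_string());
    Point { timestamp: ts, value, tags }
}

fn main() {
    let data = vec![
        point(1645123200, 85.2, "server1"),
        point(1645123260, 40.0, "server2"),
        point(1645123320, 90.1, "server1"),
    ];
    // Roughly: WHERE host = 'server1' AND timestamp >= 1645123200
    //          ORDER BY timestamp DESC LIMIT 100
    for p in query(&data, 1645123200, i64::MAX, &[("host", "server1")], 100) {
        println!("{} {}", p.timestamp, p.value);
    }
}
```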

### Performance & Storage
- **Compression**: Automatic data compression for storage efficiency
- **Time-based Partitioning**: Efficient data organization by time periods
- **Concurrent Access**: Thread-safe operations for multiple clients
- **Memory Management**: Configurable caching with high hit ratios
- **Batch Operations**: Optimized bulk insert and query operations
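
Time-based partitioning usually means assigning each point to a fixed-width time bucket. The sketch below assumes day-wide partitions keyed by their start timestamp; `partition_start` and `partition` are hypothetical helpers rather than the project's storage API, and the partition width the engine actually uses is not documented here.

```rust
use std::collections::BTreeMap;

/// Start of the partition containing `ts`, for partitions `width` seconds wide.
/// `rem_euclid` keeps pre-1970 (negative) timestamps in the correct bucket.
fn partition_start(ts: i64, width: i64) -> i64 {
    ts - ts.rem_euclid(width)
}

/// Group (timestamp, value) samples into partitions keyed by start time.
/// A BTreeMap keeps partitions ordered, so a time-range query can visit only
/// the partitions overlapping the range, and expiry can drop whole
/// partitions from the front.
fn partition(samples: &[(i64, f64)], width: i64) -> BTreeMap<i64, Vec<(i64, f64)>> {
    let mut parts: BTreeMap<i64, Vec<(i64, f64)>> = BTreeMap::new();
    for &(ts, v) in samples {
        parts.entry(partition_start(ts, width)).or_default().push((ts, v));
    }
    parts
}

fn main() {
    const DAY: i64 = 86_400;
    let samples = [(1645123200, 85.2), (1645126800, 80.1), (1645209600, 70.0)];
    for (start, rows) in partition(&samples, DAY) {
        println!("partition starting {start}: {} rows", rows.len());
    }
}
```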

## Running the Application

There are multiple ways to run the application:

### Using the run.cmd Script (Windows)
Simply run the `run.cmd` script and select an option:
```
run.cmd
```

### Running the TSDB Demo (Windows)
To run a demonstration of the time series database functionality:
```
run_tsdb_demo.cmd
```

### Running the DataCrate TSDB Demo (Windows)
To run a demonstration of the datacrate-based time series database functionality:
```
run_datacrate_demo.cmd
```

### Using Cargo Directly
To run the server:
```
cargo run --bin my-database-app
```

To run the client:
```
cargo run --bin client
```

Make sure the server is running before connecting with the client.

## Database Commands

Once the client is running, you can use the following commands:

### DataCrate Commands

| Command | Description | Example |
|---------|-------------|---------|
| `TSDB_CREATE_CRATE <crate> "<description>" [retention_days]` | Create a new DataCrate | `TSDB_CREATE_CRATE metrics "System metrics" 90` |
| `TSDB_LIST_CRATES` | List all DataCrates | `TSDB_LIST_CRATES` |
| `TSDB_GET_CRATE <crate>` | Get DataCrate information | `TSDB_GET_CRATE metrics` |
| `TSDB_DELETE_CRATE <crate>` | Delete a DataCrate | `TSDB_DELETE_CRATE metrics` |

### Table Commands

| Command | Description | Example |
|---------|-------------|---------|
| `TSDB_CREATE_TABLE <crate> <table> <field>:<type> [<tag>, ...]` | Create a table | `TSDB_CREATE_TABLE metrics cpu_usage value:float host,region` |
| `TSDB_LIST_TABLES <crate>` | List tables in a crate | `TSDB_LIST_TABLES metrics` |
| `TSDB_GET_SCHEMA <crate> <table>` | Get table schema | `TSDB_GET_SCHEMA metrics cpu_usage` |
| `TSDB_DELETE_TABLE <crate> <table>` | Delete a table | `TSDB_DELETE_TABLE metrics cpu_usage` |
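
The `TSDB_CREATE_TABLE` example declares fields as `name:type` and tags as a comma-separated list. A minimal sketch of parsing those arguments and checking an insert against the resulting schema might look like this. Only the `float` type appears in this README; the `int` and `string` types, the `Schema` struct, and the `parse_schema` and `validate` functions are assumptions for illustration.

```rust
/// Field types. Only `float` appears in the README; `int` and `string` are assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldType {
    Float,
    Int,
    Str,
}

#[derive(Debug, PartialEq)]
struct Schema {
    fields: Vec<(String, FieldType)>,
    tags: Vec<String>,
}

/// Parse `name:type[,name:type...]` plus an optional `tag[,tag...]` list.
fn parse_schema(fields_arg: &str, tags_arg: Option<&str>) -> Result<Schema, String> {
    let mut fields = Vec::new();
    for spec in fields_arg.split(',') {
        let (name, ty) = spec
            .split_once(':')
            .ok_or_else(|| format!("expected name:type, got `{spec}`"))?;
        let ty = match ty {
            "float" => FieldType::Float,
            "int" => FieldType::Int,
            "string" => FieldType::Str,
            other => return Err(format!("unknown field type `{other}`")),
        };
        fields.push((name.to_string(), ty));
    }
    let tags: Vec<String> = tags_arg
        .map(|t| t.split(',').map(String::from).collect())
        .unwrap_or_default();
    Ok(Schema { fields, tags })
}

impl Schema {
    /// Check that an insert supplies every declared field with a value that
    /// parses as the declared type. Tags are not checked here.
    fn validate(&self, values: &[(&str, &str)]) -> Result<(), String> {
        for (name, ty) in &self.fields {
            let raw = values
                .iter()
                .find(|(k, _)| *k == name.as_str())
                .map(|(_, v)| *v)
                .ok_or_else(|| format!("missing field `{name}`"))?;
            let ok = match ty {
                FieldType::Float => raw.parse::<f64>().is_ok(),
                FieldType::Int => raw.parse::<i64>().is_ok(),
                FieldType::Str => true,
            };
            if !ok {
                return Err(format!("field `{name}` is not a valid {ty:?}"));
            }
        }
        Ok(())
    }
}

fn main() {
    // Arguments taken from `TSDB_CREATE_TABLE metrics cpu_usage value:float host,region`.
    let schema = parse_schema("value:float", Some("host,region")).expect("valid schema");
    println!("{schema:?}");
    println!("{:?}", schema.validate(&[("value", "85.2")])); // Ok(())
}
```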

### Data Commands

| Command | Description | Example |
|---------|-------------|---------|
| `TSDB_INSERT <crate> <table> <timestamp> <field>=<value> [<tag>=<value>, ...]` | Insert time series data | `TSDB_INSERT metrics cpu_usage 1645123200 value=85.2 host=server1,region=us-east` |
| `TSDB_QUERY <crate> <table> [<fields>] [START <timestamp>] [END <timestamp>] [WHERE <condition>]` | Query time series data | `TSDB_QUERY metrics cpu_usage START 1645123200 END 1645126800 WHERE host=server1` |
| `TSDB_DELETE <crate> <table> START <timestamp> END <timestamp> [WHERE <condition>]` | Delete time series data | `TSDB_DELETE metrics cpu_usage START 1645123200 END 1645126800` |
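
To make the `TSDB_INSERT` argument layout concrete, here is a hypothetical parser for the example line. It assumes whitespace-separated arguments, comma-separated `key=value` pairs, numeric field values, and an optional tag group; the real client/server protocol may tokenize differently (for example, by allowing quoted values). The `Insert` struct and the `pairs` and `parse_insert` functions are illustrative names only.

```rust
use std::collections::BTreeMap;

#[derive(Debug)]
struct Insert {
    crate_name: String,
    table: String,
    timestamp: i64,
    fields: BTreeMap<String, f64>,
    tags: BTreeMap<String, String>,
}

/// Split `k=v,k=v` into (key, value) pairs.
fn pairs(s: &str) -> Result<Vec<(&str, &str)>, String> {
    s.split(',')
        .map(|kv| kv.split_once('=').ok_or_else(|| format!("expected key=value, got `{kv}`")))
        .collect()
}

/// Parse `TSDB_INSERT <crate> <table> <timestamp> <field>=<value>[,...] [<tag>=<value>,...]`.
fn parse_insert(line: &str) -> Result<Insert, String> {
    let tokens: Vec<&str> = line.split_whitespace().collect();
    match tokens.as_slice() {
        ["TSDB_INSERT", crate_name, table, ts, field_str, rest @ ..] if rest.len() <= 1 => {
            let timestamp = ts.parse::<i64>().map_err(|e| format!("bad timestamp: {e}"))?;
            let mut fields = BTreeMap::new();
            for (k, v) in pairs(field_str)? {
                let n = v.parse::<f64>().map_err(|e| format!("bad value for `{k}`: {e}"))?;
                fields.insert(k.to_string(), n);
            }
            // The tag group is optional.
            let mut tags = BTreeMap::new();
            if let Some(tag_str) = rest.first() {
                for (k, v) in pairs(tag_str)? {
                    tags.insert(k.to_string(), v.to_string());
                }
            }
            Ok(Insert {
                crate_name: crate_name.to_string(),
                table: table.to_string(),
                timestamp,
                fields,
                tags,
            })
        }
        _ => Err("usage: TSDB_INSERT <crate> <table> <timestamp> <fields> [<tags>]".to_string()),
    }
}

fn main() {
    let line = "TSDB_INSERT metrics cpu_usage 1645123200 value=85.2 host=server1,region=us-east";
    println!("{:#?}", parse_insert(line));
}
```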

### SQL-like Queries

You can also use SQL-like syntax for complex queries:

```sql
SELECT value, timestamp FROM cpu_usage
WHERE host = 'server1' AND timestamp >= 1645123200
ORDER BY timestamp DESC LIMIT 100;

SELECT AVG(value) as avg_cpu, COUNT(*) as samples
FROM cpu_usage
WHERE timestamp >= 1645123200
GROUP BY timestamp / 3600;
```
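
The second query buckets rows with `GROUP BY timestamp / 3600`, that is, integer division of Unix seconds into hourly buckets. A sketch of the same computation in Rust follows; `group_avg_count` is a hypothetical helper, and it assumes rows arrive as `(timestamp, value)` pairs already read from `cpu_usage`.

```rust
use std::collections::BTreeMap;

/// Per-bucket AVG(value) and COUNT(*) for rows with timestamp >= since,
/// bucketing on `timestamp / bucket_secs` (integer division, as in the SQL).
fn group_avg_count(rows: &[(i64, f64)], since: i64, bucket_secs: i64) -> BTreeMap<i64, (f64, usize)> {
    // Accumulate a running (sum, count) per bucket.
    let mut acc: BTreeMap<i64, (f64, usize)> = BTreeMap::new();
    for &(ts, value) in rows.iter().filter(|(ts, _)| *ts >= since) {
        let entry = acc.entry(ts / bucket_secs).or_insert((0.0, 0));
        entry.0 += value;
        entry.1 += 1;
    }
    // Convert running sums into averages.
    acc.into_iter()
        .map(|(bucket, (sum, n))| (bucket, (sum / n as f64, n)))
        .collect()
}

fn main() {
    let rows = [(1645123200, 80.0), (1645124000, 90.0), (1645127000, 70.0)];
    for (hour, (avg_cpu, samples)) in group_avg_count(&rows, 1645123200, 3600) {
        println!("hour {hour}: avg_cpu={avg_cpu:.1} samples={samples}");
    }
}
```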

### Legacy Time Series Commands (for backward compatibility)

| Command | Description | Example |
|---------|-------------|---------|
| `TS_INSERT <measurement> <timestamp> <field>=<value> [<tag>=<value>, ...]` | Insert time series data | `TS_INSERT cpu_usage 1645123200 value=85.2 host=server1` |
| `TS_QUERY <measurement> [START <timestamp>] [END <timestamp>] [WHERE <condition>]` | Query time series data | `TS_QUERY cpu_usage START 1645123200 WHERE host=server1` |
| `SELECT <fields> FROM <measurement> WHERE <condition>` | Query data from a measurement | `SELECT value, unit FROM temperature_reading WHERE value > 20` |
| `DELETE FROM <measurement> WHERE <condition>` | Delete data from a measurement | `DELETE FROM temperature_reading WHERE value < 0` |
| `CREATE RETENTION POLICY <policy> ON <measurement> DURATION <duration> REPLICATION <n>` | Create a retention policy | `CREATE RETENTION POLICY one_week ON temperature_reading DURATION 7d REPLICATION 1` |
| `SHOW MEASUREMENTS` | List all measurements | `SHOW MEASUREMENTS` |
| `SHOW TAG KEYS FROM <measurement>` | Show tag keys for a measurement | `SHOW TAG KEYS FROM temperature_reading` |
| `SHOW FIELD KEYS FROM <measurement>` | Show field keys for a measurement | `SHOW FIELD KEYS FROM temperature_reading` |
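
Duration literals such as `7d` (in `DURATION 7d`) and `6h` (in `GROUP BY TIME(6h)` under Usage Examples) must be converted to seconds somewhere. A hedged sketch of such a conversion follows; the `parse_duration` function and the exact set of unit suffixes (`s`, `m`, `h`, `d`, `w`) are assumptions modeled on InfluxQL-style literals, not a list taken from this project.

```rust
/// Parse a duration literal such as `30s`, `15m`, `6h`, `7d`, or `2w` into seconds.
/// The supported suffixes are an assumption, not documented project behavior.
fn parse_duration(s: &str) -> Result<u64, String> {
    // Split at the first non-digit: "7d" -> ("7", "d").
    let split = s
        .find(|c: char| !c.is_ascii_digit())
        .ok_or_else(|| format!("missing unit in `{s}`"))?;
    let (num, unit) = s.split_at(split);
    let n: u64 = num.parse().map_err(|_| format!("missing number in `{s}`"))?;
    let secs_per_unit = match unit {
        "s" => 1,
        "m" => 60,
        "h" => 3_600,
        "d" => 86_400,
        "w" => 604_800,
        _ => return Err(format!("unknown unit `{unit}` in `{s}`")),
    };
    n.checked_mul(secs_per_unit).ok_or_else(|| format!("`{s}` overflows"))
}

fn main() {
    // `DURATION 7d` from the retention-policy example, `TIME(6h)` from the GROUP BY example.
    for lit in ["7d", "6h"] {
        println!("{lit} = {:?} seconds", parse_duration(lit));
    }
}
```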

## Project Structure

```
my-database-app
├── src
│   ├── main.rs                    # Entry point and server implementation
│   ├── database.rs                # Core database operations
│   ├── compression.rs             # Data compression utilities
│   ├── storage
│   │   ├── disk.rs                # Persistent storage with WAL
│   │   └── memory.rs              # In-memory storage
│   ├── query
│   │   ├── parser.rs              # SQL and ML command parser
│   │   └── executor.rs            # Query execution engine
│   ├── tsdb                       # Time Series Database
│   │   ├── types.rs               # TSDB data structures
│   │   ├── index.rs               # Time and tag indexing
│   │   ├── query.rs               # SQL-like query parser and executor
│   │   └── storage.rs             # TSDB storage engine
│   └── ml
│       ├── predictor.rs           # ML model management
│       ├── feature_engineering.rs # Automated feature engineering
│       └── time_series.rs         # Time series analysis
├── tests
│   ├── integration_tests.rs       # Integration tests
│   └── tsdb_tests.rs              # TSDB-specific tests
├── Cargo.toml                     # Project configuration
└── README.md                      # Documentation
```

### Key Components

#### Storage Layer
- `disk.rs`: Implements persistent storage with Write-Ahead Logging
- `memory.rs`: Provides high-performance in-memory storage
- `compression.rs`: Handles data compression using multiple algorithms
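
Write-ahead logging means every change is appended to a log before it is applied, so the log can be replayed after a crash. Below is a minimal sketch of a length-prefixed log over any `std::io::Write`/`Read`. The record layout (4-byte little-endian length plus payload) and the `append_record` and `replay` names are assumptions, not the format `disk.rs` actually uses, and the sketch omits the checksums a production WAL would typically add.

```rust
use std::io::{self, Read, Write};

/// Append one record: a 4-byte little-endian length prefix, then the payload.
/// With a real file you would also call `File::sync_data` before acknowledging
/// the write, since `flush` alone does not force data to disk.
fn append_record<W: Write>(log: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "record too large"));
    }
    log.write_all(&(payload.len() as u32).to_le_bytes())?;
    log.write_all(payload)?;
    log.flush()
}

/// Read back every complete record, e.g. during crash recovery.
/// A torn final record (cut off mid-write) is skipped instead of returned.
fn replay<R: Read>(mut log: R) -> io::Result<Vec<Vec<u8>>> {
    let mut records = Vec::new();
    loop {
        let mut len_buf = [0u8; 4];
        match log.read_exact(&mut len_buf) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        let mut payload = vec![0u8; u32::from_le_bytes(len_buf) as usize];
        match log.read_exact(&mut payload) {
            Ok(()) => records.push(payload),
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
    }
    Ok(records)
}

fn main() -> io::Result<()> {
    // A Vec<u8> stands in for the append-only log file.
    let mut log: Vec<u8> = Vec::new();
    append_record(&mut log, b"TSDB_INSERT metrics cpu_usage 1645123200 value=85.2")?;
    for rec in replay(log.as_slice())? {
        println!("{}", String::from_utf8_lossy(&rec));
    }
    Ok(())
}
```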

#### Time Series Database
- `types.rs`: Core TSDB data structures and schemas
- `index.rs`: Efficient time-based and tag-based indexing
- `query.rs`: SQL-like query language for time series data
- `storage.rs`: Specialized storage engine for time series data
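
The split between time-based and tag-based indexing can be illustrated with two structures: an ordered map from timestamp to row ids for range scans, and an inverted index from a `(tag key, tag value)` pair to row ids. The `Index` type below is a hypothetical sketch, not the contents of `index.rs`.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Hypothetical index pair: `by_time` keeps timestamps ordered for range scans,
/// `by_tag` is an inverted index from (tag key, tag value) to row ids.
#[derive(Default)]
struct Index {
    by_time: BTreeMap<i64, Vec<usize>>,
    by_tag: HashMap<(String, String), BTreeSet<usize>>,
}

impl Index {
    fn insert(&mut self, row: usize, ts: i64, tags: &[(&str, &str)]) {
        self.by_time.entry(ts).or_default().push(row);
        for (k, v) in tags {
            self.by_tag
                .entry((k.to_string(), v.to_string()))
                .or_default()
                .insert(row);
        }
    }

    /// Row ids with start <= timestamp <= end that also carry tag `key=value`,
    /// in timestamp order.
    fn lookup(&self, start: i64, end: i64, key: &str, value: &str) -> Vec<usize> {
        // BTreeMap::range panics if start > end, so treat that as an empty range.
        if start > end {
            return Vec::new();
        }
        let tagged = match self.by_tag.get(&(key.to_string(), value.to_string())) {
            Some(rows) => rows,
            None => return Vec::new(),
        };
        self.by_time
            .range(start..=end)
            .flat_map(|(_, rows)| rows.iter().copied())
            .filter(|row| tagged.contains(row))
            .collect()
    }
}

fn main() {
    let mut idx = Index::default();
    idx.insert(0, 1645123200, &[("host", "server1"), ("region", "us-east")]);
    idx.insert(1, 1645123260, &[("host", "server2")]);
    idx.insert(2, 1645126800, &[("host", "server1")]);
    // Roughly `START 1645123200 END 1645126800 WHERE host=server1`.
    println!("{:?}", idx.lookup(1645123200, 1645126800, "host", "server1")); // [0, 2]
}
```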

#### Query Processing
- `parser.rs`: Parses both traditional SQL and ML-specific commands
- `executor.rs`: Executes queries and manages transaction flow

#### Machine Learning
- `predictor.rs`: Manages ML models, training, and predictions
- `feature_engineering.rs`: Automates feature selection and engineering
- `time_series.rs`: Handles time series analysis and forecasting

## Usage Examples

### Basic Database Operations
```
-- Insert data
INSERT 1 {"temperature": 25.5, "humidity": 60}

-- Query data
GET 1

-- List all records
LIST

-- Count records
COUNT

-- Search records by content
SEARCH temperature

-- Delete data
DELETE 1
```
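
These basic commands behave like a simple key-value store keyed by integer ids. A minimal in-memory sketch follows, loosely in the spirit of `memory.rs`; the `MemoryStore` type is hypothetical, values are stored as raw JSON text without parsing, and `SEARCH` is assumed to be a plain substring match over that text.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory store for INSERT / GET / LIST / COUNT / SEARCH / DELETE.
/// Records are kept as raw JSON text in id order; nothing is parsed.
#[derive(Default)]
struct MemoryStore {
    records: BTreeMap<u64, String>,
}

impl MemoryStore {
    /// INSERT: returns the previous record if the id already existed.
    fn insert(&mut self, id: u64, json: &str) -> Option<String> {
        self.records.insert(id, json.to_string())
    }
    /// GET
    fn get(&self, id: u64) -> Option<&str> {
        self.records.get(&id).map(String::as_str)
    }
    /// LIST
    fn list(&self) -> Vec<(u64, &str)> {
        self.records.iter().map(|(id, v)| (*id, v.as_str())).collect()
    }
    /// COUNT
    fn count(&self) -> usize {
        self.records.len()
    }
    /// SEARCH: ids whose stored text contains `needle` (plain substring match).
    fn search(&self, needle: &str) -> Vec<u64> {
        self.records
            .iter()
            .filter(|(_, v)| v.contains(needle))
            .map(|(id, _)| *id)
            .collect()
    }
    /// DELETE: true if a record was removed.
    fn delete(&mut self, id: u64) -> bool {
        self.records.remove(&id).is_some()
    }
}

fn main() {
    let mut db = MemoryStore::default();
    db.insert(1, r#"{"temperature": 25.5, "humidity": 60}"#);
    println!("{:?}", db.get(1));
    println!("{:?} count={}", db.list(), db.count());
    println!("{:?}", db.search("temperature")); // [1]
    println!("deleted: {}", db.delete(1)); // deleted: true
}
```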

### Time Series Database Operations
```
-- Create a measurement
TS.CREATE "{\"name\":\"weather\",\"fields\":{\"temperature\":0.0,\"humidity\":0.0,\"pressure\":0.0},\"retention_policy\":{\"Duration\":2592000}}"

-- Insert time series data
TS.INSERT weather "{\"timestamp\":\"2025-06-06T12:00:00Z\",\"fields\":{\"temperature\":72.5,\"humidity\":45.0,\"pressure\":1013.2},\"tags\":{\"location\":\"NYC\",\"station\":\"central\"}}"

-- Simple query
TS.QUERY "SELECT temperature, humidity FROM weather WHERE TIME >= '2025-06-01T00:00:00Z' AND TIME <= '2025-06-07T00:00:00Z'"

-- Query with tag filter
TS.QUERY "SELECT temperature, humidity FROM weather WHERE location = 'NYC'"

-- Aggregation query
TS.QUERY "SELECT AVG(temperature), MAX(humidity), MIN(pressure) FROM weather GROUP BY TIME(6h)"

-- List all measurements
TS.LIST
```

For more detailed documentation, see [TSDB.md](docs/TSDB.md).

### Machine Learning Operations
```sql
-- Create and train a predictor
CREATE PREDICTOR weather_forecast
FROM weather_data
PREDICT temperature
USING features (humidity, pressure, wind_speed)
WITH time_series_settings (
time_column='timestamp',
horizon=24,
window_size=168
);

-- Make predictions
SELECT temperature, confidence, explanation
FROM weather_forecast
WHERE humidity = 65 AND pressure = 1013;

-- Get feature importance
EXPLAIN weather_forecast
WITH feature_importance = true;
```

### Time Series Analysis
```sql
-- Create a time series predictor
CREATE PREDICTOR stock_forecast
FROM stock_data
PREDICT price
USING features (volume, open, close, high, low)
WITH time_series_settings (
time_column='date',
horizon=7,
window_size=30
);

-- Get forecast with confidence intervals
SELECT price, confidence_interval, seasonal_components
FROM stock_forecast
NEXT 7 DAYS;
```

## Setup Instructions

1. Install Rust (if not already installed):
```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

2. Clone and build the project:
```bash
git clone
cd my-database-app
cargo build --release
```

3. Run the database server:
```bash
cargo run --release --bin my-database-app
```

The server will start on `localhost:8080` by default.

## Usage

After starting the server, connect with the client (`cargo run --bin client`) and issue the commands listed under Database Commands and Usage Examples. The source files contain further usage documentation.

## Contributing

Contributions are welcome! Please open an issue or submit a pull request for any enhancements or bug fixes.