https://github.com/efficientgoose/timeseriesstore
Implementation for Time Series Store assignment
https://github.com/efficientgoose/timeseriesstore
Last synced: 8 months ago
JSON representation
Implementation for Time Series Store assignment
- Host: GitHub
- URL: https://github.com/efficientgoose/timeseriesstore
- Owner: efficientgoose
- Created: 2025-07-20T20:29:41.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-07-20T21:13:42.000Z (11 months ago)
- Last Synced: 2025-07-20T23:24:16.450Z (11 months ago)
- Language: Java
- Homepage:
- Size: 30.3 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# In-Memory Time Series Store
A high-performance, thread-safe, in-memory time series data store implementation with persistence capabilities. Built for efficient handling of time-based metric data with support for concurrent operations and tag-based filtering.

## Features
### Core Functionality
- **Fast Insertions**: Optimized for high-throughput write operations (up to 10,000 data points/second)
- **Efficient Queries**: Time range and tag-based filtering with minimal latency
- **Thread Safety**: Full concurrent read/write support using `ConcurrentSkipListMap`
- **Persistence**: Automatic data persistence to CSV files with recovery on restart
- **Memory Management**: Configurable data retention (24 hours default) with automatic cleanup
### Advanced Capabilities
- **Tag-Based Filtering**: Support for complex multi-tag queries with AND logic
- **Duplicate Prevention**: Automatic deduplication of identical data points
- **Graceful Shutdown**: Ensures data persistence before application termination
- **Performance Monitoring**: Built-in memory usage tracking and cleanup statistics
## Performance Specifications
| Metric | Target | Achieved |
|-----------------------|-------------------------|-------------|
| Write Throughput | 10,000 ops/sec | Supported |
| Query Throughput | 1,000 queries/sec | Supported |
| Maximum Metrics | 100,000 unique metrics | Supported |
| Data Retention | 24 hours (configurable) | Implemented |
| Concurrent Operations | Multi-threaded R/W | Thread-safe |
## Architecture
### Data Model
```java
DataPoint {
long timestamp; // Unix timestamp in milliseconds
String metric; // Metric name (e.g., "cpu.usage")
double value; // Numeric value
Map tags; // Key-value metadata
}
```
### Storage Structure
- **Primary Index**: `ConcurrentSkipListMap>` sorted by timestamp
- **Concurrency**: `CopyOnWriteArrayList` for handling multiple data points per timestamp
- **Persistence**: CSV format with automatic serialization/deserialization
### Test Coverage
### Unit + Performance/Stress Tests:
- Basic insert/query operations
- Time range filtering
- Tag-based filtering
- Multi-tag compound filters
- Data cleanup and expiration
- Persistence and recovery
- Concurrent operations(500k records)
- Performance under load(500k records)
## Project Structure
```
com/interview/timeseries/
├── TimeSeriesStore.java # Main interface
├── TimeSeriesStoreImpl.java # Core implementation
├── DataPoint.java # Data model
├── Main.java # Demo application
├── TimeSeriesStoreTest.java # Unit tests
└── TimeSeriesStorePerformanceTest.java # Performance tests
```
## Configuration
### Retention Policy
```java
// Default: 24 hours
private final long EXPIRY_DURATION_MS = 24L * 60 * 60 * 1000;
// Cleanup frequency: Every hour
cleanerExecuter.scheduleAtFixedRate(..., 1, 1, TimeUnit.HOURS);
```
### Persistence Settings
```java
// CSV file location
private final String persistenceFile = "data_store.csv";
// File format: timestamp,metric,value,tag1=value1;tag2=value2
```
## Performance Optimization
### Write Performance
- **Data Structure**: `ConcurrentSkipListMap` provides O(log n) insertions
- **Concurrency**: Lock-free operations for high-throughput writes
- **Deduplication**: Efficient duplicate detection using `contains()`
### Query Performance
- **Time Range**: O(log n) for range identification using `subMap()`
- **Filtering**: Stream-based parallel processing
- **Memory**: Minimal object allocation during queries
### Memory Management
- **Automatic Cleanup**: Background thread removes expired data
- **Lazy Loading**: Data loaded from disk only on startup
- **Efficient Storage**: Shared timestamp keys reduce memory overhead
## Thread Safety
### Concurrency Model
- **Write Operations**: `ConcurrentSkipListMap.compute()` ensures atomic updates
- **Read Operations**: Lock-free traversal with consistent snapshots
- **File Operations**: Synchronized using `diskLock` for data integrity
- **Cleanup Operations**: Coordinated with main data operations
### Race Condition Prevention
- Atomic insertion with duplicate checking
- Thread-safe list implementations (`CopyOnWriteArrayList`)
- Proper synchronization for disk I/O operations
## Monitoring & Metrics
### Built-in Monitoring
```java
// Memory usage tracking
private void logMemory(String label) {
Runtime runtime = Runtime.getRuntime();
long used = (runtime.totalMemory() - runtime.freeMemory()) / (1024 * 1024);
System.out.println("[" + label + "] Memory used: " + used + " MB");
}
// Cleanup statistics
System.out.println("Cleanup executed. Removed " + (before - after) + " expired timestamps.");
```
## Data Persistence
### Storage Format
```csv
# Format: timestamp,metric,value,tags
1620000000000,cpu.usage,45.2,host=server1;datacenter=us-west
1620000001000,memory.used,60.0,host=server1
```
### Recovery Process
1. **Startup**: Automatically loads existing CSV data
2. **Validation**: Parses and validates each data point
3. **Insertion**: Rebuilds in-memory index structure
4. **Cleanup**: Removes expired entries post-recovery