https://github.com/freddiehaddad/mvcompression

🧠 Thread-safe adaptive compression decision system - learns when to skip ineffective compression using lock-free algorithms.
https://github.com/freddiehaddad/mvcompression
adaptive-algorithms atomic-operations compression data-processing lock-free machine-learning optimization performance rust
Last synced: 3 months ago
JSON representation
🧠 Thread-safe adaptive compression decision system - learns when to skip ineffective compression using lock-free algorithms.
Host: GitHub
URL: https://github.com/freddiehaddad/mvcompression
Owner: freddiehaddad
License: other
Created: 2025-07-20T14:37:50.000Z (3 months ago)
Default Branch: main
Last Pushed: 2025-07-20T15:55:29.000Z (3 months ago)
Last Synced: 2025-07-20T17:19:20.961Z (3 months ago)
Topics: adaptive-algorithms, atomic-operations, compression, data-processing, lock-free, machine-learning, optimization, performance, rust
Language: Rust
Homepage:
Size: 26.4 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE-APACHE
Awesome Lists containing this project

README

          # MVCompression - Adaptive Compression Decision System

[![Rust](https://img.shields.io/badge/rust-1.70+-blue.svg)](https://www.rust-lang.org)

[![License](https://img.shields.io/badge/license-MIT%20OR%20Apache--2.0-blue.svg)](LICENSE)

[![Documentation](https://img.shields.io/badge/docs-pending%20publication-orange.svg)](#documentation)

A thread-safe, lock-free adaptive compression decision system that learns from past compression performance to intelligently decide whether to compress future data blocks.

## 🚀 Features

- **🧠 Adaptive Learning**: Automatically learns from compression effectiveness over time

- **🔒 Thread-Safe**: Lock-free atomic operations for high-performance concurrent access

- **⚡ High Performance**: Minimal overhead with atomic compare-and-swap operations

- **🎯 Self-Tuning**: Automatically adjusts behavior based on data characteristics

- **📊 Monitoring**: Built-in metrics for algorithm state and performance tracking

- **🛡️ Safe**: Memory-safe Rust implementation with comprehensive testing

## 📖 Overview

The MVCompression algorithm maintains a "compression value" score and moving averages of compressed/uncompressed block sizes to make intelligent compression decisions. It adapts its behavior based on historical compression effectiveness, automatically skipping compression when it's likely to be ineffective.

### How It Works

1. **Compression Value**: Starts at -80 and adjusts based on compression results

   - Good compression (ratio ≤ 0.9): decreases value by 10

   - Poor compression (ratio > 0.9): increases value by 4

   - Skip events: decreases value by 1

2. **Skip Logic**: When compression value becomes positive:

   - Compares incoming block size to historical average

   - Skips compression if size is within 25% of expected size

3. **Moving Averages**: Tracks compressed and uncompressed block sizes

   - Uses exponential moving average with smoothing factor

   - Helps predict future compression effectiveness

## 🚀 Quick Start

Add this to your `Cargo.toml`:

```toml

[dependencies]

mvcompression = "0.1.0"

```

### Basic Usage

```rust

use mvcompression::MVCompression;

fn main() {

    let mvc = MVCompression::new();

    

    // Process data blocks

    for block_data in data_blocks {

        if mvc.should_skip_compression(block_data.len()) {

            // Skip compression for this block

            store_uncompressed(block_data);

        } else {

            // Attempt compression

            let compressed = compress(block_data);

            

            // Update algorithm with results

            mvc.update_compression_ratio(compressed.len(), block_data.len());

            store_compressed(compressed);

        }

    }

}

```

### Thread-Safe Usage

```rust

use mvcompression::MVCompression;

use std::sync::Arc;

use std::thread;

fn main() {

    let mvc = Arc::new(MVCompression::new());

    

    // Spawn multiple worker threads

    let handles: Vec<_> = (0..4).map(|_| {

        let mvc = Arc::clone(&mvc);

        thread::spawn(move || {

            // Each thread can safely use the same MVCompression instance

            if mvc.should_skip_compression(1024) {

                // Skip compression

            } else {

                // Perform compression and update

                mvc.update_compression_ratio(512, 1024);

            }

        })

    }).collect();

    

    for handle in handles {

        handle.join().unwrap();

    }

}

```

## 📊 Algorithm Parameters

The algorithm uses several tunable constants that affect its behavior:

| Parameter | Value | Description |

|-----------|-------|-------------|

| `BLOCK_COMPRESSABLE_RATIO` | 0.9 | Threshold for good vs poor compression |

| `INITIAL_COMPRESSION_VALUE` | -80 | Starting compression value |

| `COMPRESSIBLE_BLOCK_WEIGHT` | -10 | Adjustment for good compression |

| `NON_COMPRESSIBLE_BLOCK_WEIGHT` | 4 | Adjustment for poor compression |

| `SKIP_COMPRESSION_BLOCK_WEIGHT` | -1 | Adjustment when skipping |

| `MAX_COMPRESSION_VALUE` | 200 | Upper bound for compression value |

| `MIN_COMPRESSION_VALUE` | -300 | Lower bound for compression value |

These parameters create a system that:

- Starts optimistic (negative value = always compress)

- Quickly adapts to poor compression (small positive weight vs large negative)

- Gradually returns to compression attempts through skip penalties

## 🔧 API Reference

### Core Methods

- `MVCompression::new()` - Create a new instance

- `should_skip_compression(size: usize) -> bool` - Check if compression should be skipped

- `update_compression_ratio(compressed: usize, uncompressed: usize)` - Update algorithm with compression results

### Monitoring Methods

- `get_compression_value() -> i32` - Get current compression bias value

- `get_compressed_average() -> usize` - Get smoothed compressed size average

- `get_uncompressed_average() -> usize` - Get smoothed uncompressed size average

## 📈 Performance Characteristics

- **Lock-free**: All operations use atomic compare-and-swap loops

- **Memory efficient**: Only three atomic values per instance (12-16 bytes)

- **Low overhead**: Minimal computation per decision (~10-20 CPU cycles)

- **Scalable**: Performance doesn't degrade with thread count

- **Cache-friendly**: Compact memory layout with good locality

## 🧪 Examples

### Run the Basic Example

```bash

cargo run --example basic_usage

```

This runs a simulation showing how the algorithm learns to skip ineffective compression over time.

### Run Performance Analysis

```bash

cargo run --release --example performance_analysis

```

This provides comprehensive performance metrics including throughput, latency, memory usage, and convergence analysis.

### Expected Output

```

MVCompression Algorithm Demo

============================

Simulating compression of 30 blocks (1000 bytes each)

...

Block 21: COMPRESSED 1000 -> 1000 bytes (ratio: 1.00)

Block 22: SKIPPED compression (size: 1000 bytes)

Block 23: SKIPPED compression (size: 1000 bytes)

...

✓ Algorithm successfully learned to skip ineffective compression!

```

## 🧪 Testing

Run the comprehensive test suite:

```bash

# Run all tests

cargo test

# Run tests with output

cargo test -- --nocapture

# Run specific test

cargo test test_thread_safety

```

The test suite includes:

- Basic functionality tests

- Thread safety verification

- Edge case handling

- Boundary condition testing

- Performance regression tests

## 🔍 Use Cases

This algorithm is particularly useful for:

1. **Streaming Data Processing**: Real-time decision making for large data streams

2. **Database Storage**: Adaptive compression for variable data types

3. **Network Protocols**: Dynamic compression decisions based on payload characteristics

4. **File Systems**: Intelligent compression for diverse file types

5. **Backup Systems**: Optimizing backup speed vs storage efficiency

6. **CDN/Caching**: Adaptive compression for web content delivery

## 🧮 Algorithm Analysis

### Convergence Behavior

The algorithm typically converges to optimal behavior within 20-30 blocks:

- **Highly compressible data**: Maintains negative compression value, rarely skips

- **Poorly compressible data**: Develops positive compression value, frequently skips

- **Mixed data**: Adapts dynamically based on recent block characteristics

### Mathematical Properties

- **Stability**: Bounded compression value prevents oscillation

- **Responsiveness**: Asymmetric weights (10 vs 4) provide quick adaptation

- **Memory**: Exponential moving average provides historical context

- **Convergence**: System converges to optimal skip rate for given data characteristics

## ⚡ Performance Analysis

### Single-Threaded Performance

Based on release-mode benchmarks on modern hardware:

| Operation | Throughput | Latency |

|-----------|------------|---------|

| `should_skip_compression()` | ~2.1 billion ops/sec | ~0.47 ns |

| `update_compression_ratio()` | ~109 million ops/sec | ~9.18 ns |

### Multi-Threaded Scalability

The lock-free atomic design provides excellent concurrent performance:

| Threads | Combined Throughput | Efficiency |

|---------|-------------------|------------|

| 1 | 91M ops/sec | 100% |

| 2 | 56M ops/sec | 62% |

| 4 | 95M ops/sec | 52% |

| 8 | 157M ops/sec | 43% |

| 16 | 260M ops/sec | 36% |

*Note: Efficiency decreases with thread count due to cache coherency overhead, but total throughput continues to scale.*

### Memory Characteristics

- **Struct size**: 24 bytes total

  - `AtomicI32`: 4 bytes (compression value)

  - `AtomicUsize` × 2: 16 bytes (moving averages)

  - Padding: 4 bytes

- **No heap allocations**: Stack-only data structure

- **Cache-friendly**: Fits in single cache line (64 bytes)

- **Memory bandwidth**: Minimal (3 atomic loads/stores per operation)

### Convergence Performance

Algorithm adapts quickly to data characteristics:

| Data Pattern | Convergence Point | Final Skip Rate | Stability |

|--------------|------------------|-----------------|-----------|

| Highly Compressible (20% ratio) | 10 blocks | 0% | Excellent |

| Poorly Compressible (95% ratio) | 10 blocks | 48% | Excellent |

| Mixed Data (alternating) | 10 blocks | 0% | Good |

| Random Data (30-90% ratios) | 10 blocks | 0% | Good |

### Worst-Case Scenarios

The algorithm handles pathological cases gracefully:

- **Alternating patterns**: 10K operations in <1ms

- **High contention**: 32 threads × 1K operations in <1ms

- **Lock-free guarantee**: No deadlocks or priority inversion

- **Bounded behavior**: Always terminates within value bounds

### Performance Optimization Tips

1. **Batch operations**: Group multiple decisions when possible

2. **Avoid false sharing**: Keep `MVCompression` instances on separate cache lines

3. **Release builds**: Performance is 10-100x better than debug builds

4. **CPU-specific optimization**: Use `RUSTFLAGS="-C target-cpu=native"`

### Comparison with Alternatives

| Approach | Latency | Thread Safety | Memory | Adaptability |

|----------|---------|---------------|--------|--------------|

| **MVCompression** | ~0.5ns | Lock-free | 24 bytes | Excellent |

| Mutex-based | ~20-100ns | Blocking | 32+ bytes | Good |

| Thread-local | ~0.3ns | None | 24×threads | Poor |

| Fixed threshold | ~0.1ns | Perfect | 0 bytes | None |

## 🚦 Limitations

- **Learning Period**: Requires 15-30 blocks to learn data characteristics

- **Block Size Sensitivity**: Works best with relatively consistent block sizes

- **Compression Ratio Threshold**: Fixed 0.9 threshold may not suit all use cases

- **Memory Overhead**: Small but non-zero overhead for tracking state

## 🛠️ Development

### Building

```bash

# Debug build

cargo build

# Release build

cargo build --release

# With full optimizations

RUSTFLAGS="-C target-cpu=native" cargo build --release

```

### Benchmarking

```bash

# Run comprehensive performance analysis

cargo run --release --example performance_analysis

# Run criterion benchmarks (if available)

cargo bench

# Profile with perf (Linux)

cargo build --release

perf record --call-graph dwarf target/release/examples/performance_analysis

```

### Documentation

To view the full API documentation locally:

```bash

# Generate and open documentation in your browser

cargo doc --open

```

The documentation includes:

- Complete API reference with examples

- Algorithm implementation details

- Thread safety guarantees

- Performance characteristics

*Note: Online documentation will be available at [docs.rs](https://docs.rs/mvcompression) once the crate is published.*

## 📝 License

This project is licensed under either of

 * Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or

   http://www.apache.org/licenses/LICENSE-2.0)

 * MIT license ([LICENSE-MIT](LICENSE-MIT) or

   http://opensource.org/licenses/MIT)

at your option.

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

### Development Setup

1. Fork the repository

2. Create a feature branch (`git checkout -b feature/amazing-feature`)

3. Make your changes

4. Add tests for your changes

5. Ensure all tests pass (`cargo test`)

6. Commit your changes (`git commit -m 'Add amazing feature'`)

7. Push to the branch (`git push origin feature/amazing-feature`)

8. Open a Pull Request

### Code Style

- Follow standard Rust formatting (`cargo fmt`)

- Ensure no clippy warnings (`cargo clippy`)

- Add documentation for public APIs

- Include tests for new functionality

## 📚 References

- API Documentation: Run `cargo doc --open` to view locally

## 🔗 Related Projects

- [LZ4](https://github.com/lz4/lz4) - Fast compression algorithm

- [Zstd](https://github.com/facebook/zstd) - High-performance compression

- [Snappy](https://github.com/google/snappy) - Fast compression/decompression library

---

**Made with ❤️ in Rust** 🦀
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/freddiehaddad/mvcompression

Awesome Lists containing this project

README