https://github.com/tommygrammar/symmetric-compression
A **compression algorithm** that leverages _symmetry detection_ between files to identify and remove redundant data, achieving significant size reductions. This repository provides a Flask-based HTTP API for uploading text files, performing symmetry-based analysis, and returning optimized versions along with savings statistics.
- Host: GitHub
- URL: https://github.com/tommygrammar/symmetric-compression
- Owner: tommygrammar
- License: mit
- Created: 2025-05-13T08:43:54.000Z (10 days ago)
- Default Branch: blackgrammar-projects
- Last Pushed: 2025-05-13T08:45:11.000Z (10 days ago)
- Last Synced: 2025-05-13T09:39:23.007Z (10 days ago)
- Language: Python
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: Readme.MD
- License: LICENSE
# Symmetric Compression API
A **compression algorithm** that leverages _symmetry detection_ between files to identify and remove redundant data, achieving significant size reductions. This repository provides a Flask-based HTTP API for uploading text files, performing symmetry-based analysis, and returning optimized versions along with savings statistics.
---
## Table of Contents
1. [Features](#features)
2. [Getting Started](#getting-started)
3. [Installation](#installation)
4. [API Usage](#api-usage)
   1. [Upload Request](#upload-request)
   2. [Response Structure](#response-structure)
5. [How It Works](#how-it-works)
6. [Project Structure](#project-structure)
7. [Dependencies](#dependencies)
8. [License](#license)

---
## Features
- **Symmetry-based analysis**
  Treats each uploaded file in turn as a “base” and compares every other file against it, extracting words shared in the same positions.
- **Multi-threaded processing**
  Uses Python’s `ThreadPoolExecutor` to analyze multiple base-target combinations in parallel.
- **Automated cleanup**
  Retains only the best optimized folder (smallest total size) and removes all other intermediate output.
- **MessagePack mapping**
  Saves a `.msgpack` file with the mapping of shared word positions for downstream inspection or re-use.

---
## Getting Started
These instructions will help you run the API locally, send upload requests, and interpret the results.
### Prerequisites
- Python 3.8+
- `pip` package manager

---
## Installation
1. **Clone the repository**
   ```bash
   git clone https://github.com/tommygrammar/symmetric-compression.git
   cd symmetric-compression
   ```
2. **Create a virtual environment (optional but recommended)**
   ```bash
   python3 -m venv .venv
   source .venv/bin/activate
   ```
3. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```
4. **Run the API server**
   ```bash
   python api.py
   ```

By default, the server listens on http://0.0.0.0:5000.
---
## API Usage
### Upload Request
**Endpoint:**
`POST /upload`

**Content Type:**
`multipart/form-data`

**Form Field:**
- `files`: One or more text files to compress.

**Example using curl**
```bash
curl -X POST http://localhost:5000/upload \
  -F "files=@/path/to/file1.txt" \
  -F "files=@/path/to/file2.txt"
```

### Response Structure
On success, the API returns HTTP 200 with a JSON payload:

| Key                       | Type    | Description                                                  |
|---------------------------|---------|--------------------------------------------------------------|
| `message`                 | string  | Human-readable confirmation of completion.                   |
| `files`                   | array   | List of uploaded files with filename and server path.        |
| `total_uploaded_size`     | integer | Sum of all uploaded file sizes in bytes.                     |
| `optimization_percentage` | float   | Percentage reduction achieved by the best optimized folder.  |

**Sample Response**
```json
{
"message": "Optimization Analysis Complete: Here are your potential savings:",
"files": [
{ "filename": "file1.txt", "path": "/…/deck/file1.txt" },
{ "filename": "file2.txt", "path": "/…/deck/file2.txt" }
],
"total_uploaded_size": 20480,
"optimization_percentage": 35.7
}
```

---
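To relate the two numeric fields of the response, the snippet below converts `optimization_percentage` into an approximate byte count. It is a small sketch, not part of the repository: the helper name `bytes_saved` is hypothetical, and it assumes the percentage is measured against `total_uploaded_size`, as the How It Works section describes.

```python
def bytes_saved(total_uploaded_size: int, optimization_percentage: float) -> float:
    """Approximate bytes removed, assuming the percentage is relative to the upload size."""
    return total_uploaded_size * optimization_percentage / 100.0


# Values copied from the sample response.
response = {"total_uploaded_size": 20480, "optimization_percentage": 35.7}

saved = bytes_saved(response["total_uploaded_size"], response["optimization_percentage"])
remaining = response["total_uploaded_size"] - saved
print(f"saved about {saved:.0f} bytes, about {remaining:.0f} bytes remain")
```

For the sample values this works out to roughly 7.3 kB saved out of 20 kB uploaded.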
## How It Works
### Upload Handling
- Files are saved to a `deck/` folder under the project root.
- The total uploaded size is computed.

### Symmetry Analysis
- Each file in `deck/` is tentatively treated as a base.
- All other files are compared to that base using Python’s `difflib.ndiff` to locate symmetrical words (shared words in identical relative positions).

### Optimization
- For each base, a corresponding `_optimized/` folder is created.
- Symmetrical words are stripped from each target, producing optimized files.
- A `symmetry_mappings.msgpack` file records the exact positions of shared words for audit or reconstruction.

### Selection & Cleanup
- The folder with the smallest total size across all `_optimized/` outputs is chosen as the final result.
- All other optimized folders are deleted.
- The API computes the optimization percentage relative to the original upload size.

---
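The stages described under How It Works can be illustrated with a minimal in-memory sketch. It is not the repository's code: the function names (`split_shared`, `rebuild`, `optimized_size`, `pick_best_base`) are hypothetical, files are plain strings rather than entries in `deck/`, and the size of the mapping file is ignored when comparing candidates. It only shows how `difflib.ndiff` on word lists can separate shared words from unique ones, and how a thread pool can score each candidate base.

```python
import difflib
from concurrent.futures import ThreadPoolExecutor


def split_shared(base_words, target_words):
    """Return (mapping, remainder): shared words keyed by target index, plus the rest."""
    mapping, remainder, pos = {}, [], 0
    for line in difflib.ndiff(base_words, target_words):
        tag, word = line[:2], line[2:]
        if tag == "  ":        # word aligned in both base and target
            mapping[pos] = word
            pos += 1
        elif tag == "+ ":      # word only in the target: it must be kept
            remainder.append(word)
            pos += 1
        # "- " (base only) and "? " (hint) lines do not advance the target index
    return mapping, remainder


def rebuild(mapping, remainder):
    """Inverse of split_shared: interleave shared and unique words again."""
    rest = iter(remainder)
    total = len(mapping) + len(remainder)
    return [mapping[i] if i in mapping else next(rest) for i in range(total)]


def optimized_size(base_name, files):
    """Bytes needed if the base is stored whole and every other file is stripped."""
    base_words = files[base_name].split()
    size = len(files[base_name].encode("utf-8"))
    for name, text in files.items():
        if name != base_name:
            _, remainder = split_shared(base_words, text.split())
            size += len(" ".join(remainder).encode("utf-8"))
    return size


def pick_best_base(files):
    """Score every candidate base in parallel and return the cheapest one."""
    names = list(files)
    with ThreadPoolExecutor() as pool:
        sizes = list(pool.map(lambda n: optimized_size(n, files), names))
    return min(zip(sizes, names))[1]


files = {
    "a.txt": "the quick brown fox jumps",
    "b.txt": "the slow brown fox sleeps",
}
mapping, remainder = split_shared(files["a.txt"].split(), files["b.txt"].split())
print(mapping, remainder, pick_best_base(files))
```

Keeping the mapping keyed by target position is what makes the stripped file reversible: `rebuild(mapping, remainder)` returns the original word list of the target.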
## Project Structure
```text
.
├── api.py # Flask app and upload endpoint
├── optimal.py # Analyzer class orchestrating threads and cleanup
├── symmetrize.py # SymmetryProcessor class for diff, file I/O, Msgpack
├── deck/ # (auto-created) folder for uploaded files
├── requirements.txt # Python dependencies
└── README.md # This documentation
```

---
## Dependencies
- `Flask`: Web framework for the HTTP API
- `flask-cors`: Enables Cross-Origin Resource Sharing
- `msgpack`: Serializes shared-word mappings into a compact binary format

**Install via:**
```bash
pip install Flask flask-cors msgpack
```

---
## License
This project is released under the MIT License.