{"id":46045835,"url":"https://github.com/pnguyen215/pycelize","last_synced_at":"2026-06-17T17:01:25.337Z","repository":{"id":335268030,"uuid":"1144976638","full_name":"pnguyen215/pycelize","owner":"pnguyen215","description":"Pycelize is a Flask application designed for processing Excel and CSV files. It provides RESTful APIs for common data operations including extraction, normalization, mapping, SQL generation, and file binding.","archived":false,"fork":false,"pushed_at":"2026-02-06T10:15:38.000Z","size":164,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-02-06T15:06:33.480Z","etag":null,"topics":["excel","lib","numpy","openpyxl","pandas","py-app","pycelize","python-service","python3"],"latest_commit_sha":null,"homepage":"https://github.com/pnguyen215/pycelize","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pnguyen215.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-29T09:15:31.000Z","updated_at":"2026-02-06T08:03:37.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/pnguyen215/pycelize","commit_stats":null,"previous_names":["pnguyen215/pycelize"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/pnguyen215/pycelize","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pnguyen215%2Fpycelize","tags_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/pnguyen215%2Fpycelize/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pnguyen215%2Fpycelize/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pnguyen215%2Fpycelize/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pnguyen215","download_url":"https://codeload.github.com/pnguyen215/pycelize/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pnguyen215%2Fpycelize/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29964181,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T06:55:38.174Z","status":"ssl_error","status_checked_at":"2026-03-01T06:53:04.810Z","response_time":124,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["excel","lib","numpy","openpyxl","pandas","py-app","pycelize","python-service","python3"],"created_at":"2026-03-01T07:35:15.093Z","updated_at":"2026-06-17T17:01:25.311Z","avatar_url":"https://github.com/pnguyen215.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pycelize\n\nA professional Flask application for Excel/CSV processing with comprehensive API support and Chat Workflows for sequential file 
processing.\n\n![Python](https://img.shields.io/badge/Python-3.9+-blue.svg)\n![Flask](https://img.shields.io/badge/Flask-2.3+-green.svg)\n![License](https://img.shields.io/badge/License-MIT-yellow.svg)\n\n## 📋 Table of Contents\n\n- [Overview](#overview)\n- [Features](#features)\n- [Project Structure](#project-structure)\n- [Chat Workflows](#chat-workflows)\n- [Installation](#installation)\n- [Configuration](#configuration)\n- [API Documentation](#api-documentation)\n- [Usage Examples](#usage-examples)\n- [Search and Filter APIs](#search-and-filter-apis)\n- [Excel Binding APIs](#excel-binding-apis)\n- [JSON Generation Features](#json-generation-features)\n- [Chat Workflows API Reference](#chat-workflows-api-reference)\n- [Chat Bot API Reference](#chat-bot-api-reference)\n- [WebSocket Integration](#websocket-integration)\n- [Frontend Integration Guide](#frontend-integration-guide)\n- [Design Patterns](#design-patterns)\n- [Testing](#testing)\n- [Available Normalization Types](#available-normalization-types)\n- [Recent Improvements](#recent-improvements)\n- [Troubleshooting](#troubleshooting)\n- [Contributing](#contributing)\n- [License](#license)\n\n## 🎯 Overview\n\nPycelize is a production-ready Flask application designed for processing Excel and CSV files. 
It provides RESTful APIs for common data operations including extraction, normalization, mapping, SQL generation, and file binding.\n\n### New: Chat Workflows \u0026 Chat Bot\n\nPycelize now includes **Chat Workflows** and a **Telegram-like Chat Bot** - powerful features for file processing:\n\n#### Chat Workflows\n- **Conversational processing**: Process files through multi-step workflows\n- **Real-time updates**: Monitor progress via WebSocket streaming\n- **Dump \u0026 Restore**: Backup and restore complete conversations\n- **Partition-based storage**: Scalable file organization\n- **Download management**: Absolute URLs for easy file access\n\n#### Chat Bot 🤖 (NEW!)\n- **Natural Language Interface**: Describe what you want to do in plain English\n- **Intent Classification**: Automatically understands your requests\n- **Workflow Suggestions**: Proposes appropriate processing steps\n- **Interactive Confirmation**: Review and modify workflows before execution\n- **Real-time Progress**: WebSocket updates during processing\n- **Smart State Management**: Context-aware responses based on conversation flow\n\n## ✨ Features\n\n### Core File Processing\n\n- **Column Extraction**: Extract data from specific columns with optional deduplication\n- **CSV to Excel Conversion**: Convert CSV files to Excel format\n- **Data Normalization**: Apply various normalization strategies (uppercase, trim, phone format, etc.)\n- **Column Mapping**: Map and transform column names\n- **SQL Generation**: Generate SQL statements with auto-increment support\n- **JSON Generation**: Generate JSON from Excel with column mapping or custom templates\n- **Excel-to-Excel Binding**: Bind values from source to target files\n- **Search and Filter**: Advanced data filtering with multiple conditions and operators\n- **Operator Suggestions**: Automatic suggestions of valid search operators based on column types\n- **Standardized API Responses**: Consistent response format using Builder pattern\n\n### Chat 
Workflows Features\n\n- **Sequential Processing**: Chain multiple operations in a workflow\n- **Real-time Progress**: WebSocket streaming for step-by-step updates\n- **Conversation Management**: Create, list, retrieve, and delete conversations\n- **File Upload/Download**: Manage input and output files\n- **Auto-generated Names**: Unique participant names (BlueWhale-4821, OrionAsteroid-9923, etc.)\n- **Dump \u0026 Restore**: Complete conversation backup and recovery\n- **Partition System**: Time-based or hash-based file organization\n- **SQLite Integration**: Fast metadata queries and indexing\n\n### Chat Bot Features 🤖 (NEW!)\n\n- **Intent Classification**: Understands 8+ types of requests (extract, convert, normalize, SQL, JSON, search, bind, map)\n- **Natural Language Processing**: Keyword and pattern matching for intent detection\n- **Conversational State Management**: Tracks conversation flow (idle, awaiting_file, awaiting_confirmation, processing, etc.)\n- **Message Handler Chain**: Chain of Responsibility pattern for flexible message processing\n- **Streaming Workflow Execution**: Async/await with non-blocking background execution\n- **Interactive Workflow Modification**: Users can confirm, decline, or modify suggested workflows\n- **Special Commands**: Help, cancel, yes/no commands for easy navigation\n- **Context-Aware Responses**: Bot remembers conversation history and adapts responses\n- **Error Recovery**: Clear error messages with suggestions for recovery\n\n## 📁 Project Structure\n\n```\npycelize/\n├── app/\n│   ├── __init__.py              # Application factory\n│   ├── api/\n│   │   └── routes/              # API route definitions\n│   │       ├── health_routes.py\n│   │       ├── excel_routes.py\n│   │       ├── csv_routes.py\n│   │       ├── normalization_routes.py\n│   │       ├── sql_routes.py\n│   │       ├── json_routes.py\n│   │       ├── file_routes.py\n│   │       ├── chat_routes.py   # Chat Workflows APIs\n│   │       └── chatbot_routes.py 
# Chat Bot APIs\n│   ├── chat/                    # Chat Workflows \u0026 Bot components\n│   │   ├── __init__.py\n│   │   ├── models.py            # Conversation, Message, WorkflowStep models\n│   │   ├── database.py          # SQLite database management\n│   │   ├── storage.py           # File storage and partitioning\n│   │   ├── repository.py        # Repository pattern implementation\n│   │   ├── workflow_executor.py # Chain of Responsibility for execution\n│   │   ├── streaming_executor.py # Async workflow executor\n│   │   ├── websocket_server.py  # WebSocket server\n│   │   ├── websocket_bridge.py  # Thread-safe Flask ↔ WebSocket bridge\n│   │   ├── name_generator.py    # Participant name generation\n│   │   ├── intent_classifier.py # NLP intent classification\n│   │   ├── state_manager.py     # Conversation state management\n│   │   ├── message_handlers.py  # Message handler chain\n│   │   └── chatbot_service.py   # Chat bot orchestration service\n│   ├── core/\n│   │   ├── config.py            # Configuration management\n│   │   ├── exceptions.py        # Custom exceptions\n│   │   └── logging.py           # Logging setup\n│   ├── models/\n│   │   ├── enums.py             # Enumeration definitions\n│   │   ├── request.py           # Request models\n│   │   └── response.py          # Response models\n│   ├── services/\n│   │   ├── excel_service.py     # Excel operations\n│   │   ├── csv_service.py       # CSV operations\n│   │   ├── search_service.py    # Search and filter operations\n│   │   ├── normalization_service.py\n│   │   ├── sql_generation_service.py\n│   │   ├── json_generation_service.py\n│   │   └── binding_service.py\n│   ├── builders/\n│   │   ├── response_builder.py  # Builder pattern implementation\n│   │   └── sql_builder.py       # SQL statement builder\n│   ├── factories/\n│   │   ├── normalizer_factory.py # Factory pattern implementation\n│   │   └── service_factory.py\n│   ├── strategies/\n│   │   ├── base_strategy.py     # Strategy 
interface\n│   │   └── normalization_strategies.py\n│   └── utils/\n│       ├── file_utils.py\n│       ├── validators.py\n│       └── helpers.py\n├── automation/                  # Chat Workflows storage\n│   ├── workflows/               # Conversation files (partitioned)\n│   │   └── {partition_key}/     # e.g., 2026/02/\n│   │       └── {chat_id}/\n│   │           ├── uploads/\n│   │           ├── outputs/\n│   │           └── metadata.json\n│   ├── dumps/                   # Backup archives\n│   └── sqlite/                  # Database\n│       ├── chat.db\n│       └── snapshots/\n├── configs/\n│   └── application.yml          # Application configuration\n├── tests/\n│   ├── unit/\n│   ├── integration/\n│   │   └── chat_workflows/      # Chat workflow tests\n│   ├── test_excel_service.py\n│   ├── test_csv_service.py\n│   └── test_normalization.py\n├── uploads/                     # Uploaded files (auto-created)\n├── outputs/                     # Generated files (auto-created)\n├── logs/                        # Log files (auto-created)\n├── requirements.txt\n├── Makefile\n├── run.py                       # Application entry point\n└── README.md\n```\n\n---\n\n## 🔄 Chat Workflows\n\nChat Workflows provide a powerful conversational interface for sequential file processing with real-time progress tracking.\n\n## 2. 
Architecture\n\n### System Architecture\n\n```\n┌─────────────────────────────────────────────────────────────┐\n│                     Client Applications                      │\n│  (Web Browser, Mobile App, API Clients)                    │\n└────────────┬─────────────────────────────┬─────────────────┘\n             │                              │\n             │ HTTP/REST                    │ WebSocket\n             │                              │\n┌────────────▼──────────────────────────────▼────────────────┐\n│                    Flask Application                         │\n│  ┌──────────────────────────────────────────────────────┐  │\n│  │              API Routes Layer                         │  │\n│  │  - Health   - Excel    - CSV    - Chat Workflows     │  │\n│  └─────────────────────┬────────────────────────────────┘  │\n│                        │                                     │\n│  ┌─────────────────────▼──────────────────────────────┐   │\n│  │           Service Layer                             │   │\n│  │  - Excel Service  - CSV Service                     │   │\n│  │  - Normalization  - SQL Generation                  │   │\n│  │  - JSON Generation - Search Service                 │   │\n│  └─────────────────────┬──────────────────────────────┘   │\n│                        │                                     │\n│  ┌─────────────────────▼──────────────────────────────┐   │\n│  │       Chat Workflows Components                      │   │\n│  │  ┌──────────────┐  ┌──────────────┐                │   │\n│  │  │  Repository  │  │   Executor   │                 │   │\n│  │  └──────┬───────┘  └──────┬───────┘                │   │\n│  │         │                  │                         │   │\n│  │  ┌──────▼───────┐  ┌──────▼───────┐                │   │\n│  │  │   Database   │  │   Storage    │                 │   │\n│  │  │  (SQLite)    │  │  (Files)     │                 │   │\n│  │  └──────────────┘  └──────────────┘                 │   │\n│  
└──────────────────────────────────────────────────────┘   │\n└──────────────────────────────────────────────────────────────┘\n             │                              │\n             │                              │\n┌────────────▼──────────────┐  ┌───────────▼─────────────────┐\n│   SQLite Database         │  │   File Storage               │\n│   - Conversations         │  │   - Partition Structure      │\n│   - Messages              │  │   - Uploaded Files           │\n│   - Workflow Steps        │  │   - Output Files             │\n│   - File Metadata         │  │   - Dumps                    │\n└───────────────────────────┘  └──────────────────────────────┘\n```\n\n### Design Patterns\n\n1. **Repository Pattern**: Separates data access logic from business logic\n2. **Builder Pattern**: Constructs complex response objects\n3. **Factory Pattern**: Creates service instances\n4. **Strategy Pattern**: Implements different normalization strategies\n5. **Chain of Responsibility**: Handles workflow step execution\n\n### Technology Stack\n\n- **Backend**: Python 3.9+, Flask 2.3+\n- **Database**: SQLite (for metadata)\n- **Storage**: File system (partitioned)\n- **WebSocket**: websockets library\n- **Data Processing**: pandas, openpyxl\n- **Testing**: pytest\n\n---\n\n## 3. Workflow Lifecycle\n\n### Workflow States\n\nA conversation progresses through the following states:\n\n```\ncreated → processing → completed\n    ↓          ↓           ↓\n    └─────→ failed ←───────┘\n```\n\n### Lifecycle Steps\n\n1. **Create Conversation**\n   - Generate unique `chat_id`\n   - Assign participant name\n   - Create partition directory\n   - Initialize in SQLite database\n   - Status: `created`\n\n2. **Upload Files**\n   - Save files to `uploads/` folder\n   - Record file paths in database\n   - Generate download URLs\n   - Status remains: `created`\n\n3. 
**Execute Workflow**\n   - Parse workflow steps\n   - Execute sequentially with input/output chaining\n   - Stream progress via WebSocket\n   - Save outputs to `outputs/` folder\n   - Status: `processing` → `completed` or `failed`\n\n4. **Download Results**\n   - Access files via download URLs\n   - Both uploaded and output files available\n   - Partition structure preserved\n\n5. **Dump Conversation (Optional)**\n   - Create tar.gz archive\n   - Include all files and metadata\n   - Generate download link\n   - Store in dumps directory\n\n6. **Restore Conversation (Optional)**\n   - Extract from tar.gz archive\n   - Restore to correct partition path\n   - Recreate database entries\n   - Preserve all metadata\n\n7. **Delete Conversation (Optional)**\n   - Remove all files from disk\n   - Delete database entries\n   - Cascade delete related data\n\n---\n\n## 4. Dump and Restore System\n\n### Dump Process\n\n**Purpose**: Create a complete backup of a conversation including all files and metadata.\n\n**Process**:\n\n1. Retrieve conversation from database\n2. Create tar.gz archive of conversation directory\n3. Include partition structure in archive\n4. Save metadata as JSON alongside archive\n5. Return download URL\n\n**API Endpoint**:\n\n```bash\nPOST /api/v1/chat/workflows/{chat_id}/dump\n```\n\n**Response**:\n\n```json\n{\n  \"data\": {\n    \"dump_file\": \"chat-id_20260207_141939.tar.gz\",\n    \"download_url\": \"http://localhost:5050/api/v1/chat/downloads/chat-id_20260207_141939.tar.gz\"\n  },\n  \"message\": \"Conversation dumped successfully\"\n}\n```\n\n**Archive Structure**:\n\n```\nchat-id_timestamp.tar.gz\n└── {chat_id}/\n    ├── uploads/\n    │   └── file1.xlsx\n    ├── outputs/\n    │   └── result1.xlsx\n    └── metadata.json\n```\n\n### Restore Process\n\n**Purpose**: Restore a conversation from a backup dump file.\n\n**Process**:\n\n1. Upload dump file via multipart form\n2. Extract to temporary directory\n3. 
Read `metadata.json` to get `partition_key`\n4. Move files to correct partition path: `{base_path}/{partition_key}/{chat_id}`\n5. Recreate database entries\n6. Return restored conversation details\n\n**API Endpoint**:\n\n```bash\nPOST /api/v1/chat/workflows/restore\nContent-Type: multipart/form-data\n\ndump_file: @path/to/dump.tar.gz\n```\n\n**Response**:\n\n```json\n{\n  \"data\": {\n    \"chat_id\": \"...\",\n    \"partition_key\": \"2026/02\",\n    \"status\": \"completed\",\n    \"uploaded_files\": [...],\n    \"output_files\": [...]\n  },\n  \"message\": \"Conversation restored successfully\"\n}\n```\n\n### Important Notes\n\n✅ **Recent Fix**: Restore now correctly places files in partitioned directories\n\n- Old behavior: Files extracted to `./automation/workflows/{chat_id}` (flat)\n- New behavior: Files extracted to `./automation/workflows/{partition_key}/{chat_id}` (partitioned)\n\n✅ **Recent Fix**: Dump file paths now correctly resolved\n\n- Uses `os.path.abspath()` for consistent path resolution\n- Downloads work correctly\n\n---\n\n## 5. 
File Storage Structure\n\n### Directory Layout\n\n```\nautomation/\n├── workflows/              # Conversation files (partitioned)\n│   └── {partition_key}/    # e.g., 2026/02/\n│       └── {chat_id}/\n│           ├── uploads/    # Uploaded files\n│           ├── outputs/    # Workflow outputs\n│           └── metadata.json\n├── dumps/                  # Backup archives\n│   └── {chat_id}_{timestamp}.tar.gz\n└── sqlite/                 # Database\n    ├── chat.db\n    └── snapshots/          # DB backups\n        └── chat_backup_{timestamp}.db\n```\n\n### File Types\n\n#### Uploaded Files\n\n- Location: `{base_path}/{partition_key}/{chat_id}/uploads/`\n- Naming: Original filename preserved\n- Purpose: Input files for workflow processing\n\n#### Output Files\n\n- Location: `{base_path}/{partition_key}/{chat_id}/outputs/`\n- Naming: `{original}_{operation}_{timestamp}.{ext}`\n- Purpose: Results from workflow step execution\n\n#### Dump Files\n\n- Location: `{base_path}/dumps/`\n- Naming: `{chat_id}_{timestamp}.tar.gz`\n- Purpose: Complete conversation backup\n\n### File Management\n\n- **Upload**: Files saved to uploads folder, recorded in database\n- **Processing**: Outputs saved to outputs folder, recorded in database\n- **Download**: Files accessible via absolute URLs\n- **Cleanup**: Files deleted on conversation deletion (cascade)\n\n---\n\n## 6. Partition System\n\n### Purpose\n\nPartitioning organizes conversations into hierarchical directories for:\n\n- **Performance**: Faster file system operations\n- **Scalability**: Handle millions of conversations\n- **Organization**: Logical grouping by time or hash\n- **Backup**: Easier partial backups\n\n### Partition Strategies\n\n#### 1. 
Time-Based (Default)\n\nPartitions by year and month:\n\n```\nautomation/workflows/\n├── 2026/\n│   ├── 01/\n│   │   ├── chat-id-1/\n│   │   └── chat-id-2/\n│   └── 02/\n│       └── chat-id-3/\n└── 2027/\n    └── 01/\n        └── chat-id-4/\n```\n\n**Format**: `YYYY/MM`  \n**Best for**: Time-series analysis, retention policies\n\n#### 2. Hash-Based\n\nPartitions by the first four characters of the `chat_id`:\n\n```\nautomation/workflows/\n├── ab/\n│   ├── cd/\n│   │   └── abcd1234-5678-90ab-cdef-123456789012/\n│   └── ef/\n│       └── abef5678-1234-56cd-ef90-abcdef123456/\n└── 12/\n    └── 34/\n        └── 12345678-abcd-ef12-3456-789012345678/\n```\n\n**Format**: `{first_2_chars}/{next_2_chars}`  \n**Best for**: Even distribution, high volume\n\n### Configuration\n\n```yaml\nchat_workflows:\n  partition:\n    enabled: true\n    strategy: \"time-based\" # or \"hash-based\"\n```\n\n### Partition Key Generation\n\n```python\n# Time-based\npartition_key = created_at.strftime(\"%Y/%m\")  # \"2026/02\"\n\n# Hash-based\npartition_key = f\"{chat_id[:2]}/{chat_id[2:4]}\"  # \"ab/cd\"\n```\n\n---\n\n## 🚀 Installation\n\n### Prerequisites\n\n- Python 3.9 or higher\n- pip (Python package manager)\n\n### Steps\n\n1. **Clone the repository**\n\n```bash\ngit clone https://github.com/pnguyen215/pycelize.git\ncd pycelize\n```\n\n2. **Create a virtual environment** (recommended)\n\n```bash\npython3 -m venv venv\nsource venv/bin/activate  # On Windows: venv\\Scripts\\activate\n```\n\n3. **Install dependencies**\n\n```bash\nmake install\n# or\npip install -r requirements.txt\n```\n\n4. 
**Run the application**\n\n```bash\nmake run\n# or\npython run.py\n```\n\nThe application will start at `http://localhost:5050`\n\n## ⚙️ Configuration\n\nConfiguration is managed through `configs/application.yml`:\n\n```yaml\napp:\n  name: \"Pycelize\"\n  version: \"v0.0.1\"\n  environment: \"development\"\n  debug: true\n  host: \"0.0.0.0\"\n  port: 5050\n\napi:\n  version: \"v1\"\n  prefix: \"/api/v1\"\n  locale: \"en_US\"\n\nfile:\n  upload_folder: \"uploads\"\n  output_folder: \"outputs\"\n  allowed_extensions:\n    - \".csv\"\n    - \".xlsx\"\n    - \".xls\"\n  max_file_size_mb: 50\n\nexcel:\n  default_sheet_name: \"Sheet1\"\n  max_column_width: 50\n  include_info_sheet: true\n\nsql:\n  supported_databases:\n    - \"postgresql\"\n    - \"mysql\"\n    - \"sqlite\"\n  default_database: \"postgresql\"\n  default_batch_size: 1000\n\nnormalization:\n  enabled: true\n  backup_original: false\n  generate_report: true\n\nlogging:\n  level: \"INFO\"\n  file: \"logs/pycelize.log\"\n```\n\n## 📚 API Documentation\n\n### Base URL\n\n```\nhttp://localhost:5050/api/v1\n```\n\n### Response Format\n\nAll API responses follow this structure:\n\n```json\n{\n  \"data\": { ... 
},\n  \"message\": \"Success message\",\n  \"meta\": {\n    \"api_version\": \"v0.0.1\",\n    \"locale\": \"en_US\",\n    \"request_id\": \"unique-request-id\",\n    \"requested_time\": \"2024-01-29T10:00:00+00:00\"\n  },\n  \"status_code\": 200,\n  \"total\": 0\n}\n```\n\n### Endpoints\n\n#### Health Check\n\n| Method | Endpoint        | Description     |\n| ------ | --------------- | --------------- |\n| GET    | `/health`       | Health check    |\n| GET    | `/health/ready` | Readiness check |\n\n#### Excel Operations\n\n| Method | Endpoint                          | Description                                    |\n| ------ | --------------------------------- | ---------------------------------------------- |\n| POST   | `/excel/info`                     | Get Excel file information                     |\n| POST   | `/excel/extract-columns`          | Extract column data (returns JSON)             |\n| POST   | `/excel/extract-columns-to-file`  | Extract columns and save to Excel file         |\n| POST   | `/excel/map-columns`              | Apply column mapping                           |\n| POST   | `/excel/bind-single-key`          | Bind columns using single comparison column    |\n| POST   | `/excel/bind-multi-key`           | Bind columns using multiple comparison columns |\n| POST   | `/excel/search`                   | Search and filter Excel data with conditions   |\n| POST   | `/excel/search/suggest-operators` | Get suggested search operators for each column |\n\n#### CSV Operations\n\n| Method | Endpoint                        | Description                                    |\n| ------ | ------------------------------- | ---------------------------------------------- |\n| POST   | `/csv/info`                     | Get CSV file information                       |\n| POST   | `/csv/convert-to-excel`         | Convert CSV to Excel                           |\n| POST   | `/csv/search`                   | Search and filter CSV data with conditions     
|\n| POST   | `/csv/search/suggest-operators` | Get suggested search operators for each column |\n\n#### Normalization\n\n| Method | Endpoint               | Description              |\n| ------ | ---------------------- | ------------------------ |\n| GET    | `/normalization/types` | List normalization types |\n| POST   | `/normalization/apply` | Apply normalization      |\n\n#### SQL Generation\n\n| Method | Endpoint                       | Description                                        |\n| ------ | ------------------------------ | -------------------------------------------------- |\n| GET    | `/sql/databases`               | List supported databases                           |\n| POST   | `/sql/generate`                | Generate SQL statements (returns JSON or SQL file) |\n| POST   | `/sql/generate-to-text`        | Generate SQL from extracted columns to text file   |\n| POST   | `/sql/generate-custom-to-text` | Generate SQL using custom template to text file    |\n\n#### JSON Generation\n\n| Method | Endpoint                       | Description                                  |\n| ------ | ------------------------------ | -------------------------------------------- |\n| POST   | `/json/generate`               | Generate JSON from Excel with column mapping |\n| POST   | `/json/generate-with-template` | Generate JSON using custom template          |\n\n#### File Operations\n\n| Method | Endpoint                      | Description                |\n| ------ | ----------------------------- | -------------------------- |\n| GET    | `/files/downloads/\u003cfilename\u003e` | Download generated files   |\n| POST   | `/files/bind`                 | Bind source to target file |\n| POST   | `/files/bind/preview`         | Preview binding operation  |\n\n## 🔧 Usage Examples (cURL)\n\n### 1. Health Check\n\n```bash\ncurl http://localhost:5050/api/v1/health\n```\n\n### 2. 
Get Excel File Information\n\n```bash\ncurl -X POST \\\n  -F \"file=@data.xlsx\" \\\n  http://localhost:5050/api/v1/excel/info\n```\n\n### 3. Extract Columns from Excel\n\n```bash\ncurl -X POST \\\n  -F \"file=@data.xlsx\" \\\n  -F 'columns=[\"name\", \"email\", \"phone\"]' \\\n  -F \"remove_duplicates=true\" \\\n  http://localhost:5050/api/v1/excel/extract-columns\n```\n\n### 4. Apply Column Mapping\n\n```bash\ncurl -X POST \\\n  -F \"file=@data.xlsx\" \\\n  -F 'mapping={\n    \"Customer Name\": \"name\",\n    \"Email Address\": {\"source\": \"email\", \"default\": \"N/A\"},\n    \"Status\": {\"default\": \"Active\"}\n  }' \\\n  http://localhost:5050/api/v1/excel/map-columns \\\n  --output mapped_data.xlsx\n```\n\n### 5. Convert CSV to Excel\n\n```bash\ncurl -X POST \\\n  -F \"file=@data.csv\" \\\n  -F \"sheet_name=MyData\" \\\n  http://localhost:5050/api/v1/csv/convert-to-excel \\\n  --output converted.xlsx\n```\n\n### 6. Get Normalization Types\n\n```bash\ncurl http://localhost:5050/api/v1/normalization/types\n```\n\n### 7. Apply Normalization\n\n```bash\ncurl -X POST \\\n  -F \"file=@data.xlsx\" \\\n  -F 'normalizations=[\n    {\"column_name\": \"name\", \"normalization_type\": \"trim_whitespace\"},\n    {\"column_name\": \"name\", \"normalization_type\": \"title_case\"},\n    {\"column_name\": \"email\", \"normalization_type\": \"lowercase\"},\n    {\"column_name\": \"phone\", \"normalization_type\": \"phone_format\"}\n  ]' \\\n  http://localhost:5050/api/v1/normalization/apply \\\n  --output normalized.xlsx\n```\n\n### 8. 
Generate SQL Statements\n\n```bash\ncurl -X POST \\\n  -F \"file=@data.xlsx\" \\\n  -F \"table_name=customers\" \\\n  -F 'column_mapping={\n    \"name\": \"Customer Name\",\n    \"email\": \"Email\",\n    \"phone\": \"Phone\"\n  }' \\\n  -F \"database_type=postgresql\" \\\n  -F 'auto_increment={\n    \"enabled\": true,\n    \"column_name\": \"id\",\n    \"increment_type\": \"postgresql_serial\",\n    \"start_value\": 1\n  }' \\\n  -F \"return_file=true\" \\\n  http://localhost:5050/api/v1/sql/generate \\\n  --output insert_customers.sql\n```\n\n### 9. Generate SQL with Custom Template\n\n```bash\ncurl -X POST \\\n  -F \"file=@data.xlsx\" \\\n  -F \"table_name=users\" \\\n  -F 'column_mapping={\"name\": \"Name\", \"email\": \"Email\"}' \\\n  -F \"database_type=postgresql\" \\\n  -F 'template=INSERT INTO users (id, name, email, created_at) VALUES ({auto_id}, {name}, {email}, {current_timestamp});' \\\n  -F 'auto_increment={\"enabled\": true, \"column_name\": \"id\", \"increment_type\": \"manual_sequence\", \"sequence_name\": \"users_id_seq\", \"start_value\": 100}' \\\n  http://localhost:5050/api/v1/sql/generate\n```\n\n### 10. 
Generate JSON from Excel (Standard Mapping)\n\n```bash\ncurl -X POST \\\n  -F \"file=@data.xlsx\" \\\n  -F 'column_mapping={\"Name\": \"full_name\", \"Email\": \"email\", \"Age\": \"age\"}' \\\n  -F \"pretty_print=true\" \\\n  -F \"null_handling=exclude\" \\\n  -F \"array_wrapper=true\" \\\n  http://localhost:5050/api/v1/json/generate\n```\n\n**Response:**\n\n```json\n{\n  \"data\": {\n    \"download_url\": \"/api/v1/files/downloads/data_generated_20260130_101437.json\",\n    \"total_records\": 150,\n    \"file_size\": 45632\n  },\n  \"message\": \"JSON file generated successfully\",\n  \"status_code\": 200\n}\n```\n\n**Generated JSON (data_generated_20260130_101437.json):**\n\n```json\n[\n  {\n    \"full_name\": \"Alice\",\n    \"email\": \"alice@example.com\",\n    \"age\": 25\n  },\n  {\n    \"full_name\": \"Bob\",\n    \"email\": \"bob@example.com\",\n    \"age\": 30\n  }\n]\n```\n\n### 11. Generate JSON with Custom Template\n\n```bash\ncurl -X POST \\\n  -F \"file=@users.xlsx\" \\\n  -F 'template={\"user\":{\"id\":\"{user_id}\",\"name\":\"{first_name} {last_name}\"},\"contact\":{\"email\":\"{email}\"}}' \\\n  -F 'column_mapping={\"user_id\":\"UserID\",\"first_name\":\"FirstName\",\"last_name\":\"LastName\",\"email\":\"Email\"}' \\\n  -F \"aggregation_mode=array\" \\\n  -F \"pretty_print=true\" \\\n  http://localhost:5050/api/v1/json/generate-with-template\n```\n\n**Generated JSON:**\n\n```json\n[\n  {\n    \"user\": {\n      \"id\": \"1\",\n      \"name\": \"Alice Smith\"\n    },\n    \"contact\": {\n      \"email\": \"alice@example.com\"\n    }\n  },\n  {\n    \"user\": {\n      \"id\": \"2\",\n      \"name\": \"Bob Jones\"\n    },\n    \"contact\": {\n      \"email\": \"bob@example.com\"\n    }\n  }\n]\n```\n\n### 12. 
Bind Excel Files\n\n```bash\ncurl -X POST \\\n  -F \"source_file=@source_data.xlsx\" \\\n  -F \"target_file=@template.xlsx\" \\\n  -F 'column_mapping={\n    \"Target_Name\": \"Source_Name\",\n    \"Target_Email\": \"Source_Email\",\n    \"Target_Phone\": \"Source_Phone\"\n  }' \\\n  http://localhost:5050/api/v1/files/bind \\\n  --output bound_result.xlsx\n```\n\n### 13. Preview Binding Operation\n\n```bash\ncurl -X POST \\\n  -F \"source_file=@source.xlsx\" \\\n  -F \"target_file=@target.xlsx\" \\\n  -F 'column_mapping={\"Target_Col\": \"Source_Col\"}' \\\n  http://localhost:5050/api/v1/files/bind/preview\n```\n\n### 14. Extract Columns to Excel File (New Feature)\n\nExtract specific columns from an Excel file and save the result to a new Excel file. Returns a download URL for the generated file.\n\n**Description:** This endpoint extracts specified columns from an uploaded Excel file and creates a new Excel file containing only those columns. The extracted data can optionally have duplicates removed. The response includes a download URL in the standardized format.\n\n**Request Parameters:**\n\n- `file`: Excel file to extract columns from\n- `columns`: JSON array of column names to extract\n- `remove_duplicates`: Optional boolean to remove duplicate rows (default: false)\n\n**Response Example:**\n\n```json\n{\n  \"data\": {\n    \"download_url\": \"http://127.0.0.1:5050/api/v1/files/downloads/extracted_columns_20260129_120000.xlsx\"\n  },\n  \"message\": \"Extracted Excel file generated successfully\",\n  \"status_code\": 200\n}\n```\n\n**cURL Example:**\n\n```bash\ncurl -X POST \\\n  -F \"file=@data.xlsx\" \\\n  -F 'columns=[\"name\", \"email\", \"phone\"]' \\\n  -F \"remove_duplicates=true\" \\\n  http://localhost:5050/api/v1/excel/extract-columns-to-file\n```\n\n### 13. Generate SQL to Text File - Standard (New Feature)\n\nGenerate SQL INSERT statements from Excel data and save to a text file. 
This endpoint supports column extraction, column mapping, and auto-increment primary key generation.\n\n**Description:** This endpoint reads an Excel file, optionally extracts specific columns, applies column mapping, and generates SQL INSERT statements with optional auto-increment ID support. The generated SQL is saved to a `.txt` file and a download URL is returned.\n\n**Request Parameters:**\n\n- `file`: Excel file with source data\n- `columns`: Optional JSON array of column names to extract\n- `table_name`: Target database table name\n- `column_mapping`: JSON object mapping SQL column names to Excel column names\n- `database_type`: Database type (postgresql, mysql, sqlite) - default: postgresql\n- `auto_increment`: Optional JSON object for auto-increment configuration\n- `remove_duplicates`: Optional boolean to remove duplicate rows (default: false)\n\n**Auto-increment Configuration:**\n\n```json\n{\n  \"enabled\": true,\n  \"column_name\": \"id\",\n  \"increment_type\": \"postgresql_serial\",\n  \"start_value\": 1\n}\n```\n\n**Response Example:**\n\n```json\n{\n  \"data\": {\n    \"download_url\": \"http://127.0.0.1:5050/api/v1/files/downloads/sql_statements_20260129_120000.txt\"\n  },\n  \"message\": \"SQL text file generated successfully\",\n  \"status_code\": 200\n}\n```\n\n**cURL Example:**\n\n```bash\ncurl -X POST \\\n  -F \"file=@data.xlsx\" \\\n  -F 'columns=[\"Name\", \"Email\", \"Phone\"]' \\\n  -F \"table_name=customers\" \\\n  -F 'column_mapping={\"name\": \"Name\", \"email\": \"Email\", \"phone\": \"Phone\"}' \\\n  -F \"database_type=postgresql\" \\\n  -F 'auto_increment={\"enabled\": true, \"column_name\": \"id\", \"start_value\": 1}' \\\n  -F \"remove_duplicates=false\" \\\n  http://localhost:5050/api/v1/sql/generate-to-text\n```\n\n**Generated SQL Example:**\n\n```sql\n-- Generated by Pycelize\n-- Generated at: 2026-01-29T15:03:39.969444\n-- Total statements: 7\n\nBEGIN;\nINSERT INTO customers (name, email, phone) VALUES ('John Doe', 
'john@example.com', '555-1234');\nINSERT INTO customers (name, email, phone) VALUES ('Jane Smith', 'jane@example.com', '555-5678');\nCOMMIT;\n```\n\n### 14. Generate SQL to Text File - Custom Template (New Feature)\n\nGenerate SQL statements using a custom template with placeholder substitution and save to a text file.\n\n**Description:** This endpoint provides full flexibility for SQL generation by accepting a custom SQL template string with placeholders. Placeholders are substituted with actual values from the Excel data. Supports auto-increment IDs and timestamp placeholders.\n\n**Request Parameters:**\n\n- `file`: Excel file with source data\n- `columns`: Optional JSON array of column names to extract\n- `template`: Custom SQL template string with placeholders\n- `column_mapping`: JSON object mapping placeholder names to Excel columns\n- `auto_increment`: Optional JSON object for auto-increment configuration\n- `remove_duplicates`: Optional boolean to remove duplicate rows (default: false)\n\n**Template Placeholders:**\n\n- `{placeholder_name}`: Replaced with value from mapped column\n- `{auto_id}`: Auto-incremented ID value (if auto_increment is enabled)\n- `{current_timestamp}`: Replaced with CURRENT_TIMESTAMP\n\n**Response Example:**\n\n```json\n{\n  \"data\": {\n    \"download_url\": \"http://127.0.0.1:5050/api/v1/files/downloads/custom_sql_20260129_120000.txt\"\n  },\n  \"message\": \"Custom SQL text file generated successfully\",\n  \"status_code\": 200\n}\n```\n\n**cURL Example:**\n\n```bash\ncurl -X POST \\\n  -F \"file=@data.xlsx\" \\\n  -F 'columns=[\"Name\", \"Email\"]' \\\n  -F 'template=INSERT INTO users (id, name, email, created_at) VALUES ({auto_id}, {name}, {email}, {current_timestamp});' \\\n  -F 'column_mapping={\"name\": \"Name\", \"email\": \"Email\"}' \\\n  -F 'auto_increment={\"enabled\": true, \"column_name\": \"id\", \"start_value\": 100}' \\\n  http://localhost:5050/api/v1/sql/generate-custom-to-text\n```\n\n**Generated SQL 
Example:**\n\n```sql\n-- Generated by Pycelize\n-- Generated at: 2026-01-29T15:04:06.412720\n-- Total statements: 3\n\nINSERT INTO users (id, name, email, created_at) VALUES (100, 'John Doe', 'john@example.com', CURRENT_TIMESTAMP);\nINSERT INTO users (id, name, email, created_at) VALUES (101, 'Jane Smith', 'jane@example.com', CURRENT_TIMESTAMP);\nINSERT INTO users (id, name, email, created_at) VALUES (102, 'Bob Wilson', 'bob@example.com', CURRENT_TIMESTAMP);\n```\n\n### 15. Download Generated Files\n\nDownload files generated by other endpoints (extracted Excel files, SQL text files, etc.).\n\n**Description:** This endpoint serves generated files from the outputs folder. Files are automatically cleaned up after a certain period. The filename must match the one in the download URL returned by the generating endpoint.\n\n**cURL Example:**\n\n```bash\n# Download extracted Excel file\ncurl http://localhost:5050/api/v1/files/downloads/extracted_columns_20260129_120000.xlsx \\\n  --output result.xlsx\n\n# Download SQL text file\ncurl http://localhost:5050/api/v1/files/downloads/sql_statements_20260129_120000.txt \\\n  --output inserts.sql\n```\n\n## 🔍 Search and Filter APIs\n\nThe Search and Filter APIs filter rows in Excel and CSV files. You can apply multiple conditions across different columns, combine them with AND or OR, and export the matching rows as Excel, CSV, or JSON.\n\n### API 1: Search and Filter Data\n\nSearch and filter Excel or CSV files based on multiple conditions with support for different operators and logical combinations.\n\n#### Excel Search Endpoint\n\n**Endpoint:** `POST /api/v1/excel/search`\n\n**Description:** Filter Excel file data using multiple conditions across columns. Supports various operators (equals, contains, greater_than, etc.) and logical combinations (AND/OR). 
Results can be exported as Excel, CSV, or JSON.\n\n**Request Parameters:**\n\n- `file`: Excel file (required)\n- `conditions`: JSON array of search conditions (required)\n  - Each condition contains: `column`, `operator`, `value`\n- `logic`: Logical operator between conditions - \"AND\" or \"OR\" (default: \"AND\")\n- `output_format`: Output file format - \"xlsx\", \"csv\", or \"json\" (default: \"xlsx\")\n- `output_filename`: Optional custom output filename\n\n**Supported Operators:**\n\n**String Operators:**\n\n- `equals` - Exact match\n- `not_equals` - Not equal to\n- `contains` - Contains substring (case-insensitive)\n- `not_contains` - Does not contain substring\n- `starts_with` - Starts with string\n- `ends_with` - Ends with string\n- `is_empty` - Field is empty or null\n- `is_not_empty` - Field is not empty\n\n**Numeric Operators:**\n\n- `equals` - Equal to number\n- `not_equals` - Not equal to number\n- `greater_than` - Greater than\n- `greater_than_or_equal` - Greater than or equal\n- `less_than` - Less than\n- `less_than_or_equal` - Less than or equal\n- `between` - Between two values (value must be [min, max])\n\n**Date Operators:**\n\n- `equals` - Exact date match\n- `before` - Before date\n- `after` - After date\n- `between` - Between two dates\n\n**Response Example:**\n\n```json\n{\n  \"data\": {\n    \"download_url\": \"http://localhost:5050/api/v1/files/downloads/search_results_20260206_120000.xlsx\",\n    \"total_rows\": 1000,\n    \"filtered_rows\": 45,\n    \"conditions_applied\": 2\n  },\n  \"message\": \"Search completed successfully. 
45 rows matched.\",\n  \"meta\": {\n    \"api_version\": \"v0.0.1\",\n    \"locale\": \"en_US\",\n    \"request_id\": \"abc123...\",\n    \"requested_time\": \"2026-02-06T12:00:00+00:00\"\n  },\n  \"status_code\": 200,\n  \"total\": 0\n}\n```\n\n**cURL Examples:**\n\n**Simple equals search:**\n\n```bash\ncurl -X POST \\\n  -F \"file=@data.xlsx\" \\\n  -F 'conditions=[{\"column\": \"customer_id\", \"operator\": \"equals\", \"value\": \"021201\"}]' \\\n  -F \"logic=AND\" \\\n  -F \"output_format=xlsx\" \\\n  http://localhost:5050/api/v1/excel/search\n```\n\n**Multiple conditions with AND logic:**\n\n```bash\ncurl -X POST \\\n  -F \"file=@sales_data.xlsx\" \\\n  -F 'conditions=[\n    {\"column\": \"status\", \"operator\": \"equals\", \"value\": \"active\"},\n    {\"column\": \"amount\", \"operator\": \"greater_than\", \"value\": 1000}\n  ]' \\\n  -F \"logic=AND\" \\\n  -F \"output_format=csv\" \\\n  http://localhost:5050/api/v1/excel/search\n```\n\n**Search with OR logic:**\n\n```bash\ncurl -X POST \\\n  -F \"file=@customers.xlsx\" \\\n  -F 'conditions=[\n    {\"column\": \"region\", \"operator\": \"equals\", \"value\": \"North\"},\n    {\"column\": \"region\", \"operator\": \"equals\", \"value\": \"South\"}\n  ]' \\\n  -F \"logic=OR\" \\\n  -F \"output_format=json\" \\\n  http://localhost:5050/api/v1/excel/search\n```\n\n**Search with contains operator:**\n\n```bash\ncurl -X POST \\\n  -F \"file=@products.xlsx\" \\\n  -F 'conditions=[{\"column\": \"name\", \"operator\": \"contains\", \"value\": \"phone\"}]' \\\n  -F \"logic=AND\" \\\n  http://localhost:5050/api/v1/excel/search\n```\n\n**Search with between operator:**\n\n```bash\ncurl -X POST \\\n  -F \"file=@transactions.xlsx\" \\\n  -F 'conditions=[{\"column\": \"amount\", \"operator\": \"between\", \"value\": [100, 500]}]' \\\n  -F \"logic=AND\" \\\n  http://localhost:5050/api/v1/excel/search\n```\n\n#### CSV Search Endpoint\n\n**Endpoint:** `POST /api/v1/csv/search`\n\n**Description:** Same functionality as Excel 
search but for CSV files. All parameters, operators, and response format are identical.\n\n**cURL Example:**\n\n```bash\ncurl -X POST \\\n  -F \"file=@data.csv\" \\\n  -F 'conditions=[\n    {\"column\": \"email\", \"operator\": \"contains\", \"value\": \"@gmail.com\"},\n    {\"column\": \"age\", \"operator\": \"greater_than\", \"value\": 25}\n  ]' \\\n  -F \"logic=AND\" \\\n  -F \"output_format=csv\" \\\n  http://localhost:5050/api/v1/csv/search\n```\n\n### API 2: Suggest Search Operators\n\nGet suggested search operators for each column in your file based on the column's data type. This helps you understand which operators are valid for each column.\n\n#### Excel Suggest Operators Endpoint\n\n**Endpoint:** `POST /api/v1/excel/search/suggest-operators`\n\n**Description:** Analyzes an Excel file and suggests valid search operators for each column based on the detected data type.\n\n**Request Parameters:**\n\n- `file`: Excel file (required)\n\n**Response Example:**\n\n```json\n{\n  \"data\": {\n    \"customer_id\": {\n      \"type\": \"object\",\n      \"operators\": [\n        \"equals\",\n        \"not_equals\",\n        \"contains\",\n        \"not_contains\",\n        \"starts_with\",\n        \"ends_with\",\n        \"is_empty\",\n        \"is_not_empty\"\n      ]\n    },\n    \"amount\": {\n      \"type\": \"float64\",\n      \"operators\": [\n        \"equals\",\n        \"not_equals\",\n        \"greater_than\",\n        \"greater_than_or_equal\",\n        \"less_than\",\n        \"less_than_or_equal\",\n        \"between\"\n      ]\n    },\n    \"created_at\": {\n      \"type\": \"object\",\n      \"operators\": [\n        \"equals\",\n        \"not_equals\",\n        \"contains\",\n        \"not_contains\",\n        \"starts_with\",\n        \"ends_with\",\n        \"is_empty\",\n        \"is_not_empty\"\n      ]\n    },\n    \"is_active\": {\n      \"type\": \"object\",\n      \"operators\": [\n        \"equals\",\n        \"not_equals\",\n        
\"contains\",\n        \"not_contains\",\n        \"starts_with\",\n        \"ends_with\",\n        \"is_empty\",\n        \"is_not_empty\"\n      ]\n    }\n  },\n  \"message\": \"Operator suggestions generated successfully\",\n  \"meta\": {\n    \"api_version\": \"v0.0.1\",\n    \"locale\": \"en_US\",\n    \"request_id\": \"xyz789...\",\n    \"requested_time\": \"2026-02-06T12:00:00+00:00\"\n  },\n  \"status_code\": 200,\n  \"total\": 0\n}\n```\n\n**cURL Example:**\n\n```bash\ncurl -X POST \\\n  -F \"file=@data.xlsx\" \\\n  http://localhost:5050/api/v1/excel/search/suggest-operators\n```\n\n#### CSV Suggest Operators Endpoint\n\n**Endpoint:** `POST /api/v1/csv/search/suggest-operators`\n\n**Description:** Same functionality as Excel suggest operators but for CSV files.\n\n**cURL Example:**\n\n```bash\ncurl -X POST \\\n  -F \"file=@data.csv\" \\\n  http://localhost:5050/api/v1/csv/search/suggest-operators\n```\n\n### Search API Use Cases\n\n**1. Filter Active Customers with High Value:**\n\n```bash\ncurl -X POST \\\n  -F \"file=@customers.xlsx\" \\\n  -F 'conditions=[\n    {\"column\": \"status\", \"operator\": \"equals\", \"value\": \"active\"},\n    {\"column\": \"lifetime_value\", \"operator\": \"greater_than\", \"value\": 10000}\n  ]' \\\n  -F \"logic=AND\" \\\n  http://localhost:5050/api/v1/excel/search\n```\n\n**2. Find All Gmail Users:**\n\n```bash\ncurl -X POST \\\n  -F \"file=@users.csv\" \\\n  -F 'conditions=[{\"column\": \"email\", \"operator\": \"ends_with\", \"value\": \"@gmail.com\"}]' \\\n  http://localhost:5050/api/v1/csv/search\n```\n\n**3. Filter Products in Price Range:**\n\n```bash\ncurl -X POST \\\n  -F \"file=@products.xlsx\" \\\n  -F 'conditions=[{\"column\": \"price\", \"operator\": \"between\", \"value\": [50, 150]}]' \\\n  -F \"output_format=json\" \\\n  http://localhost:5050/api/v1/excel/search\n```\n\n**4. 
Find Records with Empty Fields:**\n\n```bash\ncurl -X POST \\\n  -F \"file=@data.xlsx\" \\\n  -F 'conditions=[{\"column\": \"phone\", \"operator\": \"is_empty\", \"value\": null}]' \\\n  http://localhost:5050/api/v1/excel/search\n```\n\n## 📊 Excel Binding APIs\n\n### Overview\n\nThe Excel Binding APIs merge data from two Excel files by matching column values. This is useful for enriching data with reference information, adding lookup values, or combining related datasets.\n\n### Use Cases\n\n- **Data Enrichment**: Add customer details (email, phone) to transaction records by matching customer IDs\n- **Reference Lookups**: Append product information to order data using product codes\n- **Data Integration**: Merge employee data from multiple sources using composite keys (first name + last name)\n\n### 16. Bind Excel Files - Single Key\n\nBind columns from a bind file to a source file using a single comparison column. This performs a left join operation, preserving all rows from the source file.\n\n**Description:**\n\n- Takes two Excel files: source file (File A) and bind file (File B)\n- Matches rows based on a single comparison column\n- Appends specified columns from File B to File A\n- Preserves all original columns and data in File A\n- Fills the appended columns with NaN for source rows without a match\n\n**Request Parameters:**\n\n- `source_file`: Excel file to be extended with new columns (File A)\n- `bind_file`: Excel file containing data to bind (File B)\n- `comparison_column`: Column name used to match rows between files\n- `bind_columns`: JSON array of column names to append from File B\n- `output_filename`: Optional custom output filename\n\n**Response Example:**\n\n```json\n{\n  \"data\": {\n    \"download_url\": \"http://localhost:5050/api/v1/files/downloads/source_data_bound_single_20260129_143022.xlsx\"\n  },\n  \"message\": \"Excel binding completed successfully\",\n  \"meta\": {\n    \"api_version\": 
\"v0.0.1\",\n    \"locale\": \"en_US\",\n    \"request_id\": \"abc123...\",\n    \"requested_time\": \"2026-01-29T14:30:22+00:00\"\n  },\n  \"status_code\": 200,\n  \"total\": 0\n}\n```\n\n**cURL Example:**\n\n```bash\ncurl -X POST \\\n  -F \"source_file=@source.xlsx\" \\\n  -F \"bind_file=@bind.xlsx\" \\\n  -F \"comparison_column=s_1_name\" \\\n  -F 'bind_columns=[\"s_1_id\"]' \\\n  http://localhost:5050/api/v1/excel/bind-single-key\n```\n\n**Example Scenario:**\n\n**File A (source.xlsx) - Before:**\n| s_1_name | existing_col |\n|----------|--------------|\n| Alice | data1 |\n| Bob | data2 |\n| Charlie | data3 |\n\n**File B (bind.xlsx):**\n| s_1_name | s_1_id |\n|----------|--------|\n| Alice | 101 |\n| Bob | 102 |\n| David | 103 |\n\n**Result (source_data_bound_single.xlsx) - After:**\n| s_1_name | existing_col | s_1_id |\n|----------|--------------|--------|\n| Alice | data1 | 101 |\n| Bob | data2 | 102 |\n| Charlie | data3 | NaN |\n\n**Note:** Charlie's row remains but has NaN for s_1_id since there's no match in File B.\n\n### 17. Bind Excel Files - Multi Key (Composite Key)\n\nBind columns from a bind file to a source file using multiple comparison columns for matching. 
This is useful when a single column isn't unique enough for matching.\n\n**Description:**\n\n- Similar to single-key binding but uses multiple columns together as a composite key\n- Matches rows only when ALL comparison columns match\n- Useful for matching on combinations like (first_name + last_name) or (country + city)\n- All original data in File A is preserved\n\n**Request Parameters:**\n\n- `source_file`: Excel file to be extended (File A)\n- `bind_file`: Excel file with data to bind (File B)\n- `comparison_columns`: JSON array of column names used together for matching\n- `bind_columns`: JSON array of column names to append from File B\n- `output_filename`: Optional custom output filename\n\n**Response Example:**\n\n```json\n{\n  \"data\": {\n    \"download_url\": \"http://localhost:5050/api/v1/files/downloads/source_data_bound_multi_20260129_143530.xlsx\"\n  },\n  \"message\": \"Excel binding completed successfully\",\n  \"meta\": {\n    \"api_version\": \"v0.0.1\",\n    \"locale\": \"en_US\",\n    \"request_id\": \"xyz789...\",\n    \"requested_time\": \"2026-01-29T14:35:30+00:00\"\n  },\n  \"status_code\": 200,\n  \"total\": 0\n}\n```\n\n**cURL Example:**\n\n```bash\ncurl -X POST \\\n  -F \"source_file=@source.xlsx\" \\\n  -F \"bind_file=@bind.xlsx\" \\\n  -F 'comparison_columns=[\"first_name\", \"last_name\"]' \\\n  -F 'bind_columns=[\"email\", \"phone\"]' \\\n  http://localhost:5050/api/v1/excel/bind-multi-key\n```\n\n**Example Scenario:**\n\n**File A (source.xlsx) - Before:**\n| first_name | last_name | department |\n|------------|-----------|------------|\n| John | Doe | IT |\n| Jane | Smith | HR |\n| John | Smith | Finance |\n\n**File B (bind.xlsx):**\n| first_name | last_name | email | phone |\n|------------|-----------|--------------------|--------------|\n| John | Doe | john.doe@corp.com | 555-0101 |\n| Jane | Smith | jane.smith@corp.com| 555-0102 |\n| Mike | Johnson | mike.j@corp.com | 555-0103 |\n\n**Result (source_data_bound_multi.xlsx) - 
After:**\n| first_name | last_name | department | email | phone |\n|------------|-----------|------------|---------------------|----------|\n| John | Doe | IT | john.doe@corp.com | 555-0101 |\n| Jane | Smith | HR | jane.smith@corp.com | 555-0102 |\n| John | Smith | Finance | NaN | NaN |\n\n**Note:** \"John Smith\" in File A has no matching row in File B (which contains only \"John Doe\", \"Jane Smith\", and \"Mike Johnson\"), so email and phone are NaN.\n\n### Binding Features \u0026 Behavior\n\n**Key Features:**\n\n- **Preserves Original Data**: All columns and rows from the source file are kept\n- **Left Join Logic**: All source rows appear in the output, matched or not\n- **Duplicate Handling**: If the bind file has duplicate keys, the first match is used\n- **NaN for Unmatched**: Rows without matches get NaN in bound columns\n- **Column Conflict Detection**: Raises an error if bind columns already exist in the source file\n- **Type Preservation**: Data types are maintained during binding\n\n**Error Handling:**\n\n- `422 Validation Error`: Missing columns, column conflicts, invalid JSON\n- `400 File Processing Error`: File read/write failures\n- `500 Server Error`: Unexpected internal errors\n\n**Performance Tips:**\n\n- Remove duplicates from the bind file before uploading for faster processing\n- Use single-key binding when possible (faster than multi-key)\n- Consider file size limits when working with large datasets\n\n## 📄 JSON Generation Features\n\nThe JSON generation feature transforms Excel data into JSON, either through standard column mapping or through custom templates.\n\n### Standard JSON Generation\n\nGenerate JSON from Excel data by mapping Excel columns to JSON keys.\n\n**Parameters:**\n\n- `file`: Excel file (required)\n- `column_mapping`: JSON object mapping Excel columns to JSON keys (optional, uses all columns if not provided)\n- `columns`: JSON array of column names to extract before generation (optional)\n- `pretty_print`: Boolean to format JSON with 
indentation (default: true)\n- `null_handling`: Strategy for null values - \"include\", \"exclude\", \"default\" (default: \"include\")\n- `array_wrapper`: Boolean to wrap objects in array (default: true)\n- `output_filename`: Optional custom filename\n\n**Null Handling Strategies:**\n\n- `include`: Keep null values as `null` in JSON\n- `exclude`: Remove keys with null values from JSON objects\n- `default`: Replace null values with empty strings\n\n**Example:**\n\n```bash\ncurl -X POST \\\n  -F \"file=@data.xlsx\" \\\n  -F 'column_mapping={\"Name\": \"full_name\", \"Email\": \"email\", \"Age\": \"age\"}' \\\n  -F \"null_handling=exclude\" \\\n  http://localhost:5050/api/v1/json/generate\n```\n\n### Template-Based JSON Generation\n\nGenerate JSON using custom templates with placeholder substitution. This allows for nested structures and complex JSON schemas.\n\n**Parameters:**\n\n- `file`: Excel file (required)\n- `template`: JSON template string or object with placeholders (required)\n- `column_mapping`: JSON object mapping placeholders to Excel columns (required)\n- `pretty_print`: Boolean to format JSON with indentation (default: true)\n- `aggregation_mode`: Output format - \"array\", \"single\", \"nested\" (default: \"array\")\n- `output_filename`: Optional custom filename\n\n**Aggregation Modes:**\n\n- `array`: Returns array of objects (default)\n- `single`: Returns single object (for one row) or array (for multiple rows)\n- `nested`: Returns object with `items` array and `count` field\n\n**Placeholder Syntax:**\n\n- `{column_name}`: Basic substitution\n- `{column_name:int}`: Convert to integer\n- `{column_name:float}`: Convert to float\n- `{column_name:bool}`: Convert to boolean\n- `{column_name:datetime}`: Keep as datetime string\n- `{column_name|default}`: Use default if value is null\n\n**Template Examples:**\n\n1. **Simple Template:**\n\n```json\n{\n  \"id\": \"{user_id}\",\n  \"name\": \"{full_name}\",\n  \"email\": \"{email_address}\"\n}\n```\n\n2. 
**Nested Structure:**\n\n```json\n{\n  \"user\": {\n    \"personal\": {\n      \"firstName\": \"{first_name}\",\n      \"lastName\": \"{last_name}\"\n    },\n    \"contact\": {\n      \"email\": \"{email}\",\n      \"phone\": \"{phone}\"\n    }\n  },\n  \"metadata\": {\n    \"source\": \"excel_import\"\n  }\n}\n```\n\n3. **With Type Conversion:**\n\n```json\n{\n  \"id\": \"{user_id:int}\",\n  \"score\": \"{score:float}\",\n  \"active\": \"{is_active:bool}\"\n}\n```\n\n4. **With Default Values:**\n\n```json\n{\n  \"name\": \"{name}\",\n  \"email\": \"{email|no-email@example.com}\",\n  \"status\": \"{status|pending}\"\n}\n```\n\n**Example Usage:**\n\n```bash\ncurl -X POST \\\n  -F \"file=@users.xlsx\" \\\n  -F 'template={\"user\":{\"id\":\"{user_id}\",\"name\":\"{first} {last}\"},\"contact\":{\"email\":\"{email}\"}}' \\\n  -F 'column_mapping={\"user_id\":\"UserID\",\"first\":\"FirstName\",\"last\":\"LastName\",\"email\":\"Email\"}' \\\n  -F \"aggregation_mode=nested\" \\\n  http://localhost:5050/api/v1/json/generate-with-template\n```\n\n### Edge Cases Handled\n\n- **Empty DataFrame**: Returns empty array `[]`\n- **Missing Columns**: Returns 422 validation error with details\n- **Invalid Template**: Returns 422 validation error\n- **Null/NaN Values**: Handled according to strategy\n- **Special Characters**: Properly escaped in JSON\n- **Mixed Data Types**: Automatic type conversion\n\n### Response Format\n\nBoth endpoints return the same response structure:\n\n```json\n{\n  \"data\": {\n    \"download_url\": \"/api/v1/files/downloads/data_generated_20260130_101437.json\",\n    \"total_records\": 150,\n    \"file_size\": 45632\n  },\n  \"message\": \"JSON file generated successfully\",\n  \"meta\": {\n    \"api_version\": \"v0.0.1\",\n    \"locale\": \"en_US\",\n    \"request_id\": \"abc123...\",\n    \"requested_time\": \"2026-01-30T10:14:37.232310+00:00\"\n  },\n  \"status_code\": 200,\n  \"total\": 0\n}\n```\n\n### File Naming Convention\n\nGenerated files follow 
the pattern:\n\n- Standard generation: `{original_name}_generated_{timestamp}.json`\n- Template generation: `{original_name}_generated_template_{timestamp}.json`\n\nExample: `data_generated_20260130_143022.json`\n\n---\n\n## 📡 Chat Workflows API Reference\n\n### Base URL\n\n```\nhttp://localhost:5050/api/v1\n```\n\n### Chat Workflows Endpoints\n\n#### 1. Create Conversation\n\n```bash\nPOST /chat/workflows\n```\n\n**Response**:\n\n```json\n{\n  \"data\": {\n    \"chat_id\": \"uuid\",\n    \"participant_name\": \"BlueWhale-4821\",\n    \"status\": \"created\",\n    \"partition_key\": \"2026/02\",\n    \"created_at\": \"2026-02-06T18:11:05.349996\",\n    \"uploaded_files\": [],\n    \"output_files\": []\n  }\n}\n```\n\n#### 2. List Conversations\n\n```bash\nGET /chat/workflows?status=created\u0026limit=100\u0026offset=0\n```\n\n**Query Parameters**:\n\n- `status`: Filter by status (created, processing, completed, failed)\n- `limit`: Results per page (default: 100)\n- `offset`: Pagination offset (default: 0)\n\n#### 3. Get Conversation\n\n```bash\nGET /chat/workflows/{chat_id}\n```\n\n#### 4. Upload File\n\n```bash\nPOST /chat/workflows/{chat_id}/upload\nContent-Type: multipart/form-data\n\nfile: @path/to/file.xlsx\n```\n\n**Response**:\n\n```json\n{\n  \"data\": {\n    \"filename\": \"data.xlsx\",\n    \"file_path\": \"./automation/workflows/2026/02/{chat_id}/uploads/data.xlsx\",\n    \"download_url\": \"http://localhost:5050/api/v1/chat/workflows/{chat_id}/files/data.xlsx\"\n  }\n}\n```\n\n✅ **Recent Fix**: Download URLs are now absolute and clickable\n\n#### 5. 
Execute Workflow\n\n```bash\nPOST /chat/workflows/{chat_id}/execute\nContent-Type: application/json\n\n{\n  \"steps\": [\n    {\n      \"operation\": \"excel/extract-columns-to-file\",\n      \"arguments\": {\n        \"columns\": [\"customer_id\", \"amount\"],\n        \"remove_duplicates\": true\n      }\n    }\n  ]\n}\n```\n\n**Response**:\n\n```json\n{\n  \"data\": {\n    \"results\": [\n      {\n        \"output_file_path\": \"...\",\n        \"download_url\": \"http://localhost:5050/api/v1/chat/workflows/{chat_id}/files/output.xlsx\"\n      }\n    ],\n    \"output_files\": [\n      {\n        \"file_path\": \"...\",\n        \"download_url\": \"http://localhost:5050/api/v1/chat/workflows/{chat_id}/files/output.xlsx\"\n      }\n    ]\n  }\n}\n```\n\n✅ **Recent Fix**: Each output file includes download_url\n\n#### 6. Delete Conversation\n\n```bash\nDELETE /chat/workflows/{chat_id}\n```\n\n#### 7. Dump Conversation\n\n```bash\nPOST /chat/workflows/{chat_id}/dump\n```\n\n✅ **Recent Fix**: Dump files now created correctly and downloadable\n\n#### 8. Restore Conversation\n\n```bash\nPOST /chat/workflows/restore\nContent-Type: multipart/form-data\n\ndump_file: @path/to/dump.tar.gz\n```\n\n✅ **Recent Fix**: Files now restored to correct partition paths\n\n#### 9. Download Workflow File\n\n```bash\nGET /chat/workflows/{chat_id}/files/{filename}\n```\n\n✅ **New Endpoint**: Download uploaded or output files\n\n#### 10. Download Dump File\n\n```bash\nGET /chat/downloads/{filename}\n```\n\n#### 11. Backup SQLite Database\n\n---\n\n## 🔌 WebSocket Integration\n\n### Connection\n\n```javascript\nconst ws = new WebSocket(\"ws://127.0.0.1:5051/chat/{chat_id}\");\n```\n\n### Message Types\n\n#### 1. Connected (Welcome)\n\n```json\n{\n  \"type\": \"connected\",\n  \"chat_id\": \"...\",\n  \"message\": \"Connected to chat workflow\"\n}\n```\n\n#### 2. 
Workflow Started\n\n```json\n{\n  \"type\": \"workflow_started\",\n  \"chat_id\": \"...\",\n  \"total_steps\": 3,\n  \"message\": \"Workflow execution started\"\n}\n```\n\n#### 3. Progress Update\n\n```json\n{\n  \"type\": \"progress\",\n  \"chat_id\": \"...\",\n  \"operation\": \"excel/extract-columns-to-file\",\n  \"progress\": 45,\n  \"status\": \"running\",\n  \"message\": \"Processing column 'customer_id'\"\n}\n```\n\n#### 4. Workflow Completed\n\n```json\n{\n  \"type\": \"workflow_completed\",\n  \"chat_id\": \"...\",\n  \"total_steps\": 3,\n  \"output_files_count\": 2,\n  \"message\": \"Workflow execution completed successfully\"\n}\n```\n\n#### 5. Workflow Failed\n\n```json\n{\n  \"type\": \"workflow_failed\",\n  \"chat_id\": \"...\",\n  \"error\": \"Operation failed: ...\",\n  \"message\": \"Workflow execution failed\"\n}\n```\n\n### Client Messages\n\n#### Ping/Pong (Keepalive)\n\n```json\n// Send\n{\"type\": \"ping\"}\n\n// Receive\n{\"type\": \"pong\"}\n```\n\n#### Change Subscription\n\n```json\n{ \"type\": \"subscribe\", \"chat_id\": \"new-chat-id\" }\n```\n\n---\n\nFor detailed WebSocket documentation, see [WEBSOCKET_USAGE.md](./docs/WEBSOCKET_USAGE.md)\n\n---\n\n## 🤖 Chat Bot API Reference\n\n### Overview\n\nThe Chat Bot feature provides a **Telegram-like conversational interface** for file processing. Users can interact with the bot using natural language to describe what they want to do with their files, and the bot will:\n\n1. **Interpret the intent** using NLP/pattern matching\n2. **Suggest appropriate workflows** based on the request\n3. **Request confirmation** before execution\n4. **Execute workflows** with real-time progress updates\n5. 
**Provide download links** for the results\n\n### Architecture\n\n```\nUser → Message → IntentClassifier → StateManager → MessageHandlers\n                                                           ↓\n                                            StreamingWorkflowExecutor\n                                                           ↓\n                                            WebSocket Progress Updates\n                                                           ↓\n                                            Download Links\n```\n\n### Supported Intents\n\nThe bot can understand and process the following types of requests:\n\n| Intent Type | Keywords | Operations |\n|-------------|----------|------------|\n| **Extract Columns** | extract, get, select, column | `excel/extract-columns`, `excel/extract-columns-to-file` |\n| **Convert Format** | convert, transform, export as | `csv/convert-to-excel`, `json/generate` |\n| **Normalize Data** | normalize, clean, standardize, trim, uppercase | `normalization/apply` |\n| **Generate SQL** | sql, insert, database, query | `sql/generate`, `sql/generate-to-text` |\n| **Generate JSON** | json, export json | `json/generate`, `json/generate-with-template` |\n| **Search/Filter** | search, filter, find, query, where | `excel/search`, `csv/search` |\n| **Bind Data** | bind, merge, join, combine | `excel/bind-single-key`, `excel/bind-multi-key` |\n| **Map Columns** | map, rename, remap | `excel/map-columns` |\n\n### Chat Bot API Endpoints\n\n#### 1. 
Start New Bot Conversation\n\nCreate a new chat bot conversation.\n\n```bash\nPOST /api/v1/chat/bot/conversations\nContent-Type: application/json\n```\n\n**Request Body (Optional)**:\n\n```json\n{\n  \"chat_id\": \"optional-custom-id\"\n}\n```\n\n**Response**:\n\n```json\n{\n  \"status_code\": 201,\n  \"message\": \"Bot conversation started successfully\",\n  \"data\": {\n    \"chat_id\": \"550e8400-e29b-41d4-a716-446655440000\",\n    \"participant_name\": \"BlueWhale-4821\",\n    \"status\": \"created\",\n    \"bot_message\": \"👋 Welcome to Pycelize Chat Bot!\\n\\nI'm here to help you process Excel and CSV files...\",\n    \"state\": \"idle\",\n    \"created_at\": \"2026-02-08T10:30:00.000000\"\n  }\n}\n```\n\n**cURL Example**:\n\n```bash\ncurl -X POST http://localhost:5050/api/v1/chat/bot/conversations \\\n  -H \"Content-Type: application/json\" \\\n  -d '{}'\n```\n\n#### 2. Send Message to Bot\n\nSend a text message to the bot describing what you want to do.\n\n```bash\nPOST /api/v1/chat/bot/conversations/{chat_id}/message\nContent-Type: application/json\n```\n\n**Request Body**:\n\n```json\n{\n  \"message\": \"extract columns: name, email, phone\"\n}\n```\n\n**Response**:\n\n```json\n{\n  \"status_code\": 200,\n  \"message\": \"Message processed successfully\",\n  \"data\": {\n    \"bot_response\": \"I can help you extract specific columns from your file. I suggest: Extract specific columns to a new file\\n\\n📋 **Suggested Workflow:**\\n1. excel/extract-columns-to-file: Extract specific columns to a new file\\n\\nWould you like me to proceed with this workflow? 
(yes/no)\\nOr you can ask me to modify specific parameters.\",\n    \"intent\": {\n      \"type\": \"extract_columns\",\n      \"confidence\": 0.9\n    },\n    \"suggested_workflow\": [\n      {\n        \"operation\": \"excel/extract-columns-to-file\",\n        \"arguments\": {\n          \"columns\": [\"name\", \"email\", \"phone\"],\n          \"remove_duplicates\": false\n        },\n        \"description\": \"Extract specific columns to a new file\"\n      }\n    ],\n    \"requires_confirmation\": true,\n    \"requires_file\": true\n  }\n}\n```\n\n**cURL Examples**:\n\n```bash\n# Example 1: Extract columns\ncurl -X POST http://localhost:5050/api/v1/chat/bot/conversations/{chat_id}/message \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"message\": \"extract columns: name, email, phone\"}'\n\n# Example 2: Convert to JSON\ncurl -X POST http://localhost:5050/api/v1/chat/bot/conversations/{chat_id}/message \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"message\": \"convert to JSON\"}'\n\n# Example 3: Generate SQL\ncurl -X POST http://localhost:5050/api/v1/chat/bot/conversations/{chat_id}/message \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"message\": \"generate SQL insert statements for table users\"}'\n\n# Example 4: Normalize data\ncurl -X POST http://localhost:5050/api/v1/chat/bot/conversations/{chat_id}/message \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"message\": \"normalize data - uppercase and trim\"}'\n\n# Example 5: Help command\ncurl -X POST http://localhost:5050/api/v1/chat/bot/conversations/{chat_id}/message \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"message\": \"help\"}'\n```\n\n#### 3. 
Upload File to Bot\n\nUpload a file to the bot conversation.\n\n```bash\nPOST /api/v1/chat/bot/conversations/{chat_id}/upload\nContent-Type: multipart/form-data\n```\n\n**Request**:\n\n```\nfile: @path/to/file.xlsx\n```\n\n**Response**:\n\n```json\n{\n  \"status_code\": 200,\n  \"message\": \"File uploaded successfully\",\n  \"data\": {\n    \"file_path\": \"./automation/workflows/2026/02/{chat_id}/uploads/data.xlsx\",\n    \"filename\": \"data.xlsx\",\n    \"download_url\": \"http://localhost:5050/api/v1/chat/workflows/{chat_id}/files/data.xlsx\",\n    \"bot_response\": \"✅ File 'data.xlsx' uploaded successfully!\\n\\n📋 **Suggested Workflow:**\\n1. excel/extract-columns-to-file: Extract specific columns to a new file\\n\\nWould you like me to proceed with this workflow? (yes/no)\",\n    \"suggested_workflow\": [\n      {\n        \"operation\": \"excel/extract-columns-to-file\",\n        \"arguments\": {\n          \"columns\": [\"name\", \"email\", \"phone\"],\n          \"remove_duplicates\": false\n        }\n      }\n    ],\n    \"requires_confirmation\": true\n  }\n}\n```\n\n**cURL Example**:\n\n```bash\ncurl -X POST http://localhost:5050/api/v1/chat/bot/conversations/{chat_id}/upload \\\n  -F \"file=@data.xlsx\"\n```\n\n#### 4. Confirm/Decline Workflow\n\nConfirm or decline the bot's suggested workflow.\n\n```bash\nPOST /api/v1/chat/bot/conversations/{chat_id}/confirm\nContent-Type: application/json\n```\n\n**Request Body**:\n\n```json\n{\n  \"confirmed\": true,\n  \"modified_workflow\": [\n    {\n      \"operation\": \"excel/extract-columns-to-file\",\n      \"arguments\": {\n        \"columns\": [\"name\", \"email\"],\n        \"remove_duplicates\": true\n      }\n    }\n  ]\n}\n```\n\n**Response (Success)**:\n\n```json\n{\n  \"status_code\": 200,\n  \"message\": \"Workflow confirmation processed\",\n  \"data\": {\n    \"bot_response\": \"✅ Workflow completed successfully! 
Your files are ready for download.\",\n    \"output_files\": [\n      {\n        \"file_path\": \"./automation/workflows/2026/02/{chat_id}/outputs/result.xlsx\",\n        \"download_url\": \"http://localhost:5050/api/v1/chat/workflows/{chat_id}/files/result.xlsx\"\n      }\n    ],\n    \"results\": [\n      {\n        \"output_file_path\": \"...\",\n        \"download_url\": \"...\"\n      }\n    ]\n  }\n}\n```\n\n**cURL Examples**:\n\n```bash\n# Confirm workflow\ncurl -X POST http://localhost:5050/api/v1/chat/bot/conversations/{chat_id}/confirm \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"confirmed\": true}'\n\n# Decline workflow\ncurl -X POST http://localhost:5050/api/v1/chat/bot/conversations/{chat_id}/confirm \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"confirmed\": false}'\n\n# Confirm with modifications\ncurl -X POST http://localhost:5050/api/v1/chat/bot/conversations/{chat_id}/confirm \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"confirmed\": true,\n    \"modified_workflow\": [\n      {\n        \"operation\": \"excel/extract-columns-to-file\",\n        \"arguments\": {\n          \"columns\": [\"name\", \"email\"],\n          \"remove_duplicates\": true\n        }\n      }\n    ]\n  }'\n```\n\n#### 5. 
Get Conversation History\n\nRetrieve the full conversation history with the bot.\n\n```bash\nGET /api/v1/chat/bot/conversations/{chat_id}/history?limit=50\n```\n\n**Query Parameters**:\n\n- `limit`: Optional maximum number of messages to return\n\n**Response**:\n\n```json\n{\n  \"status_code\": 200,\n  \"message\": \"Conversation history retrieved successfully\",\n  \"data\": {\n    \"chat_id\": \"550e8400-e29b-41d4-a716-446655440000\",\n    \"participant_name\": \"BlueWhale-4821\",\n    \"status\": \"completed\",\n    \"current_state\": \"idle\",\n    \"messages\": [\n      {\n        \"message_id\": \"msg-1\",\n        \"message_type\": \"system\",\n        \"content\": \"👋 Welcome to Pycelize Chat Bot!...\",\n        \"created_at\": \"2026-02-08T10:30:00.000000\"\n      },\n      {\n        \"message_id\": \"msg-2\",\n        \"message_type\": \"user\",\n        \"content\": \"extract columns: name, email\",\n        \"created_at\": \"2026-02-08T10:31:00.000000\"\n      },\n      {\n        \"message_id\": \"msg-3\",\n        \"message_type\": \"system\",\n        \"content\": \"I can help you extract specific columns...\",\n        \"created_at\": \"2026-02-08T10:31:01.000000\"\n      }\n    ],\n    \"uploaded_files\": [\n      {\n        \"file_path\": \"...\",\n        \"download_url\": \"http://localhost:5050/api/v1/chat/workflows/{chat_id}/files/data.xlsx\"\n      }\n    ],\n    \"output_files\": [\n      {\n        \"file_path\": \"...\",\n        \"download_url\": \"http://localhost:5050/api/v1/chat/workflows/{chat_id}/files/result.xlsx\"\n      }\n    ],\n    \"workflow_steps\": [...]\n  }\n}\n```\n\n**cURL Example**:\n\n```bash\ncurl -X GET \"http://localhost:5050/api/v1/chat/bot/conversations/{chat_id}/history?limit=50\"\n```\n\n#### 6. 
Delete Bot Conversation\n\nDelete a bot conversation and all associated data.\n\n```bash\nDELETE /api/v1/chat/bot/conversations/{chat_id}\n```\n\n**Response**:\n\n```json\n{\n  \"status_code\": 200,\n  \"message\": \"Bot conversation deleted successfully\",\n  \"data\": {\n    \"chat_id\": \"550e8400-e29b-41d4-a716-446655440000\"\n  }\n}\n```\n\n**cURL Example**:\n\n```bash\ncurl -X DELETE http://localhost:5050/api/v1/chat/bot/conversations/{chat_id}\n```\n\n#### 7. Get Supported Operations\n\nGet a list of all supported operations and intents.\n\n```bash\nGET /api/v1/chat/bot/operations\n```\n\n**Response**:\n\n```json\n{\n  \"status_code\": 200,\n  \"message\": \"Supported operations retrieved successfully\",\n  \"data\": {\n    \"operations\": {\n      \"extract_columns\": [\"excel/extract-columns\", \"excel/extract-columns-to-file\"],\n      \"convert_format\": [\"csv/convert-to-excel\", \"json/generate\"],\n      \"normalize_data\": [\"normalization/apply\"],\n      \"generate_sql\": [\"sql/generate\", \"sql/generate-to-text\"],\n      \"generate_json\": [\"json/generate\", \"json/generate-with-template\"],\n      \"search_filter\": [\"excel/search\", \"csv/search\"],\n      \"bind_data\": [\"excel/bind-single-key\", \"excel/bind-multi-key\"],\n      \"map_columns\": [\"excel/map-columns\"]\n    },\n    \"total_intents\": 8\n  }\n}\n```\n\n**cURL Example**:\n\n```bash\ncurl -X GET http://localhost:5050/api/v1/chat/bot/operations\n```\n\n### Bot Conversation States\n\nThe bot manages conversation state to provide context-aware responses:\n\n```\nidle → awaiting_file → awaiting_confirmation → processing → completed → idle\n  ↓          ↓                    ↓                ↓\ncancelled ← cancelled ← cancelled ← failed → idle\n```\n\n| State | Description |\n|-------|-------------|\n| **idle** | Ready for new request |\n| **awaiting_file** | Waiting for file upload |\n| **awaiting_confirmation** | Waiting for user to confirm/decline workflow |\n| 
**awaiting_parameters** | Waiting for additional parameters |\n| **processing** | Workflow execution in progress |\n| **completed** | Workflow completed successfully |\n| **failed** | Workflow failed |\n| **cancelled** | User cancelled the operation |\n\n### WebSocket Messages for Bot\n\nThe bot uses the same WebSocket infrastructure as Chat Workflows. Connect to:\n\n```\nws://127.0.0.1:5051/chat/{chat_id}\n```\n\n**Bot-Specific Messages**:\n\n```json\n// Workflow started\n{\n  \"type\": \"workflow_started\",\n  \"chat_id\": \"...\",\n  \"total_steps\": 3,\n  \"message\": \"🚀 Starting workflow execution...\"\n}\n\n// Progress update\n{\n  \"type\": \"progress\",\n  \"chat_id\": \"...\",\n  \"step_id\": \"...\",\n  \"operation\": \"excel/extract-columns-to-file\",\n  \"progress\": 50,\n  \"status\": \"running\",\n  \"message\": \"Processing...\"\n}\n\n// Workflow completed\n{\n  \"type\": \"workflow_completed\",\n  \"chat_id\": \"...\",\n  \"total_steps\": 3,\n  \"output_files_count\": 1,\n  \"message\": \"✅ Workflow completed successfully!\"\n}\n\n// Workflow failed\n{\n  \"type\": \"workflow_failed\",\n  \"chat_id\": \"...\",\n  \"error\": \"...\",\n  \"message\": \"❌ Workflow failed: ...\"\n}\n```\n\n### Complete Chat Bot Workflow Example\n\n```bash\n# 1. Start conversation\nCHAT_ID=$(curl -s -X POST http://localhost:5050/api/v1/chat/bot/conversations | jq -r '.data.chat_id')\n\necho \"Chat ID: $CHAT_ID\"\n\n# 2. Send message\ncurl -X POST http://localhost:5050/api/v1/chat/bot/conversations/$CHAT_ID/message \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"message\": \"extract columns: name, email, phone\"}'\n\n# 3. Upload file\ncurl -X POST http://localhost:5050/api/v1/chat/bot/conversations/$CHAT_ID/upload \\\n  -F \"file=@data.xlsx\"\n\n# 4. Confirm workflow\ncurl -X POST http://localhost:5050/api/v1/chat/bot/conversations/$CHAT_ID/confirm \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"confirmed\": true}'\n\n# 5. 
Get conversation history\ncurl -X GET http://localhost:5050/api/v1/chat/bot/conversations/$CHAT_ID/history\n\n# 6. Delete conversation (cleanup)\ncurl -X DELETE http://localhost:5050/api/v1/chat/bot/conversations/$CHAT_ID\n```\n\n### Special Bot Commands\n\nThe bot recognizes several special commands:\n\n| Command | Description |\n|---------|-------------|\n| **help** or **?** | Display help information |\n| **cancel**, **stop**, **quit** | Cancel current operation |\n| **yes**, **y**, **ok**, **proceed** | Confirm workflow |\n| **no**, **n** | Decline workflow |\n\n### Message Examples\n\nHere are various ways to interact with the bot:\n\n```bash\n# Extract columns\n\"extract columns: name, email, phone\"\n\"get the name and email columns\"\n\"select columns: customer_id, amount\"\n\n# Convert formats\n\"convert to JSON\"\n\"export as CSV\"\n\"transform to Excel\"\n\n# Normalize data\n\"normalize data - uppercase and trim\"\n\"clean the data\"\n\"standardize phone numbers\"\n\n# Generate SQL\n\"generate SQL for table users\"\n\"create insert statements\"\n\"generate SQL with auto-increment\"\n\n# Search/Filter\n\"search for records where status = active\"\n\"filter data where amount \u003e 1000\"\n\"find customers in New York\"\n\n# Bind/Merge\n\"bind data from another file\"\n\"merge with customer_info.xlsx\"\n\n# Map columns\n\"rename columns\"\n\"map column names\"\n```\n\n### Error Handling\n\nThe bot provides clear error messages and recovery options:\n\n```json\n{\n  \"success\": false,\n  \"error\": \"No uploaded file found\",\n  \"bot_response\": \"📎 Please upload your file first before I can process it.\"\n}\n```\n\nCommon errors:\n\n- **No file uploaded**: Bot will request file upload\n- **Unknown intent**: Bot will ask for clarification or show help\n- **Workflow execution failed**: Bot will explain the error and suggest retry\n- **Invalid parameters**: Bot will request correct parameters\n\n---\n\n## 💻 Frontend Integration Guide\n\n### Quick 
Start\n\n#### 1. Create Conversation\n\n```javascript\nconst response = await fetch(\"http://localhost:5050/api/v1/chat/workflows\", {\n  method: \"POST\",\n});\nconst { data } = await response.json();\nconst chatId = data.chat_id;\n```\n\n#### 2. Connect to WebSocket\n\n```javascript\nconst ws = new WebSocket(`ws://127.0.0.1:5051/chat/${chatId}`);\n\nws.onmessage = (event) =\u003e {\n  const message = JSON.parse(event.data);\n\n  switch (message.type) {\n    case \"workflow_started\":\n      console.log(\"Workflow started\");\n      break;\n    case \"progress\":\n      updateProgressBar(message.progress);\n      break;\n    case \"workflow_completed\":\n      console.log(\"Workflow completed\");\n      break;\n    case \"workflow_failed\":\n      console.error(\"Workflow failed:\", message.error);\n      break;\n  }\n};\n```\n\n#### 3. Upload File\n\n```javascript\nconst formData = new FormData();\nformData.append(\"file\", file);\n\nconst response = await fetch(\n  `http://localhost:5050/api/v1/chat/workflows/${chatId}/upload`,\n  {\n    method: \"POST\",\n    body: formData,\n  },\n);\nconst { data } = await response.json();\n// Use data.download_url for downloads\n```\n\n#### 4. Execute Workflow\n\n```javascript\nconst response = await fetch(\n  `http://localhost:5050/api/v1/chat/workflows/${chatId}/execute`,\n  {\n    method: \"POST\",\n    headers: { \"Content-Type\": \"application/json\" },\n    body: JSON.stringify({\n      steps: [\n        {\n          operation: \"excel/extract-columns-to-file\",\n          arguments: {\n            columns: [\"customer_id\"],\n            remove_duplicates: true,\n          },\n        },\n      ],\n    }),\n  },\n);\n```\n\n#### 5. 
Download Results\n\n```javascript\n// Download URLs are absolute and ready to use\nconst downloadUrl = data.results[0].download_url;\nwindow.open(downloadUrl);\n\n// Or, inside a JSX component, render it as an anchor tag:\n// \u003ca href={downloadUrl} download\u003eDownload\u003c/a\u003e\n```\n\n### React Component Example\n\n```jsx\nimport React, { useState, useEffect } from \"react\";\n\nfunction ChatWorkflow({ chatId }) {\n  const [progress, setProgress] = useState(0);\n  const [status, setStatus] = useState(\"idle\");\n  const [message, setMessage] = useState(\"\");\n\n  useEffect(() =\u003e {\n    const ws = new WebSocket(`ws://127.0.0.1:5051/chat/${chatId}`);\n\n    ws.onmessage = (event) =\u003e {\n      const data = JSON.parse(event.data);\n\n      switch (data.type) {\n        case \"workflow_started\":\n          setStatus(\"running\");\n          setMessage(\"Workflow started\");\n          break;\n        case \"progress\":\n          setProgress(data.progress);\n          setMessage(data.message);\n          break;\n        case \"workflow_completed\":\n          setStatus(\"completed\");\n          setProgress(100);\n          setMessage(\"Completed successfully\");\n          break;\n        case \"workflow_failed\":\n          setStatus(\"failed\");\n          setMessage(data.error);\n          break;\n      }\n    };\n\n    return () =\u003e ws.close();\n  }, [chatId]);\n\n  return (\n    \u003cdiv\u003e\n      \u003cdiv\u003eStatus: {status}\u003c/div\u003e\n      \u003cdiv\u003eProgress: {progress}%\u003c/div\u003e\n      \u003cdiv\u003eMessage: {message}\u003c/div\u003e\n      \u003cprogress value={progress} max=\"100\" /\u003e\n    \u003c/div\u003e\n  );\n}\n```\n\n### Best Practices\n\n1. **Error Handling**: Always handle WebSocket disconnections\n2. **Reconnection**: Implement exponential backoff for reconnects\n3. **Progress Updates**: Update UI smoothly with progress percentage\n4. **Download URLs**: Use absolute URLs directly, no construction needed\n5. 
**Message Validation**: Validate message structure before use\n\n---\n\n## 🎨 Design Patterns\n\n### 1. Builder Pattern (ResponseBuilder)\n\nUsed for constructing standardized API responses:\n\n```python\nresponse = (\n    ResponseBuilder()\n    .with_data({'id': 1, 'name': 'John'})\n    .with_message('User retrieved successfully')\n    .with_status_code(200)\n    .build()\n)\n```\n\n### 2. Factory Pattern (NormalizerFactory)\n\nUsed for creating normalization strategy instances:\n\n```python\nstrategy = NormalizerFactory.create(NormalizationType.UPPERCASE)\nnormalized_series = strategy.normalize(data_series)\n```\n\n### 3. Strategy Pattern (Normalization Strategies)\n\nUsed for implementing interchangeable normalization algorithms:\n\n```python\nclass UppercaseStrategy(NormalizationStrategy):\n    def normalize(self, series: pd.Series) -\u003e pd.Series:\n        return series.astype(str).str.upper()\n```\n\n## 🧪 Testing\n\nRun the test suite:\n\n```bash\nmake test\n```\n\nRun tests with coverage:\n\n```bash\nmake test-cov\n```\n\n## 📝 Available Normalization Types\n\n| Type                   | Description                        |\n| ---------------------- | ---------------------------------- |\n| `uppercase`            | Convert to uppercase               |\n| `lowercase`            | Convert to lowercase               |\n| `title_case`           | Convert to title case              |\n| `trim_whitespace`      | Remove leading/trailing whitespace |\n| `remove_special_chars` | Remove special characters          |\n| `phone_format`         | Format phone numbers               |\n| `email_format`         | Format email addresses             |\n| `name_format`          | Format personal names              |\n| `min_max_scale`        | Scale to 0-1 range                 |\n| `z_score`              | Standardize using z-score          |\n| `round_decimal`        | Round to decimals                  |\n| `integer_convert`      | Convert to integer                 
|\n| `currency_format`      | Parse currency values              |\n| `date_format`          | Format dates                       |\n| `datetime_format`      | Format datetime                    |\n| `boolean_convert`      | Convert to boolean                 |\n| `yes_no_convert`       | Convert to Yes/No                  |\n| `regex_replace`        | Replace using regex                |\n| `fill_null_values`     | Fill null values                   |\n| `outlier_removal`      | Remove outliers                    |\n\n---\n\n## 🤝 Contributing\n\n1. Fork the repository\n2. Create a feature branch (`git checkout -b feature/amazing-feature`)\n3. Commit your changes (`git commit -m 'Add amazing feature'`)\n4. Push to the branch (`git push origin feature/amazing-feature`)\n5. Open a Pull Request\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n---\n\n## Made with ❤️ using Flask and Python\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpnguyen215%2Fpycelize","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpnguyen215%2Fpycelize","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpnguyen215%2Fpycelize/lists"}