https://github.com/solrikk/magicxml
MagicXML is a high-performance web application built with FastAPI that converts data between XML, CSV, Excel, JSON, PDF, and image formats. Designed for data analysts, developers, and e-commerce professionals, MagicXML handles complex structures with advanced parsing capabilities, asyncio-powered processing, and intelligent data classification.
- Host: GitHub
- URL: https://github.com/solrikk/magicxml
- Owner: Solrikk
- License: apache-2.0
- Created: 2024-04-16T07:43:33.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-08-13T13:47:54.000Z (9 months ago)
- Last Synced: 2025-08-13T15:43:04.993Z (9 months ago)
- Topics: async, convert, converter, converter-app, csv, csv-export, data-extraction, data-processing, data-transformation, excel, fastapi-template, multilingual, open-source, python, web-application, xml, xml-parser, xml-processing
- Language: Python
- Homepage: https://magic-xml.replit.app/
- Size: 6.23 MB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
MagicXML
Advanced XML to CSV Conversion Tool
-----------------
## Overview
**MagicXML** is a high-performance web application built with FastAPI that converts data between XML, CSV, Excel, JSON, PDF, and image formats. Designed for data analysts, developers, and e-commerce professionals, MagicXML handles complex structures with advanced parsing capabilities, asyncio-powered processing, and intelligent data classification.
### Supported Conversions
- Convert CSV to XML
- Convert CSV to Excel
- Convert Excel to CSV
- Convert JSON to CSV
- Convert CSV to JSON
- Convert XML to JSON
- Convert between JPEG and PNG images
- Convert PDF to CSV
- Convert PDF to Excel
- Convert PDF to JSON
- Convert CSV to PDF
- Convert Excel to PDF
**Live Demo**: [https://magic-xml.replit.app](https://magic-xml.replit.app)
## Key Features
- **High-Performance Processing**: Asynchronous architecture for efficient handling of large XML files
- **Intelligent Data Extraction**: Contextual parsing of complex nested XML structures
- **Data Cleaning & Sanitization**: Automatic cleaning of HTML tags and special characters
- **Multilingual Support**: Interface available in English, Russian, and other languages
- **RESTful API**: Programmatic access for seamless integration with your systems
- **Callback Support**: Optional webhook notifications when processing is complete
- **Robust Error Handling**: Comprehensive error management with detailed reporting
- **Versatile Format Conversions**: Convert between CSV, XML, Excel, JSON, PDF, and JPEG/PNG images
## Technical Architecture
MagicXML is built on the following components:
- **FastAPI Backend**: High-performance asynchronous API framework
- **Asyncio & Aiohttp**: Non-blocking I/O operations for concurrent processing
- **XML ElementTree**: Efficient XML parsing and traversal
- **BeautifulSoup**: Intelligent HTML content cleaning
- **Modern Frontend**: Responsive design with custom CSS and JavaScript
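To illustrate how ElementTree parsing can feed a CSV writer, here is a minimal sketch using only the Python standard library. It is not MagicXML's actual implementation: the sample feed, the element names (`catalog`, `offer`, `name`, `price`), and the function `offers_to_csv` are assumptions chosen for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample feed; element and attribute names are illustrative only.
SAMPLE_XML = """<catalog>
  <offer id="1"><name>Mug</name><price>4.50</price></offer>
  <offer id="2"><name>Plate</name><price>7.00</price></offer>
</catalog>"""


def offers_to_csv(xml_text, fields=("name", "price")):
    """Flatten each <offer> element into one CSV row (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row: the id attribute followed by the requested child elements.
    writer.writerow(("id",) + tuple(fields))
    for offer in root.iter("offer"):
        row = [offer.get("id", "")]
        # findtext returns the default when a child element is missing.
        row += [offer.findtext(field, default="") for field in fields]
        writer.writerow(row)
    return buffer.getvalue()


print(offers_to_csv(SAMPLE_XML))
```

Missing child elements become empty cells rather than raising an error, which keeps the rows aligned when a feed is inconsistent.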
## Use Cases
- **E-commerce Data Processing**: Convert product feeds from XML to CSV
- **Data Analysis**: Transform XML datasets into analysis-ready CSV format
- **System Integration**: Bridge XML-based systems with CSV-compatible tools
- **Catalog Management**: Process large product catalogs efficiently
- **Automated Workflows**: Integrate with data pipelines via API
## Installation & Setup
### Prerequisites
- Python 3.8+
- Git
### Quick Start
```bash
# Clone the repository
git clone https://github.com/Solrikk/MagicXML.git
cd MagicXML
# Install dependencies
poetry install
# Run the application
poetry run uvicorn main:app --host 0.0.0.0 --port 8080 --reload
```
Alternatively, install dependencies with `pip`:
```bash
pip install -r requirements.txt
```
## API Reference
### Convert XML to CSV
```bash
curl -X 'POST' \
  'https://magic-xml.replit.app/process_link' \
  -H 'Content-Type: application/json' \
  -d '{
    "link_url": "https://example.com/data.xml",
    "preset_id": "optional-tracking-id",
    "return_url": "https://your-callback-url.com/webhook"
  }'
```
#### Response
```json
{
  "file_url": "https://magic-xml.replit.app/download/data_files/example_com.csv",
  "preset_id": "optional-tracking-id",
  "status": "completed"
}
```
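The same request can be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_process_link_request` and `submit` are hypothetical, and treating `preset_id` and `return_url` as optional fields is an assumption based on the example values above.

```python
import json
import urllib.request

BASE_URL = "https://magic-xml.replit.app"


def build_process_link_request(link_url, preset_id=None, return_url=None):
    """Build (but do not send) a POST request for /process_link."""
    payload = {"link_url": link_url}
    # Assumption: both fields may be omitted when not needed.
    if preset_id is not None:
        payload["preset_id"] = preset_id
    if return_url is not None:
        payload["return_url"] = return_url
    return urllib.request.Request(
        f"{BASE_URL}/process_link",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_process_link_request(
    "https://example.com/data.xml", preset_id="optional-tracking-id"
)
# result = submit(request)  # performs the network call
```

Building the request separately from sending it makes the payload easy to inspect or log before any network traffic occurs.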
### Check Processing Status
```bash
curl -X 'GET' 'https://magic-xml.replit.app/status/{preset_id}'
```
### Download Generated CSV
```bash
curl -X 'GET' 'https://magic-xml.replit.app/download/data_files/{filename}'
```
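A client that does not use the callback can poll the status endpoint until the job finishes. The sketch below assumes the status response is JSON with a `status` field that eventually equals `"completed"`, mirroring the conversion response shown earlier; the status response format is not documented here, so treat that as an assumption. The names `fetch_status` and `wait_until_completed` are hypothetical.

```python
import json
import time
import urllib.request

BASE_URL = "https://magic-xml.replit.app"


def fetch_status(preset_id, timeout=30):
    """GET /status/{preset_id} and decode the body (assumed to be JSON)."""
    url = f"{BASE_URL}/status/{preset_id}"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_completed(preset_id, fetch=fetch_status, interval=2.0, max_attempts=30):
    """Poll until the assumed "status" field reads "completed"."""
    for attempt in range(max_attempts):
        body = fetch(preset_id)
        if body.get("status") == "completed":
            return body
        # Sleep between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            time.sleep(interval)
    raise TimeoutError(f"{preset_id} not completed after {max_attempts} attempts")
```

Passing the fetch function as a parameter lets the polling logic be exercised without network access, for example with a stub that returns canned responses.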
## Implementation Details
### Asynchronous Processing
MagicXML processes XML files asynchronously using Python's `asyncio` together with the `aiohttp` library:
```python
async def process_offers_chunk(offers_chunk, build_category_path, format_type):
    offers = []
    # Offers within a chunk are awaited one after another.
    for offer_elem in offers_chunk:
        offer_data = await process_offer(offer_elem, build_category_path, format_type)
        offers.append(offer_data)
    return {"offers": offers}
```
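Within a chunk, offers are awaited one at a time. Because each chunk is its own coroutine, several chunks can be scheduled concurrently on the event loop, which can shorten conversion time for large XML files when the per-offer work involves I/O waits.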
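One way to run several chunks concurrently is `asyncio.gather`. The sketch below is an illustration under stated assumptions, not the project's actual scheduling code: `process_offer` is a stand-in that only yields to the event loop, and `split_into_chunks` and `process_all` are hypothetical helpers.

```python
import asyncio


async def process_offer(offer_elem, build_category_path, format_type):
    # Stand-in for the real per-offer work; yields control to the event loop.
    await asyncio.sleep(0)
    return {"id": offer_elem, "format": format_type}


async def process_offers_chunk(offers_chunk, build_category_path, format_type):
    offers = []
    for offer_elem in offers_chunk:
        offers.append(await process_offer(offer_elem, build_category_path, format_type))
    return {"offers": offers}


def split_into_chunks(items, chunk_size):
    """Split a list into consecutive slices of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


async def process_all(offer_elems, chunk_size=2, format_type="csv"):
    chunks = split_into_chunks(offer_elems, chunk_size)
    # gather runs the chunk coroutines concurrently and preserves input order.
    results = await asyncio.gather(
        *(process_offers_chunk(chunk, None, format_type) for chunk in chunks)
    )
    return [offer for result in results for offer in result["offers"]]


offers = asyncio.run(process_all(["a", "b", "c", "d", "e"]))
print([offer["id"] for offer in offers])
```

Concurrency of this kind helps when the per-offer work waits on I/O. Purely CPU-bound parsing still runs on a single thread and gains little from it.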
### Text Processing & Data Cleaning
Descriptions are cleaned with BeautifulSoup. Every tag except `<p>` and `<br>` is unwrapped, which removes the tag itself but keeps its text:
```python
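from bs4 import BeautifulSoup  # the 'html5lib' parser also needs the html5lib package


def clean_description(description):
    # Empty or missing descriptions become an empty string.
    if not description:
        return ''
    soup = BeautifulSoup(description, 'html5lib')
    allowed_tags = ['p', 'br']
    # Unwrap every disallowed tag: the tag is removed, its children are kept.
    for tag in soup.find_all(True):
        if tag.name not in allowed_tags:
            tag.unwrap()
    # Additional cleaning logic...
    return str(soup)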
```
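To see the allow-list idea in isolation, here is a dependency-free sketch that swaps BeautifulSoup for the standard library's `html.parser`. It is a different technique from the function above and is not part of MagicXML. The names `AllowListCleaner` and `clean_description_stdlib` are hypothetical, and dropping all attributes from kept tags is a choice made for this sketch.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "br"}


class AllowListCleaner(HTMLParser):
    """Re-emit allowed tags without attributes; keep text of all other tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        # <br> is a void element, so it never gets a closing tag.
        if tag in ALLOWED_TAGS and tag != "br":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape them.
        self.parts.append(escape(data, quote=False))


def clean_description_stdlib(description):
    if not description:
        return ""
    cleaner = AllowListCleaner()
    cleaner.feed(description)
    cleaner.close()  # flush any buffered trailing text
    return "".join(cleaner.parts)


print(clean_description_stdlib("<p>Soft <b>cotton</b> shirt<br>Size &amp; fit</p>"))
```

Unlike the `html5lib` parser, this approach does not repair malformed markup, so it suits only reasonably well-formed input.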