https://github.com/nicklitwinow/xlsx-assembler-public
XLSX Assembler – ETL Tool for automating the extraction, transformation, and loading of data from multiple Excel files. Built with: Python, Airflow, Cron, Redis, Pandas, Openpyxl, PyQT5, Docker.
airflow cron desktop-application docker etl openpyxl pandas pyqt pyqt5 python redis xlsx xlsx-files
- Host: GitHub
- URL: https://github.com/nicklitwinow/xlsx-assembler-public
- Owner: NickLitwinow
- Created: 2024-07-31T15:15:23.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-02T07:56:59.000Z (8 months ago)
- Last Synced: 2025-04-02T14:53:58.070Z (6 months ago)
- Topics: airflow, cron, desktop-application, docker, etl, openpyxl, pandas, pyqt, pyqt5, python, redis, xlsx, xlsx-files
- Language: Python
- Homepage:
- Size: 2.12 MB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# XLSX Assembler – ETL Tool for Merging Excel Data
## Architecture
![]()

## Built With
This project was built using these technologies.
- Python
- Airflow
- Cron
- Redis
- Pandas
- Openpyxl
- PyQT5
- Docker

## Features
**🚀 Efficient ETL Process**
Automates the extraction, transformation, and loading (ETL) of data from multiple Excel files using Airflow.\
(Only Excel files with a specific structure are supported.)

**📊 Advanced Data Processing**
Leverages the power of Pandas and Openpyxl for fast and accurate data reading, processing, and styling.

**💻 Intuitive GUI with PyQt5**
Includes a user-friendly graphical interface for selecting files and tracking real-time progress.

**⚡ Performance Optimization**
Uses Redis to reduce system load and speed up data processing, so large datasets are handled efficiently.

## Getting Started
Prerequisites:
- `Python` and `Docker` installed on your machine

## 🛠 Installation and Setup Instructions
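Before starting, it can help to confirm that the required tools are on your `PATH`. The snippet below is a minimal sketch and not part of the project: the helper name `missing_tools` and the tool list are assumptions (`docker-compose` is included because the steps in this section call it, and `python` may be named `python3` on some systems).

```python
import shutil

# Assumed tool names; adjust to match your system (for example "python3").
REQUIRED_TOOLS = ("python", "docker", "docker-compose")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from `missing_tools()` means every listed tool was found.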
1. Clone the repository:
   `git clone https://github.com/NickLitwinow/XLSXAssembler_Public.git`
2. Navigate into the `src` directory: `cd src/`
4. (Terminal 1) Run the ETL client:
   `python app.py`
5. (Terminal 2) Build the Docker image (`sudo` may be required):
   `docker build . --tag extending_airflow:latest`
6. (Terminal 2) Run `docker-compose up -d` to start the Docker services.
8. (Terminal 2) (Optional) Run `docker-compose down -v` to stop the Docker services.

The PyQt5 GUI will launch, where you can select multiple Excel files and begin the ETL process.
*Runs the app in development mode.*

## Usage Instructions Example
1. In the ETL client, click the `Add File` button and select files from `example files` (you can add files again later if you wish).
2. (Optional) To remove a file from the selection, click its path in the black selection window, then click `Remove File`.
3. Click `Merge Files` to name the output file and choose its destination. The ETL process starts afterwards.
4. To view the Airflow DAG process:
   - Open `http://localhost:8080/home` in your browser.
   - Enter Login: `airflow` and Password: `airflow`.
   - (Info) If you have just run `docker-compose up -d`, Airflow may take some time to load.
6. To view the Redis database:
   - Open `http://localhost:8001/` in your browser.
   - Accept "EULA and Privacy Settings".
   - Click `I already have a database`.
   - Click `Connect to a Redis Database` and enter Host: `redis`, Port: `6379`, Name: `redis-local`.
   - Click `ADD REDIS DATABASE`.
   - Select the `redis-local` database.
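
Step 4 notes that Airflow may take some time to load after `docker-compose up -d`. Instead of refreshing the browser, you could poll the webserver until it reports healthy. The sketch below is not part of the project. It assumes the container runs Airflow 2.x, whose webserver exposes a `/health` endpoint returning JSON with `metadatabase` and `scheduler` statuses, and that port `8080` is published on `localhost` as in step 4. The function names are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: Airflow 2.x health endpoint on the port used in step 4.
AIRFLOW_HEALTH_URL = "http://localhost:8080/health"


def airflow_is_healthy(payload: dict) -> bool:
    """True when both the metadata database and the scheduler report healthy."""
    return all(
        payload.get(part, {}).get("status") == "healthy"
        for part in ("metadatabase", "scheduler")
    )


def fetch_airflow_health(url: str = AIRFLOW_HEALTH_URL) -> dict:
    """Return the parsed health JSON, or an empty dict if the server is not up yet."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return json.load(response)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return {}


def wait_until(check, timeout: float = 300.0, interval: float = 5.0, sleep=time.sleep) -> bool:
    """Call check() every `interval` seconds until it returns True or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Once the Docker services are starting, a call such as `wait_until(lambda: airflow_is_healthy(fetch_airflow_health()))` blocks until Airflow reports healthy or five minutes pass, and returns `True` or `False` accordingly.
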
### Show your support
Give a ⭐ if you like this project!