https://github.com/pcpp94/raw_etl_pipeline
A streamlined ETL solution for ingesting and processing legacy data formats with minimal resources. Includes daily and weekly .bat scripts, run by Windows Task Scheduler, that automate extraction, cleaning, and normalization and turn complex files into structured data.
etl legacy proof-of-concept
Last synced: 12 months ago
- Host: GitHub
- URL: https://github.com/pcpp94/raw_etl_pipeline
- Owner: pcpp94
- License: other
- Created: 2024-11-11T14:31:35.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-11T14:48:24.000Z (over 1 year ago)
- Last Synced: 2025-01-26T19:11:14.281Z (over 1 year ago)
- Topics: etl, legacy, proof-of-concept
- Language: Python
- Homepage:
- Size: 32.2 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: copying_outputs.bat
README
**Automated ETL Pipeline for Legacy Data Formats on Windows VM**
A robust ETL repository designed to handle legacy data formats and automate data ingestion with minimal resources. It combines extraction and cleaning tools for complex formats, including Excel parsing, secure web scraping, and custom data normalization. The included .bat scripts run daily and weekly via Task Scheduler on a Windows VM, processing legacy documents and ingesting them into structured, analysis-ready formats with little manual effort.
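To make the "custom data normalization" step more concrete, here is a minimal sketch of what such a cleaning pass might look like. It is not taken from the repository: the function names, the `report_date` column, the sample rows, and the list of date layouts are all hypothetical, chosen only to illustrate header cleanup, blank-row removal, and date standardization on rows already extracted from a legacy sheet.

```python
import re
from datetime import datetime

# Hypothetical date layouts; the repository's actual input formats are not documented here.
LEGACY_DATE_FORMATS = ("%d/%m/%Y", "%d-%b-%y", "%Y%m%d")


def normalize_header(name):
    """Turn a free-form column header into snake_case."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def normalize_date(value):
    """Return an ISO 8601 date string, or None if no known layout matches."""
    for fmt in LEGACY_DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_rows(rows, date_columns=("report_date",)):
    """Clean a list of raw rows whose first row holds the headers."""
    headers = [normalize_header(h) for h in rows[0]]
    cleaned = []
    for raw in rows[1:]:
        cells = [str(c).strip() for c in raw]
        if not any(cells):
            continue  # skip fully blank rows, common in exported sheets
        record = dict(zip(headers, cells))
        for col in date_columns:
            if col in record:
                record[col] = normalize_date(record[col])
        cleaned.append(record)
    return cleaned


# Illustrative input only: rows as they might come out of an Excel parser.
raw = [
    [" Report Date ", "Meter ID", "Value (kWh)"],
    ["03/11/2024", " A-17 ", "12.5"],
    ["", "", ""],
    ["20241104", "A-18", "9.0"],
]
records = normalize_rows(raw)
```

A script like this could be the Python entry point that a scheduled .bat file calls; using only the standard library keeps the footprint small, which fits the minimal-resources goal described above.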