https://github.com/pcpp94/demanda_from_messy_excels_thefuzz
A specialized tool for processing a designated folder of Excel files with varying sheets, formats, and inconsistent data types. This repository corrects file extensions, ensures accurate data types for each column, and compiles all data into a single, structured table with millions of rows, ready for analysis. It is built to handle large numbers of inconsistent Excel files.
- Host: GitHub
- URL: https://github.com/pcpp94/demanda_from_messy_excels_thefuzz
- Owner: pcpp94
- Created: 2024-11-11T14:14:34.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-11T14:30:00.000Z (over 1 year ago)
- Last Synced: 2025-01-26T19:11:14.276Z (over 1 year ago)
- Topics: etl, excel
- Language: Python
- Homepage:
- Size: 17.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
**Excel Parser and Data Normalizer for Large, Inconsistent Datasets**
A tool for parsing and normalizing large collections of Excel files with varying sheet counts, formats, and column data types. This repository automatically corrects file extensions, detects and applies accurate data types, and consolidates all data into a single, clean table suitable for analysis. It is capable of handling millions of rows efficiently and is intended for transforming complex Excel datasets into usable, structured tables.
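
The steps described above (correcting extensions, reconciling inconsistent headers, and coercing column types) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the repository's actual code: all function names here are hypothetical, `difflib` stands in for the `thefuzz` matching that the repository name suggests, and the decimal-comma handling in `to_number` is an assumed convention. Note that the ZIP signature only shows that a file is a ZIP-based container, so treating it as `.xlsx` is also an assumption.

```python
import difflib
from pathlib import Path

# Leading bytes of the two container formats Excel files commonly use.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP archive, assumed here to be .xlsx


def detect_excel_extension(header: bytes):
    """Return the extension implied by the file's leading bytes, or None."""
    if header.startswith(OLE2_MAGIC):
        return ".xls"
    if header.startswith(ZIP_MAGIC):
        return ".xlsx"
    return None


def corrected_path(path: Path) -> Path:
    """Return the path with the extension its content implies (nothing is renamed)."""
    with path.open("rb") as fh:
        ext = detect_excel_extension(fh.read(8))
    return path.with_suffix(ext) if ext else path


def match_column(raw_name, canonical_names, cutoff=0.6):
    """Map a messy header to the closest canonical name, or None if nothing is close."""
    # Normalize case and whitespace before fuzzy comparison.
    cleaned = " ".join(str(raw_name).strip().lower().split())
    hits = difflib.get_close_matches(cleaned, canonical_names, n=1, cutoff=cutoff)
    return hits[0] if hits else None


def to_number(value):
    """Coerce a cell to float; return None for blanks and non-numeric text."""
    if value is None:
        return None
    try:
        # Assumption: a comma is a decimal separator, not a thousands separator.
        return float(str(value).strip().replace(",", "."))
    except ValueError:
        return None
```

In a full pipeline, each sheet would be read with the corrected extension, its headers mapped through `match_column`, its cells coerced column by column, and the results appended to one table. The cutoff of 0.6 is `difflib`'s default and would likely need tuning against real headers.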