Projects in Awesome Lists tagged with extract-transform-load
A curated list of projects in awesome lists tagged with extract-transform-load.
https://github.com/python-bonobo/bonobo
Extract Transform Load for Python 3.5+
automation bonobo data-processing extract-transform-load parallelization python3
Last synced: 14 May 2025
https://github.com/networktocode/diffsync
A utility library for comparing and synchronizing different datasets.
data-synchronization diffsync etl extract-transform-load python source-of-truth synchronization
Last synced: 15 May 2025
https://github.com/fab2s/yaetl
Yet Another ETL in PHP
etl extract-transform-load extractor flow joiner loader php php-etl qualifier transformer workflow
Last synced: 02 Aug 2025
https://github.com/python-bonobo/bonobo-sqlalchemy
PREVIEW - SQL databases in Bonobo, using sqlalchemy
bonobo data-processing databases extract-transform-load python3 sqlalchemy
Last synced: 02 Mar 2026
https://github.com/python-bonobo/bonobo-docker
PREVIEW - Run Bonobo data processing graphs in docker containers.
bonobo containers data-processing docker extract-transform-load python3 runtime
Last synced: 06 May 2025
https://github.com/abrahamkoloboe27/airflow-pipeline-dashboard-compagnie-aerienne
Link to the application
airflow atlas data-engineering docker docker-compose dockerfile duckdb etl etl-pipeline etl-pipelines extract-transform-load makefile mongodb mongodb-atlas orchestration postgresql python streamlit streamlit-dashboard
Last synced: 13 Apr 2025
https://github.com/python-bonobo/bonobo-selenium
PRE-ALPHA - Write web crawlers using Bonobo
automation bonobo browser crawling data-processing extract-transform-load python3 selenium
Last synced: 06 May 2025
https://github.com/GreenInfo-Network/nyc-crash-mapper-etl-script
Extract, Transform, and Load script for fetching new data from the NYC Open Data Portal's vehicle collision data and loading into the NYC Crash Mapper table on CARTO.
carto carto-api extract-transform-load heroku-scheduler nyc open-data python socrata-api
Last synced: 03 Apr 2025
https://github.com/abrahamkoloboe27/random-user-streaming-pipeline
Data Engineering Project - Data Pipeline
airflow airflow-dags api docker docker-compose etl etl-pipeline extract-transform-load kafka kafka-consumer kafka-producer makefile orchestration postgresql python schema-registry spark spark-streaming zookeeper
Last synced: 13 Apr 2026
https://github.com/sayamalt/amazon-products-api-etl-and-ml-pipeline
In this project, I've created an end-to-end ETL pipeline and subsequently developed a machine learning model to predict the price of Amazon products based on several product-related features.
apache-spark azure-data-factory azure-data-lake-storage-gen2 azure-databricks data-ingestion delta-lake etl-pipeline extract-transform-load feature-engineering linear-regression machine-learning model-training-and-evaluation regression-models spark-mllib spark-sql
Last synced: 20 May 2026
https://github.com/r-mahesh45/hr---resume-text-classification
Text Classification for Resumes: Conducted Exploratory Data Analysis (EDA) on a vast collection of resumes. Organized the data using Bag of Words (BoW) and TF-IDF techniques. Built and evaluated multiple models, with Logistic Regression delivering standout performance. Created Word Clouds and Histograms.
data datacleaning extract-transform-load feature-extraction nlp nltk-tokenizer text-mining text-processing
Last synced: 12 Sep 2025
https://github.com/andreasscherbaum/faa
FAA Airline On-Time Performance Data
airline database etl extract-transform-load faa gpfdist greenplum greenplum-database otp otp-staging postgresql sql staging-tables
Last synced: 20 Apr 2026
https://github.com/vaxdata22/water-quality-dw-on-sql-server
This is an MSSQL Data Warehouse and ETL implementation on a specially formatted Water Quality dataset from DEFRA, UK
advanced-sql data-cleaning data-transformation data-warehousing database-schema dimension-tables etl extract-transform-load fact-table jupyter-notebook microsoft-sql-server pandas-dataframe pyodbc python sql-server-management-studio staging-area t-sql
Last synced: 17 Mar 2026
https://github.com/pawsanie/steam_statistics_etl
This pipeline can be used to collect statistical information about all games distributed through the Steam platform.
crawler-python data-crawler etl etl-pipeline extract-transform-load games luigi python python-3 python3 scraper scraping scraping-websites statistics steam steam-games steam-store steam-web-api
Last synced: 18 Apr 2026
https://github.com/ryanfranklin237/data-cleansing
A group of Python scripts that clean large data sets by removing duplicate data, putting data in correct formats, and removing redundant cells
data-analysis data-cleaning data-science extract-transform-load pandas-dataframe python
Last synced: 23 Jun 2026
https://github.com/willie-conway/meta-database-engineer-portfolio
"Portfolio showcasing projects and skills acquired through the Meta Database Engineer Professional Certificate program, including data modeling, SQL analysis, and data visualization."
big-data-technology business-intelligence-tools cloud-database-solutions data-governance-and-compliance data-integrity-and-constraints data-modeling data-normalization-and-denormalization data-visualization-techniques data-warehousing database-backup-and-recovery-strategies database-design database-security-best-practices extract-transform-load indexing-and-query-optimization nosql-databases performance-tuning real-time-data-processing relational-database-management-system scripting-for-database-automation sql-querying
Last synced: 30 Jun 2025
https://github.com/r-mahesh45/svm-classification-models-for-salary-data-and-forest-fire-size
This project uses SVM to classify salary categories and forest fire sizes. GridSearchCV is applied for hyperparameter tuning, achieving high accuracy on both datasets.
classification extract-transform-load machine-learning-algorithms python3 svm
Last synced: 16 May 2026
https://github.com/filip-kustura/data-warehouse-olympics
This project, part of the elective Advanced Database Systems course, involved building a data warehouse based on the already existing database in PostgreSQL. It focuses on analyzing Olympic Games data across time, covering athletes' performance by discipline, location, and other dimensions. Implemented in Spring 2022.
data-analysis data-warehouse database extract-transform-load olympic-games postgresql sql star-schema university-project
Last synced: 01 May 2026
https://github.com/r-mahesh45/salary-prediction-using-naive-bayes
This project uses the Naive Bayes classification algorithm to predict an individual's salary based on features like age, education, occupation, and more. It evaluates model accuracy on training and test datasets. The model achieved 77% accuracy on both sets.
extract-transform-load machine-learning-algorithms naive-bayes-algorithm python3
Last synced: 27 Apr 2026
https://github.com/sayamalt/tmdb-movies-end-to-end-etl-and-ml-pipeline
This project encompasses end-to-end ETL and ML pipeline development. Data ingestion from the TMDB API covered top-rated, current, upcoming, and popular movies with genres. Performed EDA to derive several valuable insights and observations. Developed a regression model with a 97% R² score to predict average movie ratings accurately.
azure-databricks azure-key-vault data-ingestion data-transformation data-visualization etl-pipeline exploratory-data-analysis extract-transform-load feature-engineering mlflow mlflow-tracking model-training-and-evaluation pyspark-mllib regression-models spark
Last synced: 15 May 2026
https://github.com/praveendecode/data_pipeline
Implemented ETL projects with an interactive Streamlit UI for user-friendly data extraction, transformation, and loading tasks
data-harvesting data-warehousing database database-management extract-transform-load mysql postgresql python
Last synced: 14 Apr 2026
https://github.com/leftcoastnerdgirl/extract_transform_load
This mini project introduces data cleaning through ETL
data-cleaning etl extract-transform-load json merge-sort numpy pandas-dataframe pandas-python
Last synced: 07 May 2026
https://github.com/fistgang/etl_vs_elt
Comparison between ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform)
data-pipeline extract-load-transform extract-transform-load
Last synced: 31 Jan 2026
https://github.com/r-mahesh45/text-mining-assignment
This project performs sentiment analysis on Elon Musk's tweets and emotion mining on product reviews from an e-commerce website. It involves data preprocessing techniques such as stemming, lemmatization, and removing stop words. The goal is to extract meaningful insights and classify text based on sentiment and emotion.
extract-transform-load lemmatization nltk-python python3 text-mining
Last synced: 15 Apr 2026
https://github.com/maxinexiong/cloud-data-warehousing-with-aws-redshift
This project builds a cloud-based ETL pipeline for Sparkify to move data to a cloud data warehouse. It extracts song and user activity data from AWS S3, stages it in Redshift, and transforms it into a star-schema data model with fact and dimension tables, enabling efficient querying to answer business questions.
aws-boto3 aws-redshift aws-s3 cloud-data-warehouse data-engineering data-warehouse data-warehousing dimensional-model dimensional-modeling etl etl-pipeline extract-transform-load infrastructure-as-code postgresql postgresql-database redshift-cluster
Last synced: 27 Feb 2026
https://github.com/kmohamedalie/python-project-for-data-engineering
Python Project for Data Engineering
data-engineering extract-transform-load python webscrapping
Last synced: 28 Oct 2025
https://github.com/dfornika/ncov-db
Store SARS-CoV-2 genomic analysis results from ncov2019-artic-nf and ncov-tools to a sqlite DB
data-management extract-transform-load sars-cov-2 sqlite
Last synced: 18 Apr 2026
https://github.com/jv456/network-security-system
This project is about creating a powerful network security system using machine learning and cloud technologies.
data-ingestion data-transformation data-validation extract-transform-load model-evaluation model-training
Last synced: 14 May 2026
https://github.com/r-mahesh45/zoo-and-glass-classification-using-knn
This project uses a K-Nearest Neighbors (KNN) classifier to categorize animals and classify glass types based on various features, with data preprocessing, model training, and accuracy evaluation through cross-validation.
extract-transform-load knn knn-algorithm python3
Last synced: 28 Apr 2026
https://github.com/r-mahesh45/fraud-detection-and-sales-analysis-using-random-forest
This project uses Random Forest to classify fraud risk based on taxable income and analyze key factors driving high sales for a cloth manufacturing company.
classification data-visualization extract-transform-load python3 random-forest
Last synced: 30 Apr 2026
https://github.com/vaxdata22/water-quality-dw-on-oracle-database
This is an Oracle DB Data Warehouse and ETL implementation on a specially formatted Water Quality dataset from DEFRA, UK
advanced-sql data-cleansing data-transformation data-warehouse database-schema dimension-tables etl extract-transform-load fact-table jupyter-notebook oracle-21c oracle-database oracle-sql-developer pandas-dataframe pl-sql pl-sql-cursors pyodbc python staging-area
Last synced: 30 Apr 2026
https://github.com/bayu-siddhi/etl-new-student-admissions
Data Lakehouse course final project (5th semester). This project implements an ETL (Extract, Transform, Load) pipeline using Pentaho Data Integration (Kettle) to build a data warehouse focused on new student admissions data from three sources.
duckdb extract-transform-load minio pentaho-data-integration sqlserver student-admission tsql
Last synced: 20 May 2026
https://github.com/r-mahesh45/association-rule-mining-using-apriori-algorithm
This project applies the Apriori algorithm to generate association rules from transaction datasets. It explores the impact of varying support, confidence, and minimum length parameters on rule generation. Results are visualized using scatterplots, heatmaps, and bar charts for better insights.
apriori-algorithm extract-transform-load python3
Last synced: 30 Apr 2026
https://github.com/jennynzhuang/ca_state_water_climate_impact
California's Water Resources & Impact of Climate Variability
beautifulsoup data-visualization extract-transform-load jupyter python tableau webscraping
Last synced: 18 May 2026