An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with extract-transform-load

A curated list of projects in awesome lists tagged with extract-transform-load.

https://github.com/networktocode/diffsync

A utility library for comparing and synchronizing different datasets.

data-synchronization diffsync etl extract-transform-load python source-of-truth synchronization

Last synced: 15 May 2025
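
Comparing and synchronizing two datasets can be sketched without the library. The following is a minimal, hypothetical illustration in plain Python. It does not use diffsync's actual API, and the record layout (dicts keyed by a unique ID) is an assumption. The diff sorts each key into create, update, or delete, and sync applies those changes to a copy of the target so that it matches the source of truth.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
Dataset = Dict[str, Record]  # hypothetical layout: records keyed by a unique ID


def diff(source: Dataset, target: Dataset) -> Tuple[List[str], List[str], List[str]]:
    """Return the keys to create, update, and delete so that target matches source."""
    to_create = sorted(k for k in source if k not in target)
    to_update = sorted(k for k in source if k in target and source[k] != target[k])
    to_delete = sorted(k for k in target if k not in source)
    return to_create, to_update, to_delete


def sync(source: Dataset, target: Dataset) -> Dataset:
    """Apply the diff to a shallow copy of target and return the synchronized copy."""
    result = dict(target)
    to_create, to_update, to_delete = diff(source, target)
    for key in to_create + to_update:
        result[key] = dict(source[key])
    for key in to_delete:
        del result[key]
    return result


if __name__ == "__main__":
    source_of_truth = {"r1": {"site": "nyc"}, "r2": {"site": "sfo"}}
    device_db = {"r2": {"site": "lax"}, "r3": {"site": "bos"}}
    print(diff(source_of_truth, device_db))  # (['r1'], ['r2'], ['r3'])
    print(sync(source_of_truth, device_db) == source_of_truth)  # True
```

The target is never modified in place, so the diff can be inspected or logged before any change is committed.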

https://github.com/python-bonobo/bonobo-sqlalchemy

PREVIEW - SQL databases in Bonobo, using sqlalchemy

bonobo data-processing databases extract-transform-load python3 sqlalchemy

Last synced: 02 Mar 2026

https://github.com/python-bonobo/bonobo-docker

PREVIEW - Run Bonobo data processing graphs in docker containers.

bonobo containers data-processing docker extract-transform-load python3 runtime

Last synced: 06 May 2025

https://github.com/GreenInfo-Network/nyc-crash-mapper-etl-script

An Extract, Transform, and Load (ETL) script that fetches new vehicle collision data from the NYC Open Data Portal and loads it into the NYC Crash Mapper table on CARTO.

carto carto-api extract-transform-load heroku-scheduler nyc open-data python socrata-api

Last synced: 03 Apr 2025

https://github.com/sayamalt/amazon-products-api-etl-and-ml-pipeline

In this project, I've created an end-to-end ETL pipeline and subsequently developed a machine learning model to predict the price of Amazon products based on several product-related features.

apache-spark azure-data-factory azure-data-lake-storage-gen2 azure-databricks data-ingestion delta-lake etl-pipeline extract-transform-load feature-engineering linear-regression machine-learning model-training-and-evaluation regression-models spark-mllib spark-sql

Last synced: 20 May 2026

https://github.com/r-mahesh45/hr---resume-text-classification

Text Classification for Resumes: Conducted Exploratory Data Analysis (EDA) on a vast collection of resumes. Organized the data using Bag of Words (BoW) and TF-IDF techniques. Built and evaluated multiple models, with Logistic Regression delivering standout performance. Created Word Clouds and Histograms.

data datacleaning extract-transform-load feature-extraction nlp nltk-tokenizer text-mining text-processing

Last synced: 12 Sep 2025
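
The Bag of Words and TF-IDF steps described above can be sketched in standard-library Python. This is an illustrative sketch, not the project's code. It tokenizes by lowercasing and splitting on whitespace, uses raw term counts as TF, and assumes the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 (the default in scikit-learn's `TfidfVectorizer`). It also skips the L2 row normalization that library applies by default.

```python
import math
from collections import Counter
from typing import Dict, List


def bag_of_words(docs: List[str]) -> List[Counter]:
    """Bag of Words: lowercase, split on whitespace, and count each term."""
    return [Counter(doc.lower().split()) for doc in docs]


def tf_idf(docs: List[str]) -> List[Dict[str, float]]:
    """Weight each raw term count by a smoothed inverse document frequency."""
    bows = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for bow in bows for term in bow)
    # Smoothed IDF (assumed formula): ln((1 + N) / (1 + df)) + 1, never zero.
    idf = {term: math.log((1 + n_docs) / (1 + d)) + 1 for term, d in df.items()}
    return [{term: count * idf[term] for term, count in bow.items()} for bow in bows]


if __name__ == "__main__":
    resumes = ["python sql python", "java sql", "python spark"]  # toy data
    for weights in tf_idf(resumes):
        print({term: round(w, 3) for term, w in weights.items()})
```

Terms that appear in every document get the minimum weight of 1 per occurrence, while rarer terms are boosted, which is why TF-IDF features often separate document categories better than raw counts.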

https://github.com/pawsanie/steam_statistics_etl

This pipeline can be used to collect statistical information about all games distributed through the Steam platform.

crawler-python data-crawler etl etl-pipeline extract-transform-load games luigi python python-3 python3 scraper scraping scraping-websites statistics steam steam-games steam-store steam-web-api

Last synced: 18 Apr 2026

https://github.com/ryanfranklin237/data-cleansing

A group of Python scripts that clean large datasets by removing duplicate data, converting data to the correct formats, and removing redundant cells.

data-analysis data-cleaning data-science extract-transform-load pandas-dataframe python

Last synced: 23 Jun 2026
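
The three cleaning steps listed above can be sketched with the standard library. This is a hypothetical illustration, not the repository's scripts. It assumes rows arrive as dicts of strings, that a column named `date` holds MM/DD/YYYY values, and that "redundant cells" means columns left empty in every row; all three are assumptions. In pandas the same steps map roughly to `drop_duplicates`, `pd.to_datetime`, and `dropna(axis=1, how="all")`.

```python
from datetime import datetime
from typing import Dict, List

Row = Dict[str, str]


def normalize(row: Row) -> Row:
    """Trim whitespace and rewrite the hypothetical 'date' column as ISO 8601."""
    clean = {key: value.strip() for key, value in row.items()}
    if clean.get("date"):
        clean["date"] = datetime.strptime(clean["date"], "%m/%d/%Y").date().isoformat()
    return clean


def cleanse(rows: List[Row]) -> List[Row]:
    normalized = [normalize(r) for r in rows]
    # Drop exact duplicates after normalization, keeping the first occurrence.
    seen = set()
    unique = []
    for row in normalized:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    # Drop columns that are empty in every remaining row (assumed meaning of "redundant").
    columns = {c for r in unique for c in r}
    empty = {c for c in columns if all(not r.get(c) for r in unique)}
    return [{k: v for k, v in r.items() if k not in empty} for r in unique]


if __name__ == "__main__":
    rows = [
        {"name": " Ada ", "date": "01/31/2024", "note": ""},
        {"name": "Ada", "date": "01/31/2024", "note": " "},
        {"name": "Bob", "date": "02/01/2024", "note": ""},
    ]
    print(cleanse(rows))
```

Normalizing before deduplicating matters: the first two rows differ only in whitespace, so they are recognized as duplicates only after trimming.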

https://github.com/r-mahesh45/svm-classification-models-for-salary-data-and-forest-fire-size

This project uses SVM to classify salary categories and forest fire sizes. GridSearchCV is applied for hyperparameter tuning, achieving high accuracy on both datasets.

classification extract-transform-load machine-learning-algorithms python3 svm

Last synced: 16 May 2026

https://github.com/filip-kustura/data-warehouse-olympics

This project, part of the elective Advanced Database Systems course, involved building a data warehouse on top of an existing PostgreSQL database. It focuses on analyzing Olympic Games data across time, covering athletes' performance by discipline, location, and other dimensions. Implemented in Spring 2022.

data-analysis data-warehouse database extract-transform-load olympic-games postgresql sql star-schema university-project

Last synced: 01 May 2026

https://github.com/r-mahesh45/salary-prediction-using-naive-bayes

This project uses the Naive Bayes classification algorithm to predict an individual's salary based on features like age, education, occupation, and more. It evaluates the model on training and test datasets, achieving 77% accuracy on both.

extract-transform-load machine-learning-algorithms naive-bayes-algorithm python3

Last synced: 27 Apr 2026
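
How Naive Bayes turns categorical features into a prediction can be sketched in plain Python. This is a minimal categorical variant with Laplace (add-one) smoothing, written for illustration. The project itself may well use a library implementation such as scikit-learn, and the feature values and labels in the usage example are hypothetical toy data.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Sequence, Set, Tuple


class CategoricalNB:
    """Naive Bayes over categorical features with Laplace (add-one) smoothing."""

    def fit(self, X: Sequence[Sequence[str]], y: Sequence[str]) -> "CategoricalNB":
        self.class_counts = Counter(y)
        self.n = len(y)
        # counts[(feature_index, label)][value] -> number of occurrences
        self.counts: Dict[Tuple[int, str], Counter] = defaultdict(Counter)
        self.values: Dict[int, Set[str]] = defaultdict(set)
        for row, label in zip(X, y):
            for i, value in enumerate(row):
                self.counts[(i, label)][value] += 1
                self.values[i].add(value)
        return self

    def predict_one(self, row: Sequence[str]) -> str:
        best_label, best_score = None, -math.inf
        for label, class_count in self.class_counts.items():
            # Sum log-probabilities to avoid floating-point underflow.
            score = math.log(class_count / self.n)
            for i, value in enumerate(row):
                k = len(self.values[i])  # distinct values seen for feature i
                score += math.log((self.counts[(i, label)][value] + 1) / (class_count + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def accuracy(self, X: Sequence[Sequence[str]], y: Sequence[str]) -> float:
        return sum(self.predict_one(r) == t for r, t in zip(X, y)) / len(y)


if __name__ == "__main__":
    X = [["bachelors", "tech"], ["masters", "tech"], ["hs-grad", "service"], ["hs-grad", "sales"]]
    y = [">50K", ">50K", "<=50K", "<=50K"]
    model = CategoricalNB().fit(X, y)
    print(model.predict_one(["masters", "tech"]), model.accuracy(X, y))
```

The "naive" assumption is that features are conditionally independent given the class, which is what lets the score be a simple sum of per-feature log-probabilities.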

https://github.com/sayamalt/tmdb-movies-end-to-end-etl-and-ml-pipeline

This project encompasses end-to-end ETL and ML pipeline development. Data ingestion from the TMDB API covered top-rated, current, upcoming, and popular movies with their genres. Exploratory data analysis (EDA) surfaced several insights and observations. A regression model with an R² score of 0.97 predicts average movie ratings.

azure-databricks azure-key-vault data-ingestion data-transformation data-visualization etl-pipeline exploratory-data-analysis extract-transform-load feature-engineering mlflow mlflow-tracking model-training-and-evaluation pyspark-mllib regression-models spark

Last synced: 15 May 2026

https://github.com/praveendecode/data_pipeline

Implemented ETL projects with an interactive Streamlit UI for user-friendly data extraction, transformation, and loading tasks.

data-harvesting data-warehousing database database-management extract-transform-load mysql postgresql python

Last synced: 14 Apr 2026

https://github.com/fistgang/etl_vs_elt

Comparison between ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform)

data-pipeline extract-load-transform extract-transform-load

Last synced: 31 Jan 2026
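
The difference between the two patterns is where the transform step runs, which can be sketched with Python's built-in `sqlite3` module standing in for a warehouse. This is an illustrative sketch, not the repository's code. The table names, columns, and toy rows are hypothetical. ETL transforms in application code and loads finished rows. ELT loads the raw extract first and transforms it inside the database with SQL.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical raw extract: (order_id, amount in cents as text, country code)
RAW: List[Tuple[int, str, str]] = [(1, "1250", "us"), (2, "899", "de")]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in application code, then load only the finished rows."""
    transformed = [(oid, int(cents) / 100, cc.upper()) for oid, cents, cc in RAW]
    conn.execute("CREATE TABLE orders_etl (order_id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", transformed)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows first, then transform inside the database with SQL."""
    conn.execute("CREATE TABLE orders_raw (order_id INTEGER, cents TEXT, country TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW)
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT order_id, CAST(cents AS INTEGER) / 100.0 AS amount, UPPER(country) AS country "
        "FROM orders_raw"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    query = "SELECT * FROM {} ORDER BY order_id"
    print(conn.execute(query.format("orders_etl")).fetchall())
    print(conn.execute(query.format("orders_elt")).fetchall())
```

Both paths produce the same final table. The trade-off is that ELT keeps the untouched raw data in `orders_raw`, so the transformation can be rerun or changed later without extracting again.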

https://github.com/r-mahesh45/text-mining-assignment

This project performs sentiment analysis on Elon Musk's tweets and emotion mining on product reviews from an e-commerce website. It involves data preprocessing techniques such as stemming, lemmatization, and removing stop words. The goal is to extract meaningful insights and classify text based on sentiment and emotion.

extract-transform-load lemmatization nltk-python python3 text-mining

Last synced: 15 Apr 2026

https://github.com/maxinexiong/cloud-data-warehousing-with-aws-redshift

This project builds a cloud-based ETL pipeline for Sparkify to move data to a cloud data warehouse. It extracts song and user activity data from AWS S3, stages it in Redshift, and transforms it into a star-schema data model with fact and dimension tables, enabling efficient querying to answer business questions.

aws-boto3 aws-redshift aws-s3 cloud-data-warehouse data-engineering data-warehouse data-warehousing dimensional-model dimensional-modeling etl etl-pipeline extract-transform-load infrastructure-as-code postgresql postgresql-database redshift-cluster

Last synced: 27 Feb 2026
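
The staging-to-star-schema step described above can be sketched locally, with SQLite standing in for Redshift. This is a simplified, hypothetical sketch: the table and column names are illustrative and are not the project's actual schema. Raw events land in a staging table. Deduplicated rows then populate the dimension tables, and the fact table records one row per play that references them by key.

```python
import sqlite3

SCHEMA = """
CREATE TABLE staging_events (user_id INTEGER, first_name TEXT, song TEXT, artist TEXT, ts INTEGER);
CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE dim_song (song_id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT, artist TEXT,
                       UNIQUE (title, artist));
CREATE TABLE fact_songplay (songplay_id INTEGER PRIMARY KEY AUTOINCREMENT,
                            user_id INTEGER REFERENCES dim_user (user_id),
                            song_id INTEGER REFERENCES dim_song (song_id),
                            ts INTEGER);
"""

# Populate dimensions from staging, then build the fact table by joining back to them.
TRANSFORM = """
INSERT OR IGNORE INTO dim_user (user_id, first_name)
    SELECT DISTINCT user_id, first_name FROM staging_events;
INSERT OR IGNORE INTO dim_song (title, artist)
    SELECT DISTINCT song, artist FROM staging_events;
INSERT INTO fact_songplay (user_id, song_id, ts)
    SELECT e.user_id, s.song_id, e.ts
    FROM staging_events e
    JOIN dim_song s ON s.title = e.song AND s.artist = e.artist;
"""


def build_star_schema(events):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO staging_events VALUES (?, ?, ?, ?, ?)", events)
    conn.executescript(TRANSFORM)
    return conn


if __name__ == "__main__":
    conn = build_star_schema([
        (1, "Ana", "Song A", "X", 100),
        (1, "Ana", "Song B", "Y", 200),
        (2, "Ben", "Song A", "X", 300),
    ])
    # A typical business question: plays per artist.
    print(conn.execute(
        "SELECT s.artist, COUNT(*) FROM fact_songplay f "
        "JOIN dim_song s USING (song_id) GROUP BY s.artist ORDER BY s.artist"
    ).fetchall())
```

Keeping descriptive attributes in small dimension tables and only keys and measures in the fact table is what makes aggregate queries like the one above short and cheap.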

https://github.com/dfornika/ncov-db

Store SARS-CoV-2 genomic analysis results from ncov2019-artic-nf and ncov-tools in a SQLite database

data-management extract-transform-load sars-cov-2 sqlite

Last synced: 18 Apr 2026

https://github.com/jv456/network-security-system

This project is about creating a powerful network security system using machine learning and cloud technologies.

data-ingestion data-transformation data-validation extract-transform-load model-evaluation model-training

Last synced: 14 May 2026

https://github.com/r-mahesh45/zoo-and-glass-classification-using-knn

This project uses a K-Nearest Neighbors (KNN) classifier to categorize animals and classify glass types based on various features, with data preprocessing, model training, and accuracy evaluation through cross-validation.

extract-transform-load knn knn-algorithm python3

Last synced: 28 Apr 2026
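
K-Nearest Neighbors classification with cross-validated accuracy can be sketched in standard-library Python. This is an illustrative sketch, not the project's code. It assumes numeric feature vectors, Euclidean distance, and a majority vote among the k closest training points. The folds are contiguous slices, so the data should be shuffled first in practice; the toy points in the usage example are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


def knn_predict(train: List[Sample], x: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(data: List[Sample], k: int = 3, folds: int = 5) -> float:
    """Mean accuracy over contiguous folds (shuffle the data beforehand in practice)."""
    scores = []
    fold_size = math.ceil(len(data) / folds)
    for f in range(folds):
        test = data[f * fold_size:(f + 1) * fold_size]
        if not test:
            continue
        train = data[:f * fold_size] + data[(f + 1) * fold_size:]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    data = [((0, 0), "a"), ((10, 10), "b"), ((0, 1), "a"),
            ((10, 11), "b"), ((1, 0), "a"), ((11, 10), "b")]
    print(knn_predict(data, (0.5, 0.5)), cross_val_accuracy(data, k=3, folds=3))
```

KNN has no training phase beyond storing the data, so each fold simply changes which points are available as neighbors. Features on very different scales should be standardized first, because the distance otherwise favors the widest-ranging feature.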

https://github.com/r-mahesh45/fraud-detection-and-sales-analysis-using-random-forest

This project uses Random Forest to classify fraud risk based on taxable income and analyze key factors driving high sales for a cloth manufacturing company.

classification data-visualization extract-transform-load python3 random-forest

Last synced: 30 Apr 2026

https://github.com/bayu-siddhi/etl-new-student-admissions

Data Lakehouse course final project (5th semester). This project implements an ETL (Extract, Transform, Load) pipeline using Pentaho Data Integration (Kettle) to build a data warehouse focused on new student admissions data from three sources.

duckdb extract-transform-load minio pentaho-data-integration sqlserver student-admission tsql

Last synced: 20 May 2026

https://github.com/r-mahesh45/association-rule-mining-using-apriori-algorithm

This project applies the Apriori algorithm to generate association rules from transaction datasets. It explores the impact of varying support, confidence, and minimum length parameters on rule generation. Results are visualized using scatterplots, heatmaps, and bar charts for better insights.

apriori-algorithm extract-transform-load python3

Last synced: 30 Apr 2026
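
The Apriori algorithm and the support, confidence, and minimum-length parameters mentioned above can be sketched in plain Python. This is an illustrative sketch, not the project's code, which likely relies on a library. Here `min_length` is assumed to mean the minimum number of items in a rule (antecedent plus consequent). Candidates grow one item at a time, and any candidate with an infrequent subset is pruned, which is the Apriori property.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Frequent = Dict[FrozenSet[str], float]
Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]  # (antecedent, consequent, support, confidence)


def apriori(transactions: List[Set[str]], min_support: float) -> Frequent:
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(items: FrozenSet[str]) -> float:
        return sum(items <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent: Frequent = {}
    size = 1
    while current:
        level = {s: support(s) for s in current}
        level = {s: v for s, v in level.items() if v >= min_support}
        frequent.update(level)
        size += 1
        # Join frequent itemsets into candidates one item larger, then prune any
        # candidate that has an infrequent subset.
        current = {
            a | b for a, b in combinations(list(level), 2)
            if len(a | b) == size
            and all(frozenset(sub) in level for sub in combinations(a | b, size - 1))
        }
    return frequent


def rules(frequent: Frequent, min_confidence: float, min_length: int = 2) -> List[Rule]:
    """Generate rules A -> B with confidence = support(A | B) / support(A)."""
    out: List[Rule] = []
    for itemset, sup in frequent.items():
        if len(itemset) < max(min_length, 2):
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    out.append((antecedent, itemset - antecedent, sup, confidence))
    return out


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
    for rule in rules(apriori(baskets, min_support=0.5), min_confidence=0.6):
        print(rule)
```

Raising `min_support` shrinks the candidate sets at every level, while raising `min_confidence` only filters the final rules, so support is the parameter that mainly controls run time.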