https://github.com/dmarks84/coursework_capstone_full_data_engineering

Final Project for IBM Data Engineering & Python Professional Certificate -- Applied all skills and methods utilized in the series of courses for this certification
Host: GitHub
URL: https://github.com/dmarks84/coursework_capstone_full_data_engineering
Owner: dmarks84
License: bsd-3-clause
Created: 2024-01-17T17:04:11.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-01-17T23:27:56.000Z (over 2 years ago)
Last Synced: 2025-03-14T07:45:48.562Z (over 1 year ago)
Topics: apache-airflow, apache-hadoop, apache-kafka, apache-spark, api, beautifulsoup, cassandra, dags, etl, mongodb, nosql, pandas, plotly, postgresql, python, scipy, seaborn, sql
Language: Jupyter Notebook
Homepage:
Size: 4.25 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

## Project(CapstoneProject_Full_Data_Engineering)
### Part of the Coursera series: IBM Data Engineering & Python

## Summary
In this project, I applied the skills and knowledge gained in the preceding courses of the series. We were tasked with ingesting OLTP data by reading a .csv file and by querying a SQL (MySQL) database. This data was then exported for additional querying and manipulation in a NoSQL database (MongoDB). We then consolidated the data in a data warehouse and performed additional SQL queries and manipulation, this time using PostgreSQL. We created visualizations of the data, then set up a pipeline to automate ETL going forward. We ended the project by developing an automated process that creates a machine learning model to predict future behavior.
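
The extract, transform, and load flow described above can be illustrated with a minimal sketch. This is not the project's actual code: the sample data, the column names, the `sales_fact` table, and the helper functions are all hypothetical, and an in-memory SQLite database stands in for the MySQL source and PostgreSQL warehouse so that the example runs with only the Python standard library.

```python
import csv
import io
import sqlite3

# Hypothetical OLTP export; the column names are illustrative only.
SAMPLE_CSV = """order_id,product,quantity,unit_price
1,keyboard,2,25.00
2,mouse,1,10.50
3,keyboard,1,25.00
"""


def extract(csv_text):
    """Read CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Cast column types and derive a line total for each row."""
    return [
        (
            int(r["order_id"]),
            r["product"],
            int(r["quantity"]),
            int(r["quantity"]) * float(r["unit_price"]),
        )
        for r in rows
    ]


def load(conn, records):
    """Create the (hypothetical) fact table if needed and insert the records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact ("
        "order_id INTEGER PRIMARY KEY, product TEXT, "
        "quantity INTEGER, line_total REAL)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", records)
    conn.commit()


def revenue_by_product(conn):
    """Run an aggregate query of the kind a warehouse would serve."""
    cur = conn.execute(
        "SELECT product, SUM(line_total) FROM sales_fact "
        "GROUP BY product ORDER BY product"
    )
    return cur.fetchall()


# SQLite in memory is an assumption made to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SAMPLE_CSV)))
totals = revenue_by_product(conn)
print(totals)
```

In a scheduled pipeline, each of the three functions would typically become its own task so that a failure in one step can be retried without repeating the others.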

## Skills (Developed & Applied)
Programming, Python, RDBMS & SQL, SQL (MySQL), SQL (PostgreSQL), SQL (SQLite), NoSQL (Cassandra), NoSQL (MongoDB), Databases, Statistics, Probability, Linear Algebra, SciPy, NumPy, Pandas, Seaborn, Matplotlib, Plotly, BeautifulSoup, DataFrames, ETL and/or ELT & Data Pipelines, DAGs, Apache Airflow, Apache Kafka, Apache Spark, Apache Hadoop, Automation, Linux/Bash/Shell Commands, Web Scraping, APIs, Data Modeling, EDA, Data Visualization, Data Summarization, Data Reporting, Regression, Supervised ML, Communication, Technical Writing
