{"id":15056707,"url":"https://github.com/dmarks84/coursework_capstone_full_data_engineering","last_synced_at":"2026-02-25T23:05:09.202Z","repository":{"id":217712129,"uuid":"744617923","full_name":"dmarks84/Coursework_Capstone_Full_Data_Engineering","owner":"dmarks84","description":"Final Project for IBM Data Engineering \u0026 Python Professional Certificate -- Applied all skills and methods utilized in the series of courses for this certification","archived":false,"fork":false,"pushed_at":"2024-01-17T23:27:56.000Z","size":4459,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-14T07:45:48.562Z","etag":null,"topics":["apache-airflow","apache-hadoop","apache-kafka","apache-spark","api","beautifulsoup","cassandra","dags","etl","mongodb","nosql","pandas","plotly","postgresql","python","scipy","seaborn","sql"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dmarks84.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2024-01-17T17:04:11.000Z","updated_at":"2024-03-05T21:13:43.000Z","dependencies_parsed_at":"2024-01-18T01:28:01.965Z","dependency_job_id":null,"html_url":"https://github.com/dmarks84/Coursework_Capstone_Full_Data_Engineering","commit_stats":null,"previous_names":["dmarks84/capstoneproject_full_data_engineering","dmarks84/coursework_capstone_full_data_engineering"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmarks84%2FCoursework_Capstone_Full_Data_Engineering","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmarks84%2FCoursework_Capstone_Full_Data_Engineering/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmarks84%2FCoursework_Capstone_Full_Data_Engineering/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmarks84%2FCoursework_Capstone_Full_Data_Engineering/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dmarks84","download_url":"https://codeload.github.com/dmarks84/Coursework_Capstone_Full_Data_Engineering/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243544665,"owners_count":20308168,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-airflow","apache-hadoop","apache-kafka","apache-spark","api","beautifulsoup","cassandra","dags","etl","mongodb","nosql","pandas","plotly","postgresql","python","scipy","seaborn","sql"],"created_at":"2024-09-24T21:55:18.047Z","updated_at":"2025-11-07T15:06:16.022Z","avatar_url":"https://github.com/dmarks84.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Project (CapstoneProject_Full_Data_Engineering)\n### Part of the Coursera series: IBM Data Engineering \u0026 Python\n\n## Summary\nIn this project, I applied all of the skills and knowledge gained in the courses leading up to it. We were tasked with ingesting OLTP data by reading a .csv file and by querying a MySQL database.
This data was then exported to a NoSQL database (MongoDB) for additional querying and manipulation. We then consolidated the data in a data warehouse and performed further SQL queries and manipulation, this time using PostgreSQL. We created visualizations from the warehoused data before setting up a pipeline to automate future ETL runs, and we ended the project by developing an automated process that builds a machine learning model to predict future behavior.\n\n## Skills (Developed \u0026 Applied)\nProgramming, Python, RDBMS \u0026 SQL, SQL (MySQL), SQL (PostgreSQL), SQL (SQLite), NoSQL (Cassandra), NoSQL (MongoDB), Databases, Statistics, Probability, Linear Algebra, SciPy, NumPy, Pandas, Seaborn, Matplotlib, Plotly, BeautifulSoup, DataFrames, ETL/ELT \u0026 Data Pipelines, DAGs, Apache Airflow, Apache Kafka, Apache Spark, Apache Hadoop, Automation, Linux/Bash/Shell Commands, Web Scraping, APIs, Data Modeling, EDA, Data Visualization, Data Summarization, Data Reporting, Regression, Supervised ML, Communication, Technical Writing\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmarks84%2Fcoursework_capstone_full_data_engineering","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdmarks84%2Fcoursework_capstone_full_data_engineering","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmarks84%2Fcoursework_capstone_full_data_engineering/lists"}