Projects in Awesome Lists by dmarks84
A curated list of projects in awesome lists by dmarks84.
https://github.com/dmarks84/coursework_project_banks-web-scraping-sql
Project for IBM Data Engineering & Python course on ETL & Big Data -- Scraped website data and made API calls for additional data; wrangled and transformed the data and loaded it into a SQL database.
apis beautifulsoup databases elt etl nosql numpy pandas pipelines python sql sqlite web-scraping
Last synced: 10 Apr 2026
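A minimal sketch of the extract-transform-load flow described above, using only the standard library `sqlite3` module. The rows and the exchange rate are hypothetical stand-ins for the scraped page and the API response, and the table name `banks` and its columns are illustrative, not taken from the repository.

```python
import sqlite3

# Hypothetical "extracted" rows: (bank name, market cap in USD billions).
# In the project these came from a scraped web page; these values are made up.
extracted = [("Bank A", 400.0), ("Bank B", 250.5), ("Bank C", 180.25)]

# Hypothetical exchange rate standing in for a value returned by an API call.
USD_TO_EUR = 0.9


def transform(rows, rate):
    """Append a converted market-cap column, rounded to two decimals."""
    return [(name, usd, round(usd * rate, 2)) for name, usd in rows]


def load(rows, conn):
    """Create the target table and bulk-insert the transformed rows."""
    conn.execute("CREATE TABLE banks (name TEXT, mc_usd_bn REAL, mc_eur_bn REAL)")
    conn.executemany("INSERT INTO banks VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
load(transform(extracted, USD_TO_EUR), conn)

# Query the loaded table: largest bank by converted market cap.
top = conn.execute(
    "SELECT name, mc_eur_bn FROM banks ORDER BY mc_eur_bn DESC LIMIT 1"
).fetchone()
print(top)
```

Keeping extract, transform, and load as separate steps makes each one easy to swap out, for example replacing the hard-coded rate with a real API call.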
https://github.com/dmarks84/coursework_project_airfoil-noise-prediction
Project for IBM Data Engineering & Python course on ML & AI -- Predicted the noise of an airfoil from various physical features
apache-spark api automation data-modeling etl linear-algebra numpy pandas pipelines python regression statistics supervised-ml
Last synced: 13 Apr 2026
https://github.com/dmarks84/ind_project_california-housing-data--kaggle
Independent Project - Kaggle Dataset -- I worked on the California Housing dataset, performing data cleaning and preparation, exploratory data analysis, feature engineering, regression model building, and model evaluation.
cross-validation data-modeling data-reporting data-visualization eda folium grid-search matplotlib model-evaluation numpy pandas pca python seaborn sklearn statistics supervised-ml unsupervised-ml
Last synced: 08 Apr 2026
https://github.com/dmarks84/coursework_coursework_project_automobile-sales-visualization
Project for IBM Data Science course on Visualization & Dashboards -- Analyzed historical sales data, performing EDA and setting up an interactive dashboard
communication dash dashboards data-modeling elt etl folium matplotlib numpy pandas pipelines plotly python scipy seaborn visualization
Last synced: 10 Apr 2026
https://github.com/dmarks84/coursework_project_ml-classifier-eval-selection
Project for University of Michigan Applied Data Science Specialization -- Predicted viewer engagement based on features related to video metrics; evaluated a large set of classifiers under different scoring metrics to select the "optimal" one.
classification cross-validation data-modeling data-reporting data-visualization databases dataframes eda grid-search matplotlib numpy pandas python scikit-learn statistics supervised-ml
Last synced: 02 Apr 2026
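Because different scoring metrics can favor different classifiers, the metric chosen effectively decides which model is "optimal". A small illustration with two hypothetical models (`conservative` and `aggressive`) and hand-made labels, not the project's data:

```python
def confusion(y_true, y_pred):
    """Counts of true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def scores(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions from two classifiers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
preds = {
    "conservative": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # rarely predicts positive
    "aggressive":   [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],  # catches every positive
}


def best_by(metric):
    """Name of the classifier with the highest score under the given metric."""
    return max(preds, key=lambda name: scores(y_true, preds[name])[metric])


for metric in ("accuracy", "precision", "recall", "f1"):
    print(metric, best_by(metric))
```

Here accuracy and precision pick `conservative`, while recall and F1 pick `aggressive`, which is why the scoring metric should reflect the cost of each kind of error.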
https://github.com/dmarks84/coursework_project_data-analysis-apache-spark
Project for IBM Data Engineering & Python course on ETL & Big Data -- Read in data, wrote it to a SQL database and ran queries, performed statistical analysis, and issued reports
apache-spark automation dag data-modeling eda elt etl numpy pandas pipelines python sql statistics visualization
Last synced: 11 Apr 2026
https://github.com/dmarks84/ind_project_mall-customer-clustering--kaggle
Independent Project - Kaggle Dataset -- I worked with the Mall Customer Segmentation Dataset, which provides various instances of shoppers of different ages, incomes, etc. I utilized unsupervised ML clustering algorithms to identify useful customer segments.
clustering dataframes dbscan kmeans-clustering market-segmentation mean-shift pandas python sklearn technical-analysis technical-communication unsupervised-ml
Last synced: 12 Apr 2026
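A minimal k-means sketch in plain Python, assuming two numeric features per shopper (for example income and spending score). The points are hypothetical, and the deterministic farthest-first seeding is a simplification chosen for illustration; the repository's tags point to sklearn, whose k-means defaults (such as k-means++ seeding) differ.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with farthest-first seeding; returns (centroids, labels)."""
    # Seeding: start from the first point, then repeatedly add the point
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            new.append(tuple(sum(axis) / len(members) for axis in zip(*members))
                       if members else centroids[i])
        if new == centroids:  # converged: assignments can no longer change
            break
        centroids = new
    return centroids, labels


# Hypothetical (income, spending score) pairs forming two obvious segments.
shoppers = [(20, 20), (22, 18), (18, 22), (21, 21),
            (80, 80), (78, 82), (82, 78), (79, 81)]
centers, segment = kmeans(shoppers, k=2)
print(centers, segment)
```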
https://github.com/dmarks84/coursework_capstone_spacex_predictions
Final Project for IBM Data Science Professional Certificate -- Applied the skills and methods from the series of courses for this certification to predict the success of SpaceX landings; issued a full report to stakeholders
api classification dash eda folium linear-algebra matplotlib mysql numpy pandas plotly probability python seaborn sql statistics supervised-ml technical-writing web-scraping
Last synced: 08 Apr 2026
https://github.com/dmarks84/coursework_project_boston-data-project
Project for IBM Data Science course on Statistics -- Read in a large data set and performed several statistical analyses and hypothesis tests
communication data-modeling data-reporting dataframes eda hypothesis-testing matplotlib numpy pandas probability python scipy seaborn statistics visualization
Last synced: 08 Apr 2026
https://github.com/dmarks84/ind_project_superstore-sales-time-series-analysis--kaggle
Independent Project - Kaggle Dataset -- I worked on the Superstore Sales Dataset, performing data cleaning and preparation and exploratory data analysis in Part 1. The main task, predicting future sales with time-series analysis, is found in Part 2.
choropleth data-modeling data-visualization eda linear-regression matplotlib numpy pandas python seaborn sklearn statistics statsmodels supervised-ml time-series-analysis
Last synced: 09 Apr 2026
https://github.com/dmarks84/ind_project_european-soccer-top-points-contributors--kaggle
Independent Project - Kaggle Dataset -- I worked on the European Soccer Dataset, using SQL (SQLite) to read in the data and then wrangling it before running statistical analysis and hypothesis testing on the question of who helps earn the most points for their team.
data-wrangling hypothesis-testing numpy p-values pandas python scipy-stats statistics t-test
Last synced: 09 Apr 2026
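Comparing two groups of players usually comes down to a two-sample t-test. The sketch below computes Welch's t statistic and its degrees of freedom with the standard library; the samples are hypothetical, and Welch's unequal-variance form is an assumed choice. For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical points-per-match samples for two groups of players.
group_a = [2.1, 1.8, 2.4, 2.0, 2.2]
group_b = [1.6, 1.9, 1.5, 1.7, 1.8]
t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```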
https://github.com/dmarks84/coursework_project_apache-airflow-kafka-on-toll-booth-data
Project for IBM Data Engineering & Python course on ETL & Big Data -- Read in live toll booth data, wrangled and transformed it, and wrote it into a SQL database
apache-airflow apache-kafka automation dags data-modeling databases eda elt etl mysql numpy pandas pipelines python sql
Last synced: 11 Apr 2026
https://github.com/dmarks84/coursework_project_linux-file-backup
Project for IBM Data Engineering & Python course on Linux & Shell Scripts -- Wrote and executed bash scripts to manipulate folders and files to create a full directory backup, automated with crontab
automation bash crontab elt etl linux pipelines python shell-scripts
Last synced: 12 Apr 2026
https://github.com/dmarks84/ind_project_obesity-multi-class-classification--kaggle
Independent Project - Kaggle Competition -- I worked on the obesity classification data set as part of the Kaggle Competition of the same name, achieving an accuracy score above 0.9
classification correlation-analysis cross-validation data-modeling data-visualization dataframes eda gridsearchcv matplotlib multiclass-classification numpy pandas python seaborn sklearn statistics supervised-ml
Last synced: 11 Apr 2026
https://github.com/dmarks84/coursework_project_text-mining-spam-analysis
Project for University of Michigan Applied Data Science Specialization -- Performed NLP to build features from email messages; trained various classification models to help predict whether a message was spam.
classification databases eda nlp numpy pandas python scikit seaborn sentiment-analysis statistics supervised-ml text-mining unsupervised-ml visualization
Last synced: 11 Apr 2026
https://github.com/dmarks84/coursework_project_ml-classification
Project for IBM Data Science course on Machine Learning -- Trained ML models for classification, evaluating based on a variety of metrics
classification communication data-modeling dataframes numpy pandas python scikit-learn supervised-ml
Last synced: 11 Apr 2026
https://github.com/dmarks84/ind_project_new-topic-nlp-analysis-classification--kaggle
Independent Project - Kaggle Dataset -- I worked with the News Category Dataset, which provides headlines, descriptions, etc. in .json format; used NLTK for NLP (tokenizing, lemmatizing, and part-of-speech tagging); trained classifier models and tuned their parameters to predict news category from headline text.
classification hyperparameter-tuning json lemmatization model-evaluation model-refinement nlp nltk pandas python sklearn supervised-ml
Last synced: 11 Apr 2026
https://github.com/dmarks84/coursework_project_text-mining-topic-modeling
Project for University of Michigan Applied Data Science Specialization -- Developed functions to score similarity between text passages.
data-modeling data-reporting data-visualization databases eda nlp numpy pandas python statistics text-mining
Last synced: 12 Apr 2026
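One simple way to score similarity between text passages is cosine similarity over word counts. This is an assumed approach for illustration only; the repository may use a different measure, and the tokenizer here is a deliberately crude regular expression.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(a, b):
    """Cosine of the angle between the word-count vectors of two passages."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(count * cb[word] for word, count in ca.items())  # missing words count as 0
    norm = math.sqrt(sum(c * c for c in ca.values())) * math.sqrt(sum(c * c for c in cb.values()))
    return dot / norm if norm else 0.0


print(cosine_similarity("The cat sat.", "The dog sat."))
```

Scores range from 0 (no shared words) to 1 (identical word distributions); word order is ignored.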
https://github.com/dmarks84/coursework_project_sentiment-analysis
Project for University of Michigan Python Programming Specialization -- Read in tweets and analyzed their content to perform basic sentiment analysis
classification programming python sentiment-analysis statistics web-scraping
Last synced: 09 Apr 2025
https://github.com/dmarks84/coursework_project_nlp-with-nltk
Project for University of Michigan Applied Data Science Specialization -- Utilized the NLTK library to process natural language, and then built several spelling recommenders for a list of misspelled words.
data-modeling databases dataframes eda nlp numpy pandas python reporting statistics text-mining visualization
Last synced: 13 Apr 2026
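A spelling recommender can rank candidate words by their edit distance to the misspelling. The sketch assumes Levenshtein distance and, as a hypothetical filter, prefers candidates that share the misspelling's first letter; the vocabulary is made up. NLTK also provides `nltk.edit_distance` for the distance itself.

```python
def edit_distance(s, t):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def recommend(word, vocabulary):
    """Return the closest vocabulary word; ties go to the alphabetically first."""
    # Hypothetical filter: prefer candidates with the same first letter.
    candidates = [w for w in vocabulary if w[:1] == word[:1]] or list(vocabulary)
    return min(sorted(candidates), key=lambda w: edit_distance(word, w))


vocab = ["corpulent", "cormorant", "current", "incendiary"]
print(recommend("cormulent", vocab))
```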
https://github.com/dmarks84/coursework_project_network-analysis-node-link-prediction
Project for University of Michigan Applied Data Science Specialization -- Analyzed network nodes and edges, developing custom features based on various scoring metrics; used the features to train classifier models to predict a node attribute (employee salary type) and future edges (employee connections)
classification cross-validation data-reporting databases eda grid-search matplotlib network-analysis numpy pandas python scikit-learn statistics supervised-ml visualization
Last synced: 13 Apr 2026
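Hand-built features for edge prediction often score a node pair by its shared neighborhood. The sketch below computes four common ones on a toy undirected graph; the graph and the helper names are hypothetical, and the repository's exact feature set may differ. `networkx` offers equivalents such as `nx.jaccard_coefficient`, `nx.adamic_adar_index`, and `nx.preferential_attachment`.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def link_features(adj, u, v):
    """Neighborhood-based scores for the (possibly missing) edge u-v, with u != v."""
    nu, nv = adj[u], adj[v]
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Each common neighbor w links to both u and v, so deg(w) >= 2 and log > 0.
        "adamic_adar": sum(1 / math.log(len(adj[w])) for w in common),
        "preferential_attachment": len(nu) * len(nv),
    }


# Hypothetical toy graph of employee connections.
adj = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")])
print(link_features(adj, "A", "D"))
```

Rows of such scores, one per candidate pair, can then be fed to any classifier to predict whether the edge will appear.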
https://github.com/dmarks84/professional_certifications
A full set of the certificates I earned through the work I completed as part of various Professional Certifications, Specializations, Courses, and Projects.
Last synced: 21 Jan 2026
https://github.com/dmarks84/coursework_project_image-text-recognition
Project for University of Michigan Python Programming Specialization -- Read in documents with images and text, and utilized CV libraries/packages to extract specific types of images and text, pairing them together
classification computer-vision image-classification numpy pandas programming python text-classification
Last synced: 14 Apr 2026
https://github.com/dmarks84/coursework_capstone_full_data_engineering
Final Project for IBM Data Engineering & Python Professional Certificate -- Applied all skills and methods utilized in the series of courses for this certification
apache-airflow apache-hadoop apache-kafka apache-spark api beautifulsoup cassandra dags etl mongodb nosql pandas plotly postgresql python scipy seaborn sql
Last synced: 25 Feb 2026
https://github.com/dmarks84/ind_project_readme-generator
Independent (personal) project in which I automatically generate README files for each of my repositories from my coursework
dataframes etl numpy pandas programming python
Last synced: 29 Apr 2026
https://github.com/dmarks84/ind_project_movie-database-sqlite
Independent Project - I joined and manipulated data from disparate tables of movie information using Python & SQLite; defined the schema, created tables/views, queried data, etc. Utilized CTEs, window functions, and other DDL, DQL, DML, and DCL scripts.
advanced-sql cte databases dcl ddl dml dql group-by joins python query sql sqlite tables views window-functions
Last synced: 02 May 2026
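A small sketch of the CTE plus window-function pattern using Python's built-in `sqlite3` module. The `movies` and `ratings` tables and their rows are hypothetical; window functions such as `RANK() OVER (...)` require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL and DML: hypothetical schema and rows.
conn.executescript("""
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, genre TEXT, year INTEGER);
CREATE TABLE ratings (movie_id INTEGER REFERENCES movies(id), score REAL);
INSERT INTO movies VALUES
    (1, 'Alpha', 'Drama', 1999), (2, 'Beta', 'Drama', 2004),
    (3, 'Gamma', 'Comedy', 2010), (4, 'Delta', 'Comedy', 2012);
INSERT INTO ratings VALUES (1, 8.0), (1, 9.0), (2, 7.0), (3, 6.0), (3, 8.0), (4, 9.0);
""")

# DQL: the CTE averages ratings per movie (join + GROUP BY),
# then RANK() orders movies within each genre by that average.
query = """
WITH avg_ratings AS (
    SELECT m.title, m.genre, AVG(r.score) AS avg_score
    FROM movies AS m
    JOIN ratings AS r ON r.movie_id = m.id
    GROUP BY m.id, m.title, m.genre
)
SELECT title, genre, avg_score,
       RANK() OVER (PARTITION BY genre ORDER BY avg_score DESC) AS genre_rank
FROM avg_ratings
ORDER BY genre, genre_rank
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```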
https://github.com/dmarks84/ind_project_data-science-london-scikit-learn--kaggle
Independent Project - Kaggle Competition -- I worked on the Data Science London data set for the Data Science London + Scikit-learn competition.
classification cross-validation data-modeling data-reporting data-visualization dataframes eda grid-search matplotlib numpy pandas python sklearn statistics supervised-ml
Last synced: 06 Apr 2026
https://github.com/dmarks84/ind_project_docker-image-pnw-weather-app
Independent Project - I created a Docker image that stands up a website displaying live weather alerts on an interactive map.
api dash devops docker docker-images dockerfile folium geopandas json plotly python requests webapp websites
Last synced: 05 May 2026
https://github.com/dmarks84/ibm_ds
A temporary repository for the work I'm doing in the IBM Data Science course
Last synced: 09 Apr 2025
https://github.com/dmarks84/ibm-ds-capstone
Files for my capstone project for the IBM Data Science Professional Certificate
Last synced: 09 Apr 2025
https://github.com/dmarks84/coursework_project_ml-model-eval-refine
Project for IBM Data Science course on ML Models & Analysis -- Read in a large dataset of home sales and used polynomial linear regression to predict future home sale prices
classification communication data-modeling dataframes machine-learning matplotlib numpy pandas programming python regression scikit-learn scipy seaborn supervised-ml visualization
Last synced: 09 Apr 2026