https://github.com/saltiola7/data-analysis-portfolio

Data engineering & analysis portfolio, which showcases my use of Python & SQL

airflow airtable-block anaconda automation back4app chatgpt csv-parser data-analysis data-engineering docker-compose gcp graphql-api jupyter-notebook nosql prefect python rest-api sql streamlit web-scraping


# Data Engineering & Analysis Portfolio

Welcome to my portfolio, which showcases a Python & JavaScript web scraping ETL pipeline, Jupyter Notebooks analyzing several different datasets, and data visualizations built with Tableau.

## Table of Contents
- Data Engineer certification from DataCamp
- Recommendation Letter from Data Analyst Mentor
- Testimonial from Web Scraping Client
- Web Scraping ETL Pipeline
- Jupyter Notebooks
- Utility Scripts With Python for Google Sheets, Airtable, Shopify, ChatGPT
- Tableau Visualizations

## Data Engineer certification from [DataCamp](https://www.datacamp.com/)

## Recommendation Letter from [Data Analyst Lecturer](https://www.researchgate.net/profile/Elnaz-Gholipour)

## Testimonial from Web Scraping Client
**Vesa Karjalainen, Polq Oy:** I had the opportunity to work with Tommi on developing a critical scraping tool and server for our company. His technical expertise, innovative approach, and dedication to understanding our specific needs resulted in a seamless and efficient solution. The tool has significantly improved our data collection processes, demonstrating Tommi's ability to deliver high-quality work under tight deadlines. His professionalism and willingness to go the extra mile made a remarkable difference. I highly recommend Tommi to anyone looking for exceptional technical solutions in data management and infrastructure.

## [Web Scraping ETL](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main/Web-Scraper-ETL)
Scrapes job board data from multiple websites into a custom job board application.

It was first built with the community Docker Compose setup, but was moved to Prefect.io before launch because that was a more streamlined solution for the client.
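
The extract, transform, and load stages of such a pipeline can be sketched with the standard library alone. This is a minimal illustration and not the client code: the two sources, their field names, the `jobs` table, and every function name are assumptions made up for the example, and the scraping step is replaced by hard-coded records. In an orchestrator such as Prefect, each stage would typically become its own task.

```python
import sqlite3

# Hypothetical raw records from two made-up job boards with different field names.
SOURCE_A = [
    {"title": " Data Engineer ", "company": "Acme", "url": "https://a.example/jobs/1"},
    {"title": "Analyst", "company": "Beta", "url": "https://a.example/jobs/2"},
]
SOURCE_B = [
    {"job_title": "data engineer", "employer": "Acme", "link": "https://a.example/jobs/1"},
    {"job_title": "SQL Developer", "employer": "Gamma", "link": "https://b.example/jobs/9"},
]


def extract():
    """Stand-in for the scraping step: return raw records tagged by source."""
    return [("a", r) for r in SOURCE_A] + [("b", r) for r in SOURCE_B]


def transform(raw):
    """Map each source's fields onto one schema and drop duplicate URLs."""
    field_maps = {
        "a": {"title": "title", "company": "company", "url": "url"},
        "b": {"title": "job_title", "company": "employer", "url": "link"},
    }
    seen, jobs = set(), []
    for source, record in raw:
        mapping = field_maps[source]
        job = {target: record[src].strip() for target, src in mapping.items()}
        if job["url"] not in seen:  # keep the first posting seen for each URL
            seen.add(job["url"])
            jobs.append(job)
    return jobs


def load(jobs, conn):
    """Write the normalized jobs into a SQLite table keyed by URL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs (title TEXT, company TEXT, url TEXT PRIMARY KEY)"
    )
    conn.executemany("INSERT OR REPLACE INTO jobs VALUES (:title, :company, :url)", jobs)
    conn.commit()


def run_pipeline(conn):
    """Run all three stages and return the number of stored jobs."""
    load(transform(extract()), conn)
    return conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]


if __name__ == "__main__":
    print(run_pipeline(sqlite3.connect(":memory:")))
```

Keying the table on the URL means a re-run updates existing postings instead of duplicating them, which is one simple way to keep repeated scrapes idempotent.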

## [Jupyter Notebooks](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main//Notebooks)

I use various Python data science packages, e.g. NumPy, Matplotlib, pandas, seaborn, and SciPy.

- Data cleaning & fixing structural errors
- Checking for outliers
- Descriptive statistics
- Correlations
- Normality tests
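
Three of the steps above (descriptive statistics, outlier checks, and correlations) can be sketched as follows. To keep the example dependency-free it uses Python's `statistics` module in place of pandas and SciPy, so it is a stand-in for the notebook code and not a copy of it. The sample values and the function names are made up for illustration, and normality tests are left out because the standard library has none.

```python
import statistics

# Made-up sample values, purely for illustration.
ages = [23, 25, 27, 29, 31, 33, 35, 37, 39, 95]
risk = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    print(describe(ages))
    print(iqr_outliers(ages))
    print(pearson(ages, risk))
```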

#### I answer questions like
- Why does a higher % of gender 1 have malignant tumours?
- What other features may be linked to malignant tumours?
- What is Walmart's most sold product?
- What are the most documented use cases for cannabis, and where?

### [Cancer Patients Dataset](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main/Notebooks/cancer-patient-dataset.ipynb)
#### Why does a higher % of gender 1 have malignant tumours?
- Gender & Cancer Level Crosstab
- Gender & Alcohol use
- Gender & Air pollution
- Gender & Genetic Risk
#### What other features may be linked to malignant tumours?
- Cancer Level & Obesity
- Age bins & Cancer Level
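
A crosstab such as Gender & Cancer Level is a count of every pair of values, usually read as a share of each row. The sketch below shows the idea with `collections.Counter`; in a notebook, `pandas.crosstab` would be the usual tool. The rows, the column names `gender` and `level`, and the helper names are assumptions for the example and are not taken from the dataset.

```python
from collections import Counter

# Made-up rows; the real dataset's column names and codes may differ.
patients = [
    {"gender": 1, "level": "High"},
    {"gender": 1, "level": "High"},
    {"gender": 1, "level": "Low"},
    {"gender": 2, "level": "High"},
    {"gender": 2, "level": "Low"},
    {"gender": 2, "level": "Low"},
]


def crosstab(rows, row_key, col_key):
    """Count occurrences of each (row value, column value) pair."""
    return Counter((r[row_key], r[col_key]) for r in rows)


def row_share(table, row_value, col_value):
    """Share of one row's total that falls into the given column."""
    row_total = sum(n for (r, _), n in table.items() if r == row_value)
    return table[(row_value, col_value)] / row_total


table = crosstab(patients, "gender", "level")
print(row_share(table, 1, "High"))  # share of gender 1 rows with level "High"
print(row_share(table, 2, "High"))  # share of gender 2 rows with level "High"
```

Comparing row shares instead of raw counts matters when the groups differ in size, since a larger group has more cases at every level.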

### [Airline](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main/Notebooks/airline.ipynb)

### [McDonalds Dataset](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main/Notebooks/mcdonalds.ipynb)

## [Utility Scripts](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main/Scripts)
#### Python for Spreadsheets and Databases
### [FDA Compliancy Script](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main/Scripts/FDA-Compliancy-Scraper-ChatGPT)
- Scrapes all pages of a website into a CSV file that can be imported into ChatGPT for analysis. The latest guidelines are supplied together with the CSV, and ChatGPT is prompted to point out any content that goes against them. This saves time when creating compliant CBD content.
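
The pages-to-CSV step might look roughly like this. It is a sketch under stated assumptions and not the script in the repository: the pages are assumed to be fetched already and passed in as a dict of URL to HTML, the `url` and `text` column names are invented, and the standard library's `html.parser` stands in for whatever scraping library the real script uses.

```python
import csv
import io
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def page_text(html):
    """Return the visible text of one HTML document as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def pages_to_csv(pages, out):
    """Write one CSV row per page: its URL and its visible text."""
    writer = csv.writer(out)
    writer.writerow(["url", "text"])
    for url, html in pages.items():
        writer.writerow([url, page_text(html)])


# Hypothetical pre-fetched page, keyed by URL.
pages = {
    "https://shop.example/cbd-oil": (
        "<html><head><style>p{}</style></head>"
        "<body><h1>CBD Oil</h1><p>Supports calm.</p></body></html>"
    ),
}
buffer = io.StringIO()
pages_to_csv(pages, buffer)
print(buffer.getvalue())
```

Keeping the URL next to the text lets a reviewer trace any flagged sentence back to the page it came from.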

### [Shopify API](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main/Scripts/Shopify-API)
- Queries the most popular products so that the popular products section of a headless ecommerce store can display them with live data
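
Once order data has been retrieved, ranking products by units sold is a small aggregation. The sketch below assumes orders shaped loosely like order line items, with `line_items`, `title`, and `quantity` fields; those names, the sample orders, and the `most_popular` helper are assumptions for illustration, and the API call itself is left out.

```python
from collections import Counter

# Made-up orders; the field names are assumptions, not a confirmed API schema.
orders = [
    {"line_items": [{"title": "CBD Oil 10%", "quantity": 2},
                    {"title": "Hemp Tea", "quantity": 1}]},
    {"line_items": [{"title": "Hemp Tea", "quantity": 3}]},
    {"line_items": [{"title": "CBD Oil 10%", "quantity": 1},
                    {"title": "Hemp Balm", "quantity": 1}]},
]


def most_popular(orders, top_n=3):
    """Return the top_n (title, units sold) pairs, best seller first."""
    sold = Counter()
    for order in orders:
        for item in order["line_items"]:
            sold[item["title"]] += item["quantity"]
    return sold.most_common(top_n)


print(most_popular(orders, top_n=2))
```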

### [Airtable Scripts and Extensions](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main/Scripts/Airtable)
- Splitting data in one column into multiple columns with
- Built my own Markdown-to-HTML extension so that we can write Markdown in Airtable and sync it as HTML to Webflow CMS
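
Splitting one column into several can be sketched in a few lines. The example is written in Python to match the other scripts here, whereas an Airtable scripting extension would express the same logic in JavaScript. The `location`, `city`, and `country` fields and the `split_column` helper are hypothetical names chosen for the example.

```python
def split_column(rows, source, targets, sep=","):
    """Split rows[source] on sep into the target fields.

    Missing parts are filled with empty strings, and any extra separators
    stay inside the last target field.
    """
    result = []
    for row in rows:
        parts = [p.strip() for p in row.get(source, "").split(sep, len(targets) - 1)]
        parts += [""] * (len(targets) - len(parts))
        new_row = {k: v for k, v in row.items() if k != source}
        new_row.update(zip(targets, parts))
        result.append(new_row)
    return result


# Made-up records for illustration.
records = [
    {"id": 1, "location": "Helsinki, Finland"},
    {"id": 2, "location": "Remote"},
]
print(split_column(records, "location", ["city", "country"]))
```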

### [Google Sheets for Lead Generation](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main/Scripts/GSheets)
- Script for checking the page speed of each URL in a column, which is useful for lead generation. Also includes other, smaller data cleaning scripts
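
A page speed check of this kind could be built on Google's PageSpeed Insights API (v5). The sketch below only builds the request URL and reads the performance score out of a response; it makes no network call, and whether the original script uses this API is an assumption. The helper names and the sample response are made up for the example.

```python
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def psi_request_url(page_url, strategy="mobile"):
    """Build the request URL for one page (strategy: 'mobile' or 'desktop')."""
    return f"{PSI_ENDPOINT}?{urlencode({'url': page_url, 'strategy': strategy})}"


def performance_score(response):
    """Return the performance score on a 0-100 scale, or None if it is missing."""
    try:
        score = response["lighthouseResult"]["categories"]["performance"]["score"]
    except (KeyError, TypeError):
        return None
    return None if score is None else round(score * 100)


# Made-up response fragment containing only the field this sketch reads.
sample = {"lighthouseResult": {"categories": {"performance": {"score": 0.87}}}}
print(psi_request_url("https://example.com/"))
print(performance_score(sample))
```

Returning `None` for a missing score keeps one failed lookup from stopping a run over a whole column of URLs.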

## Tableau
- [Scientifically Documented Use Cases for *Cannabis Sativa L.*](https://public.tableau.com/views/UseofdifferentpartsofCannabisfordifferentmedicalusesindifferentcountries/Sheet8?:language=en-US&:display_count=n&:origin=viz_share_link)
- [Walmart Sales Analysis](https://public.tableau.com/views/WallmartSalesAnalysis_16593931691930/Story1?:language=en-US&:display_count=n&:origin=viz_share_link)