{"id":15020059,"url":"https://github.com/saltiola7/data-analysis-portfolio","last_synced_at":"2026-01-21T15:33:15.276Z","repository":{"id":180339741,"uuid":"570743659","full_name":"Saltiola7/Data-Analysis-Portfolio","owner":"Saltiola7","description":"Data engineering \u0026 analysis portfolio, which showcases my use of Python \u0026 SQL","archived":false,"fork":false,"pushed_at":"2023-12-08T00:15:09.000Z","size":31905,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-04T22:04:35.387Z","etag":null,"topics":["airflow","airtable-block","anaconda","automation","back4app","chatgpt","csv-parser","data-analysis","data-engineering","docker-compose","gcp","graphql-api","jupyter-notebook","nosql","prefect","python","rest-api","sql","streamlit","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Saltiola7.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-26T01:38:11.000Z","updated_at":"2025-01-15T04:09:32.000Z","dependencies_parsed_at":"2024-09-22T15:03:55.117Z","dependency_job_id":"f2082990-448d-472d-909c-77b246140784","html_url":"https://github.com/Saltiola7/Data-Analysis-Portfolio","commit_stats":{"total_commits":59,"total_committers":3,"mean_commits":"19.666666666666668","dds":"0.44067796610169496","last_synced_commit":"a313e3b8a40bbb321efd6f7c9258a6f5a870a162"},"previous_names":["saltiola7/data-analysis-portfolio"],"tags_count":0,"template":false,"template_full_name":null,"repos
itory_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Saltiola7%2FData-Analysis-Portfolio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Saltiola7%2FData-Analysis-Portfolio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Saltiola7%2FData-Analysis-Portfolio/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Saltiola7%2FData-Analysis-Portfolio/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Saltiola7","download_url":"https://codeload.github.com/Saltiola7/Data-Analysis-Portfolio/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247770189,"owners_count":20993142,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","airtable-block","anaconda","automation","back4app","chatgpt","csv-parser","data-analysis","data-engineering","docker-compose","gcp","graphql-api","jupyter-notebook","nosql","prefect","python","rest-api","sql","streamlit","web-scraping"],"created_at":"2024-09-24T19:54:32.001Z","updated_at":"2026-01-21T15:33:15.269Z","avatar_url":"https://github.com/Saltiola7.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Engineering \u0026 Analysis Portfolio\n\nWelcome to my portfolio, which showcases a python \u0026 JavaScript web scraping ETL pipeline, Jupyter Notebooks analyzing many different datasets as well as data visualizations using Tableau.\n\n## Table of Contents\n- Data Engineer certification from Datacamp\n- 
Recommendation Letter from Data Analyst Mentor\n- Testimonial from Web Scraping Client\n- Web Scraping ETL Pipeline\n- Jupyter Notebooks\n- Utility Scripts With Python for Google Sheets, Airtable, Shopify, ChatGPT\n- Tableau Visualizations\n\n## Data Engineer certification from [Datacamp](https://www.datacamp.com/)\n\u003cimg src=\"DEA0017031997389.jpg\" width=\"555\" /\u003e\n\n## Recommendation Letter from [Data Analyst Lecturer](https://www.researchgate.net/profile/Elnaz-Gholipour)\n\u003cimg src=\"Recommendation-Letter.jpg\" width=\"555\" /\u003e\n\n## Testimonial from Web Scraping Client\n**Vesa Karjalainen, Polq Oy:** I had the opportunity to work with Tommi on developing a critical scraping tool and server for our company. His technical expertise, innovative approach, and dedication to understanding our specific needs resulted in a seamless and efficient solution. The tool has significantly improved our data collection processes, demonstrating Tommi's ability to deliver high-quality work under tight deadlines. His professionalism and willingness to go the extra mile made a remarkable difference. 
I highly recommend Tommi to anyone looking for exceptional technical solutions in data management and infrastructure.\n\n## [Web Scraping ETL](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main/Web-Scraper-ETL)\nScrapes job board data from multiple websites into a custom job board application.\n\nIt was first built with the community Docker Compose setup, but was moved to Prefect.io before launch as it was a more streamlined solution for the client.\n\n## [Jupyter Notebooks](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main//Notebooks)\n\nI use various Python data science packages, e.g. NumPy, matplotlib, pandas, seaborn, SciPy.\n\n- Data cleaning \u0026 fixing structural errors\n- Check for outliers\n- Descriptive statistics\n- Correlations\n- Normality tests\n\n#### I answer questions like\n- Why does a higher % of gender 1 have malignant tumours?\n- What other features may be linked to malignant tumours?\n- What is Walmart's most sold product?\n- What are the most documented use cases for cannabis, and where?\n\n### [Cancer Patients Dataset](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main/Notebooks/cancer-patient-dataset.ipynb)\n#### Why does a higher % of gender 1 have malignant tumours?\nGender \u0026 Cancer Level Crosstab\nGender \u0026 Alcohol use\nGender \u0026 Air pollution\nGender \u0026 Genetic Risk\n#### What other features may be linked to malignant tumours?\nCancer Level \u0026 Obesity\nAge bins \u0026 Cancer Level\n\n### [Airline](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main/Notebooks/airline.ipynb)\n\n### [McDonalds Dataset](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main/Notebooks/mcdonalds.ipynb)\n\n## [Utility Scripts](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main/Scripts)\n#### Python for Spreadsheets and Databases\n### [FDA Compliance Script](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main/Scripts/FDA-Compliancy-Scraper-ChatGPT)\n- Scrapes all pages of 
a website into a CSV, which can be imported to ChatGPT for analysis. We also give the latest guidelines together with the CSV and prompt ChatGPT to point out any content that is against the guidelines. Saves time when creating compliant CBD content.\n\n### [Shopify API](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main/Scripts/Shopify-API)\n- Querying the most popular products so that the popular products section of a headless ecommerce store can display them with live data\n\n### [Airtable Scripts and Extensions](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main/Scripts/Airtable)\n- Splitting data in one column into multiple columns with\n- Built my own Markdown-to-HTML extension so that we can write Markdown in Airtable and sync it as HTML to Webflow CMS\n\n### [Google Sheets for Lead Generation](https://github.com/Saltiola7/Data-Analysis-Portfolio/blob/main/Scripts/GSheets)\n- Script for checking the page speeds of URLs in a column. Useful for lead generation. Also other smaller data cleaning scripts\n\n## Tableau\n- [Scientifically Documented Use Cases for *Cannabis Sativa L.*](https://public.tableau.com/views/UseofdifferentpartsofCannabisfordifferentmedicalusesindifferentcountries/Sheet8?:language=en-US\u0026:display_count=n\u0026:origin=viz_share_link)\n- [Wallmart Sales Analysis](https://public.tableau.com/views/WallmartSalesAnalysis_16593931691930/Story1?:language=en-US\u0026:display_count=n\u0026:origin=viz_share_link)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaltiola7%2Fdata-analysis-portfolio","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaltiola7%2Fdata-analysis-portfolio","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaltiola7%2Fdata-analysis-portfolio/lists"}