Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nchinling/google_advanced_data_analytics
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/nchinling/google_advanced_data_analytics
- Owner: nchinling
- Created: 2024-01-09T07:23:14.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-03-04T09:35:18.000Z (11 months ago)
- Last Synced: 2024-11-11T18:09:54.446Z (3 months ago)
- Language: Jupyter Notebook
- Size: 13.9 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Google Advanced Data Analytics Portfolio Project
## Overview
The Google Advanced Data Analytics Portfolio project is based on the certification of the same name. The portfolio showcases artifacts on project planning, machine learning, predictive modeling, and experimental design for collecting and analysing large amounts of data.
The project is based on a business requirement of TikTok. TikTok users can report videos and comments that contain user claims. These reports identify content that needs to be reviewed by moderators. This process generates a large number of user reports that are difficult to address quickly. The project involves creating a machine learning model to determine whether a video contains a claim or offers an opinion. With a successful prediction model, TikTok can reduce the backlog of user reports and prioritise them more efficiently.
## Skills learnt
- Regression analysis
- Python
- Translating data
- Statistics
- Machine learning

## Modules
### 1. Foundations of Data Science (Project Proposal)
#### Description
This module focuses on the PACE (Plan, Analyze, Construct, Execute) framework, which outlines the stages of the project. The module project involves creating the PACE document and the project proposal.
### 2. Preparing and organising data
#### Description
This module focuses on the basics of Python as a programming language. The module project involves building a dataframe for the claims classification data (.ipynb file). Key variables and observations are identified for further exploration at the exploratory data analysis stage.
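As a rough illustration of this step, the sketch below builds a small dataframe and inspects its shape, missing values, and label balance. It assumes pandas is installed. The column names (`claim_status`, `video_duration_sec`, `view_count`) and the rows are hypothetical and are not taken from the project dataset.

```python
import pandas as pd

# Toy rows with hypothetical column names; NOT values from the project data.
rows = [
    {"claim_status": "claim", "video_duration_sec": 32, "view_count": 1200},
    {"claim_status": "opinion", "video_duration_sec": 15, "view_count": 300},
    {"claim_status": "claim", "video_duration_sec": 48, "view_count": None},
    {"claim_status": "opinion", "video_duration_sec": 21, "view_count": 450},
]

df = pd.DataFrame(rows)

# Number of observations (rows) and variables (columns).
n_rows, n_cols = df.shape

# Missing values per column, to flag variables that may need cleaning.
missing_per_column = df.isna().sum()

# Balance of the target label.
label_counts = df["claim_status"].value_counts()

print(n_rows, n_cols)
print(missing_per_column)
print(label_counts)
```

In a real notebook the dataframe would more likely be loaded from a file with `pd.read_csv`; the inline rows only keep the sketch self-contained.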
### 3. Exploratory Data Analysis (EDA)
The module project involves performing an EDA on the dataset, and analysing and creating visualisations using Python and Tableau. Based on the EDA, factors to consider when building the classification model are proposed.
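One common EDA step is comparing a numeric variable across the two classes. The sketch below is a minimal version of that idea using only the Python standard library. The labels, the `view_count`-style values, and the helper `summarise_by_group` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (label, view_count) pairs; illustrative values only.
records = [
    ("claim", 1200), ("claim", 900), ("claim", 1500),
    ("opinion", 300), ("opinion", 450), ("opinion", 150),
]


def summarise_by_group(pairs):
    """Return {label: {"n", "mean", "median"}} for values grouped by label."""
    groups = defaultdict(list)
    for label, value in pairs:
        groups[label].append(value)
    return {
        label: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
        for label, vals in groups.items()
    }


summary = summarise_by_group(records)
for label, stats in summary.items():
    print(label, stats)
```

A large gap between group summaries like these would suggest the variable is worth considering as a model feature, though a gap in toy data says nothing about the real dataset.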
### 4. Statistical methods (Hypothesis testing)
The module project involves the use of statistical methods, specifically hypothesis testing. The project also explores which hypothesis test is suitable for the data. The chosen test is then applied and the results are reported.
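The README does not say which test was chosen, so the sketch below should be read as one possible option rather than the project's method. It runs an exact two-sample permutation test for a difference in means using only the standard library. The function name `permutation_test` and the sample values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_test(a, b):
    """Exact two-sided permutation test for a difference in means.

    Returns the observed difference (mean(a) - mean(b)) and the p-value:
    the fraction of all regroupings of the pooled data whose absolute
    mean difference is at least as large as the observed one.
    """
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = mean(a) - mean(b)

    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = sum_a / n_a - (total - sum_a) / n_b
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
        count += 1
    return observed, extreme / count


# Hypothetical samples for two groups; illustrative values only.
group_a = [10, 12, 11, 13]
group_b = [20, 22, 21, 23]

observed_diff, p_value = permutation_test(group_a, group_b)
print(observed_diff, round(p_value, 4))
```

Enumerating every regrouping is only practical for small samples. For larger data a random-sampling permutation test or a t-test from a statistics library would be the more usual choice.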
### 5. Regression model
The module project involves the creation of a regression model. The model is then evaluated and the results are interpreted for stakeholders.
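The README does not specify the type of regression, so the following sketch assumes ordinary least squares with a single predictor, purely as an illustration of fitting and evaluating a model. The function names and data points are hypothetical.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares; return (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical predictor and response values; illustrative only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_simple_ols(x, y)
r2 = r_squared(x, y, slope, intercept)
print(round(slope, 3), round(intercept, 3), round(r2, 3))
```

For stakeholders, the slope reads as the expected change in the response per unit change in the predictor, and R² as the share of variation the model accounts for.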
### 6. Machine Learning model
The module project involves the creation of the final machine learning model for the claims classification data. The model is evaluated and the results are interpreted.
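Evaluating a claim-versus-opinion classifier usually comes down to comparing predicted labels with true labels. The sketch below computes accuracy, precision, recall, and F1 with the standard library, treating `"claim"` as the positive class. The labels, predictions, and the function `classification_metrics` are hypothetical and are not the project's results.

```python
def classification_metrics(y_true, y_pred, positive="claim"):
    """Return accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical true labels and model predictions; illustrative only.
y_true = ["claim", "claim", "claim", "opinion",
          "opinion", "opinion", "claim", "opinion"]
y_pred = ["claim", "claim", "opinion", "opinion",
          "claim", "opinion", "claim", "opinion"]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Which metric to prioritise depends on the relative cost of each error type, for example a claim missed by the model versus an opinion sent to moderators unnecessarily.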