https://github.com/edochiari/tiktok-project

This project builds a predictive model to help TikTok classify user-reported content claims, improving moderation efficiency by identifying and prioritizing content that may need review. Insights from this model enable TikTok to manage reports more effectively, ensuring a safer and more engaging platform.

Host: GitHub
URL: https://github.com/edochiari/tiktok-project
Owner: EdoChiari
Created: 2024-11-07T20:46:34.000Z (3 months ago)
Default Branch: main
Last Pushed: 2024-11-08T10:30:31.000Z (3 months ago)
Last Synced: 2024-12-09T13:40:21.494Z (about 2 months ago)
Topics: content-claims, dataanalysis, datacleaning, hypothesis-testing, jupyter-notebook, regression, tiktok
Language: Jupyter Notebook
Homepage:
Size: 1.49 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# TikTok Claims Classification Project

## Overview
This project develops a predictive model to support TikTok’s moderation team by classifying user-reported content efficiently. By analyzing user reports on videos and comments, it aims to build a model that distinguishes content that makes a claim from content that expresses an opinion. This approach will help TikTok reduce its backlog of reports, prioritize moderation efforts, and maintain a safe and engaging community.

## Project Goals
1. **Classify Content Claims**: Build and evaluate a model to predict whether a video contains a claim or an opinion, enabling TikTok to streamline content moderation.
2. **Enhance Moderation Efficiency**: Provide a scalable solution for handling high volumes of user reports, improving the speed and accuracy of content review processes.
3. **Deliver Insights for Stakeholders**: Generate actionable insights from user reports to aid TikTok leadership in understanding content trends and moderation needs.

## Deliverables
The final project deliverables include:

- **Model Evaluation**: Comprehensive assessment of the classification model, including accuracy, precision, and recall, to gauge its effectiveness in content claim prediction.
- **Data Visualizations**: Interactive Tableau dashboards summarizing user report trends, claim types, and other key insights, accessible to non-technical stakeholders.
- **Feature Analysis**: Examination of the features that contribute most to accurate claim classification, with discussion of whether those relationships might be causal.
- **Future Model Improvements**: Recommendations for additional features and data sources that may enhance the accuracy and relevance of the model.
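
To make the evaluation metrics above concrete, here is a minimal sketch of how accuracy, precision, and recall can be computed for a binary claim/opinion classifier. The function name `classification_metrics` and the labels are made up for illustration; they are not taken from the project's notebooks, which may well use Scikit-learn's metric functions instead.

```python
def classification_metrics(y_true, y_pred, positive="claim"):
    """Return accuracy, precision, and recall, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical labels, for illustration only (not project data).
y_true = ["claim", "claim", "opinion", "opinion", "claim", "opinion", "claim", "opinion"]
y_pred = ["claim", "opinion", "opinion", "claim", "claim", "opinion", "claim", "opinion"]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a moderation setting, recall on the claim class is often the number to watch, since a missed claim is content that never reaches a reviewer; precision indicates how much reviewer time is spent on false alarms.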

## Tools and Libraries Used
- **Data Analysis and Visualization**: Pandas, NumPy, Matplotlib, Seaborn, Tableau
- **Machine Learning**: Scikit-learn (for regression and classification models)
- **Notebook Environment**: Jupyter Notebook
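
As a rough illustration of how these libraries fit together, the sketch below trains a Scikit-learn classifier on synthetic data and scores it. Everything here is an assumption for demonstration: the choice of logistic regression, the feature names `feature_a` and `feature_b`, and the generated data are not taken from the project's notebooks or dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1 = claim, 0 = opinion (hypothetical encoding).
rng = np.random.default_rng(seed=0)
n_samples = 400
y = rng.integers(0, 2, size=n_samples)
feature_a = rng.normal(loc=y * 2.0, scale=1.0)  # shifted upward for claims
feature_b = rng.normal(size=n_samples)          # pure noise
X = np.column_stack([feature_a, feature_b])

# Hold out 25% of rows, keeping the class balance similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Logistic regression is assumed here as one plausible regression-based classifier.
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

scores = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
}
print(scores)
```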

## Project Structure
The project is organized as follows:

1. **Data Preparation**: Building and organizing a comprehensive dataset from user reports for claims classification, ensuring data quality and suitability for analysis.
2. **Exploratory Data Analysis (EDA)**: Analyzing user reports to identify claim patterns, trends, and factors that may impact claim classification.
3. **Hypothesis Testing**: Conducting hypothesis tests to determine the significance of various factors within user reports, informing model feature selection.
4. **Model Building and Evaluation**: Developing and testing a regression-based model to classify content as claim or opinion, then evaluating it with accuracy, precision, and recall.
5. **Executive Summary**: A presentation-ready summary for stakeholders, highlighting findings, model performance, and potential impact on moderation efforts.
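
The hypothesis-testing step above can be sketched with a simple permutation test, which checks whether a numeric factor differs between claim and opinion content more than chance would explain. This is only one possible approach, and the project may use a different test; the function name `permutation_test` and the sample values are hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled values and re-split them into two pseudo-groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical values of some numeric factor for each group (not project data).
values_claim = [12, 15, 14, 16, 13, 17, 15, 14]
values_opinion = [5, 7, 6, 8, 5, 9, 6, 7]

p_value = permutation_test(values_claim, values_opinion)
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests the factor differs between the two groups and may be worth keeping as a model feature; it does not by itself show that the factor causes content to be a claim.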

## Conclusion
This project offers TikTok a data-driven approach to content moderation by predicting whether reported content contains a claim or an opinion. With a model that helps prioritize user reports, TikTok can maintain a safe, enjoyable platform while managing moderation resources efficiently.
