https://github.com/ginga1402/youtube_analysis
Exploratory Data Analysis on YouTube data
college-project data-analysis pandas-python
- Host: GitHub
- URL: https://github.com/ginga1402/youtube_analysis
- Owner: Ginga1402
- Created: 2023-04-19T07:19:10.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2023-07-01T12:56:54.000Z (almost 3 years ago)
- Last Synced: 2025-02-05T05:28:53.395Z (over 1 year ago)
- Topics: college-project, data-analysis, pandas-python
- Language: Jupyter Notebook
- Homepage:
- Size: 1.01 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# YOUTUBE ANALYSIS
### Exploratory Data Analysis on YouTube data.
#### Domain: Social media

## Context and Content:
Around November 2021, YouTube announced its decision to hide the number of dislikes from users. However, the official YouTube Data API still returned dislike counts until December 13, 2021. An EDA exercise on this dataset can help draw out insights that are not obvious at first glance.
## Objective:
To analyze and explore the YouTube dislikes dataset using the NumPy and pandas libraries, and to derive meaningful insights through exploratory data analysis.
## Data Description:
Dataset link: https://www.kaggle.com/datasets/dmitrynikolaev/youtube-dislikes-dataset
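
Once the CSV from the link above is downloaded, a first look at the data might go roughly as follows. This is a minimal sketch, not the notebook's actual code: the helper names `load_dataset` and `missing_percentage` are hypothetical, and the small `sample` frame (including its `likes` column) is made up purely to stand in for the real file.

```python
import pandas as pd


def load_dataset(path: str = "youtube_dislike_dataset.csv") -> pd.DataFrame:
    """Read the downloaded CSV (file name taken from Q1; the path is an assumption)."""
    return pd.read_csv(path)


def missing_percentage(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, largest first."""
    return (df.isna().mean() * 100).sort_values(ascending=False)


# Hypothetical stand-in for the real dataset, so the sketch runs without the CSV.
sample = pd.DataFrame({
    "video_id": ["a1", "b2", "c3", "d4"],
    "channel_title": ["X", "Y", None, "X"],
    "likes": [10, None, None, 40],
})

pct = missing_percentage(sample)
print(pct)
```

On the real data, `load_dataset().head()`, `.tail()` and `.info()` would give the first and last records and the column types before deciding whether to drop or impute the missing values.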


## Questions:
Q1) Import required libraries and read the provided dataset (youtube_dislike_dataset.csv) and retrieve top 5 and bottom 5 records.
Q2) Check the info of the dataframe and write your inferences on data types and shape of the dataset.
Q3) Check the percentage of missing values in each column and drop or impute them.
Q4) Check the statistical summary of both numerical and categorical columns and write your inferences.
Q5) Convert datatype of column published_at from object to pandas datetime.
Q6) Create a new column as 'published_month' using the column published_at (display the months only)
Q7) Replace the numbers in the column published_month with the names of the months, i.e., 1 as 'Jan', 2 as 'Feb' and so on.
Q8) Find the number of videos published each month and arrange the months in a decreasing order based on the video count.
Q9) Find the count of unique video_id, channel_id and channel_title.
Q10) Find the top 10 channel names having the highest number of videos in the dataset and the bottom 10 having the lowest number of videos.
Q11) Find the title of the video which has the maximum number of likes and the title of the video having minimum likes and write your inferences.
Q12) Find the title of the video which has the maximum number of dislikes and the title of the video having minimum dislikes and write your inferences.
Q13) Does the number of views have any effect on how many people disliked the video? Support your answer with a metric and a plot.
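
Several of the questions above (Q5 to Q8 and Q11 to Q13) can be sketched with a few pandas calls. The following is one possible approach under stated assumptions, not the notebook's solution: `published_at` is named in the questions, while the column names `title`, `view_count`, `likes` and `dislikes`, the helper function names, and the tiny `sample` frame are all assumptions made for illustration.

```python
import pandas as pd

# Month number -> short name, spelled out to avoid locale-dependent helpers.
MONTH_NAMES = {
    1: "Jan", 2: "Feb", 3: "Mar", 4: "Apr", 5: "May", 6: "Jun",
    7: "Jul", 8: "Aug", 9: "Sep", 10: "Oct", 11: "Nov", 12: "Dec",
}


def add_published_month(df: pd.DataFrame) -> pd.DataFrame:
    """Q5-Q7: parse published_at and derive a month-name column."""
    out = df.copy()
    out["published_at"] = pd.to_datetime(out["published_at"])
    out["published_month"] = out["published_at"].dt.month.map(MONTH_NAMES)
    return out


def videos_per_month(df: pd.DataFrame) -> pd.Series:
    """Q8: video count per month, sorted in decreasing order."""
    return df["published_month"].value_counts()


def extreme_titles(df: pd.DataFrame, column: str) -> tuple:
    """Q11-Q12: titles of the rows holding the max and min of `column`."""
    return df.loc[df[column].idxmax(), "title"], df.loc[df[column].idxmin(), "title"]


def views_dislikes_corr(df: pd.DataFrame) -> float:
    """Q13: Pearson correlation between views and dislikes."""
    return df["view_count"].corr(df["dislikes"])


# Hypothetical rows standing in for the real dataset.
sample = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E", "F"],
    "published_at": [
        "2021-01-05 10:00:00", "2021-01-20 12:30:00", "2021-03-02 08:15:00",
        "2021-11-11 09:00:00", "2021-11-12 18:45:00", "2021-11-30 23:59:00",
    ],
    "view_count": [100, 200, 300, 400, 500, 600],
    "likes": [10, 50, 20, 90, 5, 60],
    "dislikes": [1, 3, 2, 5, 4, 6],
})

with_month = add_published_month(sample)
counts = videos_per_month(with_month)
most_liked, least_liked = extreme_titles(with_month, "likes")
corr = views_dislikes_corr(with_month)
```

For the plot requested in Q13, a scatter plot such as `with_month.plot.scatter(x="view_count", y="dislikes")` (which needs matplotlib installed) would be a natural companion to the correlation value.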