https://github.com/wlun001/youtube-video-analysis

YouTube video analysis based on datasets on Kaggle
big-data-analytics dataset kaggle scala spark

Host: GitHub
URL: https://github.com/wlun001/youtube-video-analysis
Owner: WLun001
Created: 2018-04-07T13:30:46.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2018-04-12T08:01:24.000Z (over 7 years ago)
Last Synced: 2025-02-09T13:41:35.692Z (8 months ago)
Topics: big-data-analytics, dataset, kaggle, scala, spark
Language: Scala
Homepage: https://www.kaggle.com/datasnaek/youtube-new
Size: 11.6 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# YouTube Video Analysis
YouTube video analysis based on datasets on Kaggle

# How to run
### If spark-shell has not been started
```spark-shell -i file.scala```
### If spark-shell is already running
```:load file.scala```
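
As an illustration of what a script passed to either command could contain, here is a minimal sketch. The file name `example.scala` and the aggregation are hypothetical and not taken from this repository. It assumes Spark 2.x or later, where spark-shell predefines the `spark` session, and that Parquet data already exists at the table location used further down in this README.

```scala
// example.scala (hypothetical file name, not part of this repository)
// Assumes Spark 2.x or later: spark-shell predefines the SparkSession `spark`.
import org.apache.spark.sql.functions.desc

// Assumed location: the same path as the Hive table described in this README.
val videos = spark.read.parquet("/user/cloudera/labs")

// Print the column names and types that were read back.
videos.printSchema()

// Count rows per category and show the ten largest categories.
videos.groupBy("category_id")
  .count()
  .orderBy(desc("count"))
  .show(10)
```

Run it with `spark-shell -i example.scala`, or with `:load example.scala` from a shell that is already open.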

# File explanation
- [setup.scala](setup.scala) - initial setup: reads the CSV and cleans the date
- [saveToParquet.scala](saveToParquet.scala) - saves the RDD to Parquet. Assumes the Parquet table has already been created in Hive with the following statement:

```
CREATE EXTERNAL TABLE videos(
  video_id STRING,
  trending_date STRING,
  title STRING,
  channel_title STRING,
  category_id STRING,
  publish_time STRING,
  tags STRING,
  views INT,
  likes INT,
  dislikes INT,
  comment_count INT,
  thumbnail_link STRING,
  comments_disabled BOOLEAN,
  ratings_disabled BOOLEAN,
  video_error_removed BOOLEAN,
  description STRING
)
STORED AS PARQUET
LOCATION '/user/cloudera/labs';
```
- [readFromParquet.scala](readFromParquet.scala) - reads from Parquet after it has been saved
- [trending.scala](trending.scala) - video trending analysis
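
The date cleaning mentioned for `setup.scala` is not shown in this README. As a rough sketch of what such a step could look like, the snippet below converts a raw `trending_date` string to an ISO date. It assumes the raw values use a two-digit `yy.dd.MM` layout (for example `17.14.11`); that layout is an assumption about the Kaggle CSV, and the `TrendingDate` object and its `clean` method are hypothetical names, not code from this repository.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical helper, not taken from setup.scala.
object TrendingDate {
  // Assumed raw layout: two-digit year, then day, then month, separated by dots.
  private val rawFormat = DateTimeFormatter.ofPattern("yy.dd.MM")

  // Returns the date as an ISO string (yyyy-MM-dd), or None if the input does not parse.
  def clean(raw: String): Option[String] =
    Try(LocalDate.parse(raw.trim, rawFormat)).toOption.map(_.toString)
}
```

Returning an `Option` keeps malformed rows from aborting the whole load; the caller can decide whether to drop them or keep the original string.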
