https://github.com/amrrs/ted-analysis-in-r
TED Video Data Analysis in R
- Host: GitHub
- URL: https://github.com/amrrs/ted-analysis-in-r
- Owner: amrrs
- Created: 2017-12-14T15:05:02.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-12-20T15:35:14.000Z (over 8 years ago)
- Last Synced: 2025-01-15T12:01:00.811Z (over 1 year ago)
- Topics: r, ted
- Language: Jupyter Notebook
- Homepage:
- Size: 256 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
## Top Misc Things - TED Analysis in R
* DS+ Article: [5 interesting subtle insights from TED videos data analysis in R](https://datascienceplus.com/5-interesting-subtle-insights-from-ted-videos-data-analysis-in-r/)
* Kaggle Kernel: https://www.kaggle.com/nulldata/top-misc-things-ted-analysis-in-r
```R
options(scipen=999)
```
```R
library(dplyr); library(ggplot2); library(ggthemes);
```
```R
transcripts <- read.csv('../input/transcripts.csv', stringsAsFactors = FALSE, header = TRUE)
main <- read.csv('../input/ted_main.csv', stringsAsFactors = FALSE, header = TRUE)
```
### Total Number of Rows/Entries in the Main Dataset
```R
nrow(main)
```
2550
### Entries with more than 1M views
```R
paste0('Total Number of videos with more than 1M views: ',main %>% filter(views > 1000000) %>% count() )
paste0('% of videos with more than 1M views: ', round((main %>% filter(views > 1000000) %>% count() / nrow(main))*100,2),'%')
```
'Total Number of videos with more than 1M views: 1503'
'% of videos with more than 1M views: 58.94%'
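The same count and share can also be computed without intermediate data frames, since summing a logical vector counts its `TRUE` values and its mean gives the proportion. A minimal sketch, using a small made-up `toy` data frame (invented values, not the TED data) in place of `main`:

```R
# Hypothetical stand-in for `main`; the values are invented for illustration
toy <- data.frame(views = c(250000, 1200000, 3400000, 900000, 5000000))

over_1m  <- toy$views > 1000000             # TRUE where views exceed 1M
n_over   <- sum(over_1m)                    # number of TRUE values
pct_over <- round(mean(over_1m) * 100, 2)   # share of TRUE values, in percent

paste0('Total Number of videos with more than 1M views: ', n_over)
paste0('% of videos with more than 1M views: ', pct_over, '%')

# Arithmetic check of the figure reported above: 1503 of 2550 talks
reported_pct <- round(1503 / 2550 * 100, 2)
reported_pct
```

The last line reproduces the 58.94% shown above from the two reported counts.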
### Not so one-trick Pony!
```R
main %>% filter(views > 1000000) %>%
group_by(main_speaker) %>%
count() %>%
filter(n >2) %>%
arrange(desc(n)) %>%
head(20) %>%
ggplot() + geom_bar(aes(reorder(main_speaker,-n),n),stat='identity') + theme_solarized() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab('Speakers') +
ggtitle('Top 20 Most Frequent Speakers among videos with 1M+ views')
```
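As a side note, dplyr's `count()` accepts the grouping column directly, together with `sort = TRUE`, which can shorten the `group_by() %>% count() %>% arrange(desc(n))` chain. A sketch on an invented `toy` data frame (speaker names and view counts are placeholders, not values from the dataset):

```R
library(dplyr)

# Hypothetical stand-in for `main`; names and view counts are placeholders
toy <- data.frame(
  main_speaker = c('A', 'B', 'A', 'C', 'A', 'B', 'C'),
  views = c(2e6, 3e6, 1.5e6, 5e5, 4e6, 1.2e6, 8e5),
  stringsAsFactors = FALSE
)

top_speakers <- toy %>%
  filter(views > 1000000) %>%
  count(main_speaker, sort = TRUE) %>%  # one row per speaker, largest n first
  head(20)

top_speakers
```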

### Less Time More Impact
```R
main %>% filter(views > 1000000) %>% arrange(duration) %>% slice(1:10) %>% select('name','duration','views','event')
```
| name | duration | views | event |
|---|---|---|---|
| Derek Sivers: Weird, or just different? | 162 | 2835976 | TEDIndia 2009 |
| Paolo Cardini: Forget multitasking, try monotasking | 172 | 2324212 | TEDGlobal 2012 |
| Mitchell Joachim: Don't build your home, grow it! | 176 | 1332785 | TED2010 |
| Arthur Benjamin: Teach statistics before calculus! | 178 | 2175141 | TED2009 |
| Terry Moore: How to tie your shoes | 179 | 6263759 | TED2005 |
| Malcolm London: "High School Training Ground" | 180 | 1188177 | TED Talks Education |
| Bobby McFerrin: Watch me play ... the audience! | 184 | 3302312 | World Science Festival |
| Derek Sivers: How to start a movement | 189 | 6475731 | TED2010 |
| Bruno Maisonnier: Dance, tiny robots! | 189 | 1193896 | TEDxConcorde |
| Dean Ornish: Your genes are not your fate | 192 | 1384333 | TED2008 |
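The `duration` values may be easier to read as minutes and seconds. A small sketch, assuming `duration` is recorded in seconds (the notebook does not state the unit) and using a hypothetical helper named `format_duration`:

```R
# Hypothetical helper; assumes `secs` holds durations in whole seconds
format_duration <- function(secs) {
  sprintf('%d:%02d', as.integer(secs %/% 60), as.integer(secs %% 60))
}

durations <- format_duration(c(162, 172, 192))
durations
# "2:42" "2:52" "3:12"
```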
### Skeeeeewed Views
```R
ggplot(main) + geom_histogram(aes(views)) + ggtitle('Histogram of Views') + theme_solarized()
```
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
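The message above appears because no bin count was given. One option is to set `bins` explicitly and, for a heavily right-skewed variable, put the x-axis on a log scale. A sketch on simulated data (the `toy` data frame is generated with `rlnorm()` and is not the TED data; 50 bins is an arbitrary choice):

```R
library(ggplot2)

set.seed(1)
# Simulated right-skewed view counts; purely illustrative
toy <- data.frame(views = round(rlnorm(500, meanlog = 14, sdlog = 1)))

p <- ggplot(toy) +
  geom_histogram(aes(views), bins = 50) +  # explicit bin count avoids the stat_bin() message
  scale_x_log10() +                        # log10 axis spreads out the long right tail
  ggtitle('Histogram of Views (log scale)')

p
```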

```R
main$first_letter <- substr(main$main_speaker,1,1)
```
### J/S/A - Seems the magical First Letter!
```R
main %>%
group_by(first_letter = toupper(first_letter)) %>%
count() %>%
arrange(desc(n)) %>%
ggplot() +
geom_bar(aes(reorder(first_letter,-n),n),stat = 'identity') + theme_solarized() +
xlab('Speaker First Letter') +
ylab('Count') +
ggtitle('Most Common First Letters of Speaker Names in TED Talks')
```

```R
tedx <- main %>% filter(grepl('tedx',tolower(event)))
tedx %>% count()
```
n
471
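An equivalent filter can use the `ignore.case` argument of `grepl()` instead of lower-casing the column first. A sketch with base R subsetting on an invented `toy` data frame (the event names are examples only):

```R
# Hypothetical stand-in for `main`
toy <- data.frame(
  event = c('TEDxExample', 'TED2010', 'TEDGlobal 2012', 'tedxSample', 'TED Talks Education'),
  stringsAsFactors = FALSE
)

# ignore.case = TRUE matches 'TEDx', 'tedx', 'Tedx', and so on
tedx_toy <- toy[grepl('tedx', toy$event, ignore.case = TRUE), , drop = FALSE]
nrow(tedx_toy)
```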
### TEDx %in% TED
```R
tedx %>% filter(views > 1000000) %>%
group_by(event) %>%
count() %>%
filter(n >2) %>%
arrange(desc(n)) %>%
head(20) %>%
ggplot() + geom_bar(aes(reorder(event,-n),n),stat='identity') + theme_solarized() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab('TEDx Events') +
ggtitle('Top 20 TEDx Events with the most talks with 1M+ views on TED.com')
```

### Top Comments - Atheism/Schools/Science!
```R
main %>%
arrange(desc(comments)) %>%
head(10) %>%
ggplot() +
geom_bar(aes(reorder(title,-comments),comments),stat = 'identity') + theme_solarized() +
xlab('Talk Name') +
ylab('Count') +
ggtitle('Talks with Most comments') +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
```

```R
transcripts$first_word <- unlist(lapply(transcripts$transcript, function(x) strsplit(x," ")[[1]][1]))
```
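Splitting on a single space returns an empty string when a transcript starts with a space, and it counts `So` and `so` as different words. A hedged variant, using a hypothetical helper `first_word_of` that trims surrounding whitespace, splits on any run of whitespace, and lower-cases the result; whether this normalisation is wanted depends on the question being asked:

```R
# Hypothetical helper: first whitespace-delimited token, lower-cased
first_word_of <- function(x) {
  tokens <- strsplit(trimws(x), '[[:space:]]+')  # split on runs of whitespace
  vapply(
    tokens,
    function(tok) if (length(tok) > 0) tolower(tok[1]) else NA_character_,
    character(1)
  )
}

# Invented example strings, not taken from the transcripts
words <- first_word_of(c(' Good morning everyone', "So\tlet's begin", 'so it goes', ''))
words
```

Empty transcripts come back as `NA`, so they can be dropped before counting.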
### Narcissism?
```R
transcripts %>% group_by(first_word) %>% count() %>% arrange(desc(n)) %>% head(25) %>%
ggplot() +
geom_bar(aes(reorder(first_word,-n),n),stat = 'identity') + theme_solarized() +
xlab('First Word of the Talk') +
ylab('Count') +
ggtitle('Most Common First Words of Talks') +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
```
