TED Video Data Analysis in R
https://github.com/amrrs/ted-analysis-in-r


## Top Misc Things - TED Analysis in R

* DS+ Article: [5 interesting subtle insights from TED videos data analysis in R](https://datascienceplus.com/5-interesting-subtle-insights-from-ted-videos-data-analysis-in-r/)

* Kaggle Kernel: https://www.kaggle.com/nulldata/top-misc-things-ted-analysis-in-r

```R
options(scipen=999)
```

```R
library(dplyr); library(ggplot2); library(ggthemes);
```

```R
transcripts <- read.csv('../input/transcripts.csv', stringsAsFactors = FALSE, header = TRUE)
main <- read.csv('../input/ted_main.csv', stringsAsFactors = FALSE, header = TRUE)
```

### Total Number of Rows/Entries in the Main Dataset

```R
nrow(main)
```

2550

### Entries with more than 1M views

```R
paste0('Total Number of videos with more than 1M views: ',main %>% filter(views > 1000000) %>% count() )
paste0('% of videos with more than 1M views: ', round((main %>% filter(views > 1000000) %>% count() / nrow(main))*100,2),'%')
```

'Total Number of videos with more than 1M views: 1503'

'% of videos with more than 1M views: 58.94%'
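The same count and share can also be computed from a logical vector, since `sum()` counts the `TRUE` values and `mean()` gives their proportion. The sketch below is only an illustration: the `views` values are made up and are not taken from `ted_main.csv`.

```R
# Hypothetical view counts, invented for illustration (not from ted_main.csv)
views <- c(250000, 1200000, 3400000, 980000, 1000000)

# One TRUE/FALSE flag per video; `>` excludes a video with exactly 1,000,000 views
over_1m <- views > 1000000

n_over <- sum(over_1m)                     # each TRUE counts as 1
pct_over <- round(mean(over_1m) * 100, 2)  # share of TRUE values, in percent

paste0('Total Number of videos with more than 1M views: ', n_over)
paste0('% of videos with more than 1M views: ', pct_over, '%')
```

Switching the comparison to `>=` would also include videos sitting exactly on the 1M boundary.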

### Not so one-trick Pony!

```R
main %>% filter(views > 1000000) %>%
group_by(main_speaker) %>%
count() %>%
filter(n >2) %>%
arrange(desc(n)) %>%
head(20) %>%
ggplot() + geom_bar(aes(reorder(main_speaker,-n),n),stat='identity') + theme_solarized() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab('Speakers') +
ggtitle('Top 20 Most Frequent Speakers among videos with 1M+ views')
```

![png](output_8_1.png)

### Less Time More Impact

```R
main %>% filter(views > 1000000) %>% arrange(duration) %>% slice(1:10) %>% select('name','duration','views','event')
```

| name | duration | views | event |
|---|---:|---:|---|
| Derek Sivers: Weird, or just different? | 162 | 2835976 | TEDIndia 2009 |
| Paolo Cardini: Forget multitasking, try monotasking | 172 | 2324212 | TEDGlobal 2012 |
| Mitchell Joachim: Don't build your home, grow it! | 176 | 1332785 | TED2010 |
| Arthur Benjamin: Teach statistics before calculus! | 178 | 2175141 | TED2009 |
| Terry Moore: How to tie your shoes | 179 | 6263759 | TED2005 |
| Malcolm London: "High School Training Ground" | 180 | 1188177 | TED Talks Education |
| Bobby McFerrin: Watch me play ... the audience! | 184 | 3302312 | World Science Festival |
| Derek Sivers: How to start a movement | 189 | 6475731 | TED2010 |
| Bruno Maisonnier: Dance, tiny robots! | 189 | 1193896 | TEDxConcorde |
| Dean Ornish: Your genes are not your fate | 192 | 1384333 | TED2008 |

### Skeeeeewed Views

```R
ggplot(main) + geom_histogram(aes(views)) + ggtitle('Histogram of Views') + theme_solarized()
```

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

![png](output_12_2.png)
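One way to act on the `stat_bin()` message and make the skew easier to read is to set `bins` explicitly and put the x-axis on a log10 scale. This is a minimal sketch under stated assumptions: the `toy` data frame is synthetic log-normal data standing in for `main$views`, and the choice of 40 bins is arbitrary.

```R
library(ggplot2)

# Synthetic, right-skewed counts standing in for main$views (not real TED data)
set.seed(42)
toy <- data.frame(views = round(rlnorm(1000, meanlog = 14, sdlog = 1)))

# In a right-skewed distribution the mean sits above the median
c(mean = mean(toy$views), median = median(toy$views))

# Explicit bins avoid the stat_bin() message; log10 spreads out the long right tail
p <- ggplot(toy) +
  geom_histogram(aes(views), bins = 40) +
  scale_x_log10() +
  ggtitle('Histogram of Views (log10 scale)')
p
```

To match the other plots, `theme_solarized()` from `ggthemes` can be appended, and `toy` can be swapped for `main` once the dataset is loaded.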

```R
main$first_letter <- substr(main$main_speaker,1,1)
```

### J/S/A - Seem to be the Magical First Letters!

```R
main %>%
group_by(first_letter = toupper(first_letter)) %>%
count() %>%
arrange(desc(n)) %>%
ggplot() +
geom_bar(aes(reorder(first_letter,-n),n),stat = 'identity') + theme_solarized() +
xlab('Speaker First Letter') +
ylab('Count') +
ggtitle('Most Common First Letters of Speaker Names in TED Talks')
```

![png](output_15_1.png)

```R
tedx <- main %>% filter(grepl('tedx',tolower(event)))

tedx %>% count()
```

n

471

### TEDx %in% TED

```R
tedx %>% filter(views > 1000000) %>%
group_by(event) %>%
count() %>%
filter(n >2) %>%
arrange(desc(n)) %>%
head(20) %>%
ggplot() + geom_bar(aes(reorder(event,-n),n),stat='identity') + theme_solarized() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab('TEDx Events') +
ggtitle('Top 20 TEDx Events with the Most Talks with 1M+ views on TED.com')
```

![png](output_18_1.png)

### Top Comments - Atheism/Schools/Science!

```R
main %>%
arrange(desc(comments)) %>%
head(10) %>%
ggplot() +
geom_bar(aes(reorder(title,-comments),comments),stat = 'identity') + theme_solarized() +
xlab('Talk Name') +
ylab('Count') +
ggtitle('Talks with the Most Comments') +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
```

![png](output_20_1.png)

```R
transcripts$first_word <- unlist(lapply(transcripts$transcript, function(x) strsplit(x," ")[[1]][1]))
```
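Splitting on a single space keeps case and punctuation, so tokens such as "I" and "I," are counted separately, and a transcript with leading whitespace yields an empty first token. The sketch below shows one possible normalization. It is a variant, not the notebook's method: `first_token()` is a hypothetical helper and the sample strings are invented.

```R
# Hypothetical transcript openings, invented for illustration
toy_transcripts <- c("I am here today", "  Thank you.", "(Music) Hello", "I, for one")

# first_token() is a hypothetical helper, not part of the original notebook
first_token <- function(x) {
  tokens <- strsplit(trimws(x), "\\s+")  # trim, then split on any run of whitespace
  vapply(tokens, function(tok) tok[1], character(1))
}

# Lowercase and strip punctuation so variants of the same word are counted together
first_word <- tolower(gsub("[[:punct:]]", "", first_token(toy_transcripts)))

sort(table(first_word), decreasing = TRUE)
```

Stripping punctuation also turns contractions such as "I'm" into "im", so whether this normalization helps depends on which first words are of interest.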

### Narcissism?

```R
transcripts %>% group_by(first_word) %>% count() %>% arrange(desc(n)) %>% head(25) %>%
ggplot() +
geom_bar(aes(reorder(first_word,-n),n),stat = 'identity') + theme_solarized() +
xlab('First Word of the Talk') +
ylab('Count') +
ggtitle('Most Common First Words of Talks') +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
```

![png](output_23_1.png)