https://github.com/amrrs/ted-analysis-in-r

TED Video Data Analysis in R

Host: GitHub
URL: https://github.com/amrrs/ted-analysis-in-r
Owner: amrrs
Created: 2017-12-14T15:05:02.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2017-12-20T15:35:14.000Z (over 8 years ago)
Last Synced: 2025-01-15T12:01:00.811Z (over 1 year ago)
Topics: r, ted
Language: Jupyter Notebook
Homepage:
Size: 256 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


## Top Misc Things - TED Analysis in R

* DS+ Article: [5 interesting subtle insights from TED videos data analysis in R](https://datascienceplus.com/5-interesting-subtle-insights-from-ted-videos-data-analysis-in-r/)

* Kaggle Kernel: https://www.kaggle.com/nulldata/top-misc-things-ted-analysis-in-r

```R
# Print large numbers (e.g. view counts) in fixed notation instead of scientific notation
options(scipen = 999)
```
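`options(scipen = 999)` raises R's penalty for scientific notation, so large view counts print as plain digits. A minimal base-R comparison on one illustrative value (`1e6`, not taken from the dataset) shows the effect and restores the previous setting afterwards:

```R
# Format the same number with R's default penalty and with scipen = 999.
old <- options(scipen = 0)        # R's default; remember the previous setting
default_fmt <- format(1e6)        # scientific notation is shorter here: "1e+06"
options(scipen = 999)             # heavily penalise scientific notation
fixed_fmt <- format(1e6)          # fixed notation: "1000000"
options(old)                      # restore whatever was set before

print(c(default = default_fmt, scipen_999 = fixed_fmt))
```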

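```R
library(dplyr)     # data manipulation verbs and the %>% pipe
library(ggplot2)   # plotting
library(ggthemes)  # extra ggplot2 themes such as theme_solarized()
```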

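```R
# TED talk transcripts and talk metadata from the Kaggle kernel's input directory
transcripts <- read.csv('../input/transcripts.csv', stringsAsFactors = FALSE, header = TRUE)
main <- read.csv('../input/ted_main.csv', stringsAsFactors = FALSE, header = TRUE)
```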
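`stringsAsFactors = FALSE` keeps text columns such as `name` and `event` as plain character vectors; before R 4.0.0, `read.csv()` turned them into factors by default. A small sketch using a hypothetical two-row CSV string (columns and values copied from the shortest-talks table further down, not the real file) shows the difference:

```R
# Hypothetical excerpt in the same shape as ted_main.csv (not the actual file).
csv_text <- 'name,duration,views,event
"Derek Sivers: Weird, or just different?",162,2835976,TEDIndia 2009
"Terry Moore: How to tie your shoes",179,6263759,TED2005'

# Quoted fields may contain commas; text columns stay character vectors.
ted_demo <- read.csv(text = csv_text, stringsAsFactors = FALSE, header = TRUE)
str(ted_demo)

# With stringsAsFactors = TRUE the same text columns become factors.
ted_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE, header = TRUE)
class(ted_factors$event)
```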

### Total Number of Rows/Entries in the Main Dataset

```R

nrow(main)

```

2550

### Entries with more than 1M views

```R
# Number of talks with strictly more than 1,000,000 views
n_1m <- main %>% filter(views > 1000000) %>% nrow()

paste0('Total Number of videos with more than 1M views: ', n_1m)
paste0('% of videos with more than 1M views: ', round(n_1m / nrow(main) * 100, 2), '%')
```

'Total Number of videos with more than 1M views: 1503'

'% of videos with more than 1M views: 58.94%'
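The same count and percentage can be computed in base R, because a logical vector sums to its number of `TRUE` values and its mean is the fraction of `TRUE` values. A minimal sketch on hypothetical view counts standing in for `main$views`:

```R
# Hypothetical view counts (not from ted_main.csv).
views <- c(2500000, 950000, 1300000, 420000, 6200000, 780000)

is_1m  <- views > 1e6                    # TRUE / FALSE per talk
n_1m   <- sum(is_1m)                     # TRUE counts as 1, FALSE as 0
pct_1m <- round(mean(is_1m) * 100, 2)    # share of TRUE values, as a percentage

paste0('Total Number of videos with more than 1M views: ', n_1m)
paste0('% of videos with more than 1M views: ', pct_1m, '%')
```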

### Not So One-Trick Ponies!

```R
# Speakers with 3+ talks above 1M views, top 20 by number of such talks
main %>%
  filter(views > 1000000) %>%
  count(main_speaker, sort = TRUE) %>%
  filter(n > 2) %>%
  head(20) %>%
  ggplot() +
  geom_col(aes(reorder(main_speaker, -n), n)) +
  theme_solarized() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab('Speakers') +
  ylab('Talks with 1M+ views') +
  ggtitle('Top 20 Most Frequent Speakers in Videos with 1M+ Views')
```

![png](output_8_1.png)
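The speaker tally behind this chart (filter to 1M+ views, count talks per speaker, keep speakers with more than two, take the top 20) can also be sketched with base R's `table()`. The speakers and view counts below are hypothetical placeholders:

```R
# Hypothetical talks: speaker and view count (not the real dataset).
talks <- data.frame(
  main_speaker = c("Speaker A", "Speaker B", "Speaker A", "Speaker C",
                   "Speaker A", "Speaker B", "Speaker B", "Speaker B"),
  views        = c(2.1e6, 1.5e6, 3.0e6, 0.4e6, 1.2e6, 1.1e6, 0.9e6, 0.8e6),
  stringsAsFactors = FALSE
)

popular   <- talks[talks$views > 1e6, ]                    # 1M+ view talks only
counts    <- sort(table(popular$main_speaker), decreasing = TRUE)
repeaters <- head(counts[counts > 2], 20)                  # 3+ such talks, top 20
repeaters
```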

### Less Time, More Impact

```R
# The ten shortest talks (duration in seconds) among those with 1M+ views
main %>%
  filter(views > 1000000) %>%
  arrange(duration) %>%
  slice(1:10) %>%
  select(name, duration, views, event)
```

| name | duration (s) | views | event |
|---|---:|---:|---|
| Derek Sivers: Weird, or just different? | 162 | 2835976 | TEDIndia 2009 |
| Paolo Cardini: Forget multitasking, try monotasking | 172 | 2324212 | TEDGlobal 2012 |
| Mitchell Joachim: Don't build your home, grow it! | 176 | 1332785 | TED2010 |
| Arthur Benjamin: Teach statistics before calculus! | 178 | 2175141 | TED2009 |
| Terry Moore: How to tie your shoes | 179 | 6263759 | TED2005 |
| Malcolm London: "High School Training Ground" | 180 | 1188177 | TED Talks Education |
| Bobby McFerrin: Watch me play ... the audience! | 184 | 3302312 | World Science Festival |
| Derek Sivers: How to start a movement | 189 | 6475731 | TED2010 |
| Bruno Maisonnier: Dance, tiny robots! | 189 | 1193896 | TEDxConcorde |
| Dean Ornish: Your genes are not your fate | 192 | 1384333 | TED2008 |
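Durations in the dataset are in seconds, so every talk in this table runs between roughly two and a half and three and a quarter minutes. A small helper (hypothetical, not part of the notebook) converts the listed durations to `m:ss` with integer division and remainder:

```R
# Durations (seconds) of the ten talks in the table above.
duration_s <- c(162, 172, 176, 178, 179, 180, 184, 189, 189, 192)

# %/% gives whole minutes, %% gives the leftover seconds (zero-padded).
to_mmss <- function(seconds) {
  sprintf("%d:%02d", seconds %/% 60, seconds %% 60)
}

to_mmss(duration_s)
```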

### Skeeeeewed Views

```R

ggplot(main) + geom_histogram(aes(views)) + ggtitle('Histogram of Views') + theme_solarized()

```

    `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

    

![png](output_12_2.png)
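The histogram is dominated by a few very popular talks. One way to put a number on that right skew is to compare the mean with the median and to bucket views by order of magnitude; a common plotting fix is a log scale such as ggplot2's `scale_x_log10()`. A minimal base-R sketch on hypothetical view counts (one runaway hit included on purpose):

```R
# Hypothetical view counts, not from ted_main.csv.
views <- c(150000, 420000, 880000, 1200000, 2500000, 47000000)

# In right-skewed data the mean is pulled well above the median.
view_mean   <- mean(views)
view_median <- median(views)
c(mean = view_mean, median = view_median)

# Order-of-magnitude buckets: 5 = 100K-1M, 6 = 1M-10M, 7 = 10M-100M views.
magnitude_counts <- table(floor(log10(views)))
magnitude_counts
```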

```R

main$first_letter <- substr(main$main_speaker,1,1)

```

### J/S/A - Seemingly the Magical First Letters!

```R
# Number of talks per (upper-cased) first letter of the main speaker's name
main %>%
  group_by(first_letter = toupper(first_letter)) %>%
  count() %>%
  arrange(desc(n)) %>%
  ggplot() +
  geom_col(aes(reorder(first_letter, -n), n)) +
  theme_solarized() +
  xlab('Speaker First Letter') +
  ylab('Count') +
  ggtitle('Most Common First Letters of Speaker Names in TED Talks')
```

![png](output_15_1.png)
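Note that this chart counts talks, so a speaker with several talks adds to their letter several times. A base-R sketch using the speaker names from the shortest-talks table shows both the per-talk count and a per-speaker variant with `unique()` (the variant is an illustration, not what the notebook does):

```R
# Speaker names taken from the shortest-talks table above (Derek Sivers twice).
speakers <- c("Derek Sivers", "Paolo Cardini", "Mitchell Joachim",
              "Arthur Benjamin", "Terry Moore", "Malcolm London",
              "Bobby McFerrin", "Derek Sivers", "Bruno Maisonnier",
              "Dean Ornish")

# Per talk: every row counts, as in the notebook.
talk_letter_counts <- sort(table(toupper(substr(speakers, 1, 1))), decreasing = TRUE)

# Per speaker: each distinct name counts once.
speaker_letter_counts <- sort(table(toupper(substr(unique(speakers), 1, 1))),
                              decreasing = TRUE)

talk_letter_counts
speaker_letter_counts
```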

```R

tedx <- main %>% filter(grepl('tedx',tolower(event)))

tedx %>% count()

```

| n |
|---:|
| 471 |
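The TEDx filter lower-cases the event name before matching. `grepl()` can do the same case-insensitive match directly through its `ignore.case` argument; a sketch on the event names from the shortest-talks table confirms both forms agree:

```R
# Event names from the shortest-talks table above.
events <- c("TEDIndia 2009", "TEDGlobal 2012", "TED2010", "TED2009",
            "TED2005", "TED Talks Education", "World Science Festival",
            "TED2010", "TEDxConcorde", "TED2008")

via_tolower <- grepl("tedx", tolower(events))           # the notebook's approach
via_flag    <- grepl("tedx", events, ignore.case = TRUE) # built-in case folding

identical(via_tolower, via_flag)
events[via_flag]
```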

### TEDx %in% TED

```R
# TEDx events with 3+ talks above 1M views, top 20 by number of such talks
tedx %>%
  filter(views > 1000000) %>%
  count(event, sort = TRUE) %>%
  filter(n > 2) %>%
  head(20) %>%
  ggplot() +
  geom_col(aes(reorder(event, -n), n)) +
  theme_solarized() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab('TEDx Events') +
  ggtitle('Top 20 TEDx Events by Number of 1M+ View Talks on TED.com')
```

![png](output_18_1.png)

### Top Comments - Atheism/Schools/Science!

```R
# The ten talks with the most comments
main %>%
  arrange(desc(comments)) %>%
  head(10) %>%
  ggplot() +
  geom_col(aes(reorder(title, -comments), comments)) +
  theme_solarized() +
  xlab('Talk Name') +
  ylab('Comments') +
  ggtitle('Talks with the Most Comments') +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))
```

![png](output_20_1.png)

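```R
# First space-delimited token of each transcript (punctuation stays attached, e.g. "So,")
transcripts$first_word <- vapply(strsplit(transcripts$transcript, " ", fixed = TRUE),
                                 function(words) words[1], character(1))
```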

### Narcissism?

```R
# The 25 most common opening words across all transcripts
transcripts %>%
  count(first_word, sort = TRUE) %>%
  head(25) %>%
  ggplot() +
  geom_col(aes(reorder(first_word, -n), n)) +
  theme_solarized() +
  xlab('First Word of the Talk') +
  ylab('Count') +
  ggtitle('Top First Words of the Talks') +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))
```

![png](output_23_1.png)
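Because the first word is split on spaces only, punctuation stays attached, so "So," and "So" are counted as different words. A minimal sketch on hypothetical transcript openings shows the extraction and one possible clean-up that strips trailing punctuation (an assumption for illustration, not the notebook's method; apostrophes inside words such as "I'm" are kept):

```R
# Hypothetical transcript openings (not taken from transcripts.csv).
openings <- c("So, I want to talk about cities.",
              "I'm going to tell you a story.",
              "So I was standing on stage",
              "Thank you.")

# First space-delimited token of each string.
first_word <- vapply(strsplit(openings, " ", fixed = TRUE),
                     function(words) words[1], character(1))

# Optional: drop trailing punctuation so "So," and "So" are merged.
first_word_clean <- sub("[[:punct:]]+$", "", first_word)

table(first_word)
table(first_word_clean)
```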
