https://github.com/nolanbconaway/friends-omg

People say "oh my god" a lot in the show "Friends".

heroku python television

Host: GitHub
URL: https://github.com/nolanbconaway/friends-omg
Owner: nolanbconaway
Created: 2020-08-31T00:14:17.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2024-03-23T16:05:51.000Z (11 months ago)
Last Synced: 2025-01-11T23:53:49.301Z (28 days ago)
Topics: heroku, python, television
Language: Python
Homepage: https://friends-omg.onrender.com/
Size: 24.4 KB
Stars: 2
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: readme.md

# Friends OMG

We were rewatching some old Friends episodes at home when I noticed that the phrase _"oh my god"_
comes up a lot in that show. As any reasonable person would, I compiled script data and built a website to
prove my point.

This webapp lets you check how often phrases like "oh my god" are said in Friends, Seinfeld,
and Sex and the City.
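
As a rough illustration of the idea (this is a sketch, not the app's actual code), counting a phrase across lines of dialogue could look like the snippet below. The function name `count_phrase` and the matching rules (case-insensitive, whole words, flexible whitespace) are assumptions made for the example.

```python
import re


def count_phrase(lines, phrase):
    """Count case-insensitive, whole-word occurrences of `phrase` in `lines`.

    Hypothetical helper for illustration; the matching rules are assumptions.
    """
    words = [re.escape(word) for word in phrase.split()]
    if not words:
        raise ValueError("phrase must contain at least one word")
    # Allow any run of whitespace between the words of the phrase.
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    return sum(len(pattern.findall(line)) for line in lines)


if __name__ == "__main__":
    # Made-up sample lines, not taken from the real dataset.
    sample = ["Oh my God!", "I know, right?", "oh my god, OH MY GOD"]
    print(count_phrase(sample, "oh my god"))
```

Punctuation between the words (for example "Oh. My. God.") would not match under these rules; whether that should count is a design choice.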

## Building the Dataset

I tried to make the data build as self-contained as possible. To that end, I have hosted some source files on my personal HelioHost server. Two parts of the data cannot be rebuilt from public sources:

1. The source Seinfeld data is no longer available online. [Colin](https://github.com/colinpollock) sent a copy to me, and that's how I have it.
2. The Sex and the City data were messy in their public form (via Kaggle). I cleaned those data up a fair amount and saved the file.

The [download-all](bin/download-all) script contains the relevant URLs to download the source data used to build the final dataset. After downloading those raw files, the [build](build/) module contains code to process them in their raw form and produce the final dataset.

You can download a copy of it [here](http://nolanc.heliohost.org/omg-data/data.db.gz)!

### Data Credits

I did basically zero work obtaining the source data. Below are shout-outs to those who did that hard work:

1. [Colin Pollock](https://github.com/colinpollock/seinfeld-scripts) for the _excellent_ Seinfeld dataset.
2. [Yusuf Sohoye](https://quotennial.github.io/Friends-engineering) for the regex to parse through Friends script files.
3. I stole the Sex and the City data from [Kaggle](https://www.kaggle.com/snapcrack/every-sex-and-the-city-script).

You can download the full (gzipped) SQLite3 database [here](http://nolanc.heliohost.org/omg-data/data.db.gz)!
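
Once the gzipped file is downloaded, a minimal sketch for unpacking it and checking what it contains might look like this. The local file names (`data.db.gz`, `data.db`) and the helper names are assumptions; the table names are read from the database itself rather than assumed, since the schema is not documented here.

```python
import gzip
import shutil
import sqlite3
from pathlib import Path


def decompress_gzip(src, dest):
    """Decompress the gzip file at `src` into `dest` and return `dest`."""
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dest


def list_tables(db_path):
    """Return the sorted names of all tables in a SQLite database."""
    conn = sqlite3.connect(str(db_path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Assumes data.db.gz was already downloaded to the current directory.
    db_file = decompress_gzip(Path("data.db.gz"), Path("data.db"))
    print(list_tables(db_file))
```

From there, `PRAGMA table_info(<table>)` shows the columns of any table that `list_tables` reports.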