https://github.com/s-bose/walks-into-a-bar-dataset
A dataset containing 1000+ walks-into-a-bar jokes scraped from the internet.
bar dataset jokes kaggle-dataset nlp text-mining webscraping
- Host: GitHub
- URL: https://github.com/s-bose/walks-into-a-bar-dataset
- Owner: s-bose
- License: cc0-1.0
- Created: 2021-12-20T18:26:49.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2022-06-28T16:12:47.000Z (over 3 years ago)
- Last Synced: 2023-03-04T11:00:08.077Z (over 2 years ago)
- Topics: bar, dataset, jokes, kaggle-dataset, nlp, text-mining, webscraping
- Language: Jupyter Notebook
- Homepage:
- Size: 2.34 MB
- Stars: 1
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Walks Into A Bar Dataset
This dataset contains 1434 bar jokes webscraped from various sources on the internet.
The sources used are listed below.
| **Name** | **URL** |
|:--------------|:-------:|
| `grammarbook` | https://www.grammarbook.com/blog/definitions/walks-into-a-bar/ |
| `thrillist` | https://www.thrillist.com/culture/best-walks-into-a-bar-jokes |
| `jokojokes` | https://jokojokes.com/walks-into-a-bar-jokes.html |
| `gamertelligence` | https://www.gamertelligence.com/walks-into-a-bar-jokes/ |

## Files
* The main dataset can be found in `data/jokes.csv`.
* The primary notebook used for scraping the aforementioned websites is `notebooks/walks_into_bar_scrapper.ipynb`.
* `notebooks/seleniumconfig.py` is a helper module for obtaining a chrome `WebDriver` with predefined configurations.
* **Note** - Running the scraper notebook requires installing all the packages in `requirements.txt`. Additionally, a chromedriver executable suitable for your operating system needs to be present in the root directory.

## Disclaimer
Please note that the data has been webscraped with minimal editing of the original text.
Therefore some jokes might be repeated, or might be NSFW. Certain websites had user-provided jokes, which as a result might not conform to the general structure of a walks-into-a-bar joke.

Feel free to contribute to this dataset if you come across further sources for bar jokes.
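Since some jokes may be repeated, it can help to drop duplicate rows when loading `data/jokes.csv`. The following is a minimal sketch using only the Python standard library. It makes no assumption about the column names in the file; the function name `load_unique_rows`, the UTF-8 encoding, and the choice to treat rows as duplicates when they match after whitespace and case normalisation are all illustrative assumptions, not part of the original project.

```python
import csv
from pathlib import Path


def load_unique_rows(csv_path):
    """Read a CSV file and return (header, rows) without repeated rows.

    Hypothetical helper: two rows count as duplicates when every cell
    matches after collapsing whitespace and lower-casing.
    """
    # UTF-8 is an assumption about how the file is encoded.
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # first line is treated as the header
        seen = set()
        unique_rows = []
        for row in reader:
            # Build a normalised key so trivially different copies match.
            key = tuple(" ".join(cell.split()).lower() for cell in row)
            if key in seen:
                continue
            seen.add(key)
            unique_rows.append(row)  # keep the first occurrence unchanged
    return header, unique_rows
```

Calling `load_unique_rows("data/jokes.csv")` from the repository root returns the header and the de-duplicated rows. This only catches near-identical copies; reworded versions of the same joke and NSFW content would still need separate filtering.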
## Further Links
[Walks Into A Bar Dataset on Kaggle](https://www.kaggle.com/datasets/shiladityabasu/walks-into-a-bar-dataset)