https://github.com/lisa-ho/breadit

Repository for scraping and analysing data from the Reddit/Sourdough community to explore lockdown baking trends.

data-analysis data-viz nltk python reddit-api sentiment-analysis web-scraping

Host: GitHub
URL: https://github.com/lisa-ho/breadit
Owner: Lisa-Ho
Created: 2020-12-11T21:14:43.000Z (over 5 years ago)
Default Branch: main
Last Pushed: 2021-02-21T17:35:30.000Z (over 5 years ago)
Last Synced: 2025-04-05T08:30:32.655Z (over 1 year ago)
Topics: data-analysis, data-viz, nltk, python, reddit-api, sentiment-analysis, web-scraping
Language: Jupyter Notebook
Homepage:
Size: 2.48 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# About this project

This is a personal project scraping and analysing data from the [Reddit/Sourdough](https://www.reddit.com/r/Sourdough/) community in 2020.

As a sourdough baker myself, I wanted to explore lockdown baking trends in more detail, see when engagement peaked and what bakers were talking about.

Thanks to [pushshift.io](https://pushshift.io/api-parameters/) I was able to retrieve data from Reddit relatively easily.

The write-up of my analysis can be found on [my blog](https://inside-numbers.com/blog).

## Notebooks

This project is organised into two Jupyter notebooks:

1. Webscraping (data collection)
2. Data cleaning and analysis
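
The data collection step is not shown in this README, so the following is only a minimal sketch of how a request for 2020 r/Sourdough submissions might be put together. It assumes the public Pushshift submission search endpoint and its `subreddit`, `after`, `before` and `size` parameters as they were documented in 2020; the endpoint may no longer be openly available, and the helper names `to_epoch` and `build_query_url` are hypothetical, not taken from the notebooks.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumption: the Pushshift submission search endpoint as it existed in 2020.
PUSHSHIFT_URL = "https://api.pushshift.io/reddit/search/submission/"


def to_epoch(year, month, day):
    """Return the Unix timestamp for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def build_query_url(subreddit, after, before, size=100):
    """Build a search URL for submissions in one subreddit and time window."""
    params = {
        "subreddit": subreddit,
        "after": after,    # only submissions created after this timestamp
        "before": before,  # only submissions created before this timestamp
        "size": size,      # number of results requested per call
    }
    return PUSHSHIFT_URL + "?" + urlencode(params)


# Hypothetical usage: a query covering all of 2020.
url = build_query_url("Sourdough", to_epoch(2020, 1, 1), to_epoch(2021, 1, 1))
print(url)
```

Because each call returns a limited number of results, a full scrape would presumably repeat the request, moving `after` forward to the latest `created_utc` seen so far until no more submissions come back.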

## Requirements

This project runs on Python 3 and depends on a number of Python libraries listed in ```requirements.txt```.

## Notes on methodology

### Users

Users are accounts that posted at least one submission in r/Sourdough in 2020. Some accounts have since been deleted; these are counted together as a single [deleted] user.
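
To illustrate the effect of this counting rule, here is a small sketch using made-up author names (the list below is illustrative, not project data). Counting distinct author values treats every `[deleted]` submission as coming from the same user, so the user total is a lower bound whenever more than one deleted account posted.

```python
from collections import Counter

# Hypothetical sample of submission authors; "[deleted]" stands in for
# any account that no longer exists.
authors = ["baker_a", "[deleted]", "baker_b", "[deleted]", "baker_a"]


def count_users(author_names):
    """Count distinct authors; all "[deleted]" entries collapse into one."""
    return len(set(author_names))


n_users = count_users(authors)
posts_per_user = Counter(authors)

print(n_users)                      # distinct users, "[deleted]" counted once
print(posts_per_user["[deleted]"])  # submissions attributed to deleted accounts
```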

### Score / upvoting data

Unfortunately, the submission scores (upvotes) retrieved through pushshift seemed to be incorrect, so they could not be used for analysis.

### Dates and times

When converting Unix timestamps to datetimes, I did not account for the different time zones of users at the time of their submission. Hence, analysis of submissions by day of the week and hour of the day might be slightly distorted.
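
The size of this distortion can be seen in a short sketch. It converts one Unix timestamp in UTC and again with a fixed UTC-8 offset, which is an assumed stand-in for a user's local time; the function name `hour_and_weekday` is hypothetical and the notebooks may do the conversion differently.

```python
from datetime import datetime, timedelta, timezone

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def hour_and_weekday(timestamp, tz=timezone.utc):
    """Return (hour, weekday name) of a Unix timestamp in the given time zone."""
    dt = datetime.fromtimestamp(timestamp, tz=tz)
    return dt.hour, DAYS[dt.weekday()]


ts = 1585699200  # 2020-04-01 00:00:00 UTC

utc_view = hour_and_weekday(ts)
# Assumption: a user eight hours behind UTC, modelled as a fixed offset.
local_view = hour_and_weekday(ts, timezone(timedelta(hours=-8)))

print(utc_view)    # (0, 'Wednesday')
print(local_view)  # (16, 'Tuesday')
```

The same submission lands at midnight on Wednesday in UTC but at 4 pm on Tuesday for the user, which is why hourly and daily breakdowns are only approximate.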
