Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/abduldevhub/reddit-data-scrapping
This project is a portfolio of my work on data scraping from Reddit using Python. Read the README file to learn more.
gensim jupyter-notebook matplotlib nltk pandas praw python
Last synced: 23 days ago
- Host: GitHub
- URL: https://github.com/abduldevhub/reddit-data-scrapping
- Owner: AbdulDevHub
- Created: 2024-04-02T06:40:06.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-09-07T21:28:59.000Z (6 months ago)
- Last Synced: 2024-11-30T13:14:58.820Z (3 months ago)
- Topics: gensim, jupyter-notebook, matplotlib, nltk, pandas, praw, python
- Language: Jupyter Notebook
- Homepage:
- Size: 7.94 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Reddit Data Scraping With Python
Welcome to my repository! This project is a portfolio of my work on data scraping from Reddit using Python.
## Project Overview
This project scrapes data from Reddit using the Python Reddit API Wrapper (PRAW), analyzes it, and visualizes the results. The repository contains two Jupyter notebooks with all the code, a PDF and a PowerPoint presentation of the findings, and a folder of the generated images and figures.
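As a rough illustration of the step between scraping and analysis, the sketch below flattens a PRAW-style submission into a plain dictionary that `pandas` can turn into a table. The notebooks' actual code is not reproduced here: the helper name `submission_to_row` and the choice of fields are assumptions, and the stand-in object uses made-up values so the example runs without network access.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def submission_to_row(submission):
    """Flatten a PRAW-style submission into a dict (hypothetical helper)."""
    return {
        "id": submission.id,
        "title": submission.title,
        "score": submission.score,
        "num_comments": submission.num_comments,
        # Reddit reports creation time as a Unix timestamp in UTC.
        "created": datetime.fromtimestamp(submission.created_utc, tz=timezone.utc),
    }


# Stand-in object with made-up values; a real run would pass PRAW submissions.
fake_post = SimpleNamespace(
    id="abc123", title="Example post", score=42, num_comments=7, created_utc=0
)
row = submission_to_row(fake_post)
print(row["created"].isoformat())  # 1970-01-01T00:00:00+00:00
```

A list of such rows can be passed directly to `pandas.DataFrame(rows)` for further analysis.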
## Dependencies
This project uses the following Python libraries:
- `praw`: For interacting with the Reddit API.
- `pandas`: For data manipulation and analysis.
- `datetime`: For working with dates and times (part of the Python standard library, so no separate install is needed).
- `nltk`: For natural language processing tasks.
- `gensim`: For topic modelling and document similarity analysis.
- `matplotlib`: For creating static, animated, and interactive visualizations in Python.
- `pyLDAvis`: For interactive topic model visualization.
- `wordcloud`: For creating word cloud images.
## Getting Started
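Before opening the notebooks, it may help to check that the third-party libraries listed above are importable in your environment. The following is a minimal sketch, not part of the repository; the function name `missing_dependencies` is hypothetical.

```python
from importlib.util import find_spec

# Import names of the third-party libraries listed under Dependencies.
# datetime is omitted because it ships with Python.
DEPENDENCIES = ["praw", "pandas", "nltk", "gensim", "matplotlib", "pyLDAvis", "wordcloud"]


def missing_dependencies(names=DEPENDENCIES):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


print("Missing:", missing_dependencies() or "none")
```

Any name reported as missing can then be installed with your usual package manager.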
To run the code, either extract it into a separate Python file or open the two notebooks directly in Jupyter Notebook.
You'll also need to set up PRAW with your own Reddit API credentials. You can do this in the notebooks where the PRAW instance is initialized.
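One way to supply the credentials without hard-coding them in a notebook is to read them from environment variables, as sketched below. This is an assumption about setup, not the method used in the repository: the variable names (`REDDIT_CLIENT_ID` and so on) and both helper functions are hypothetical. The `client_id`, `client_secret`, and `user_agent` keyword arguments are the ones `praw.Reddit` accepts for a read-only client.

```python
import os


def load_reddit_credentials(env=os.environ):
    """Collect the settings praw.Reddit needs from a mapping of variables."""
    required = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing Reddit API settings: {', '.join(missing)}")
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env["REDDIT_USER_AGENT"],
    }


def make_reddit_client(env=os.environ):
    """Build a read-only PRAW client from the loaded credentials."""
    import praw  # imported lazily so the loader above works without praw installed

    return praw.Reddit(**load_reddit_credentials(env))


# Example use in a notebook cell (requires praw and real credentials):
# reddit = make_reddit_client()
# for submission in reddit.subreddit("python").hot(limit=5):
#     print(submission.title)
```

Keeping the secrets in the environment also means the notebooks can be shared without exposing them.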
## Contact
If you have any questions or feedback, feel free to open an issue or submit a pull request.
Enjoy exploring the repository!