https://github.com/pakagronglb/reddit-scraper
A powerful Reddit data scraping tool with a user-friendly Streamlit interface. Extract posts and comments from subreddits or specific posts with ease.
- Host: GitHub
- URL: https://github.com/pakagronglb/reddit-scraper
- Owner: pakagronglb
- License: MIT
- Created: 2025-03-12T23:44:04.000Z
- Default Branch: main
- Last Pushed: 2025-03-13T03:42:43.000Z
- Last Synced: 2025-05-24T07:08:57.410Z
- Topics: python, reddit, reddit-api, web-scraping
- Language: Python
- Homepage: https://reddit-scraper-123.streamlit.app/
- Size: 21.5 KB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
# Reddit Data Scraper 📊

A powerful Reddit data scraping tool with a user-friendly Streamlit interface. Extract posts and comments from subreddits or specific posts with ease.
## 🚀 Features
- 📱 User-friendly web interface
- 🔍 Scrape posts from any subreddit
- 💬 Extract comments from specific posts
- 📊 Export data to CSV
- ⏱️ Time-based filtering
- 🔄 Caching for better performance
## 🛠️ Tech Stack
- **Python** - Core programming language
- **Streamlit** - Web interface framework
- **PRAW** - Reddit API wrapper
- **Pandas** - Data manipulation and analysis
- **python-dotenv** - Environment variable management
## 📋 Prerequisites
- Python 3.9 or higher
- Reddit API credentials ([Get them here](https://www.reddit.com/prefs/apps))
## ⚙️ Installation
1. Clone the repository:
```bash
git clone https://github.com/pakagronglb/reddit-scraper.git
cd reddit-scraper
```
2. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
4. Set up environment variables:
Create a `.env` file in the project root:
```env
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=your_user_agent
```
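As a rough illustration of how these three variables might be consumed, the sketch below validates them and builds a PRAW client. The function names `load_reddit_credentials` and `make_reddit_client` are hypothetical and not taken from this repository; only `load_dotenv()` from python-dotenv and the `praw.Reddit(client_id=..., client_secret=..., user_agent=...)` constructor are real library calls.

```python
import os

# The three variable names come from the .env example above.
REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def load_reddit_credentials(env=None):
    """Return the Reddit settings as a dict, or raise if any is missing.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    with a plain dict for testing.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def make_reddit_client():
    """Build a read-only PRAW client from the .env file (hypothetical helper)."""
    # Imported lazily so the validation helper works without these packages.
    from dotenv import load_dotenv
    import praw

    load_dotenv()  # copies values from .env into os.environ
    creds = load_reddit_credentials()
    return praw.Reddit(
        client_id=creds["REDDIT_CLIENT_ID"],
        client_secret=creds["REDDIT_CLIENT_SECRET"],
        user_agent=creds["REDDIT_USER_AGENT"],
    )
```

Failing early with the list of missing names tends to be easier to debug than an authentication error raised later by the Reddit API.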
## 🚀 Usage
1. Start the application:
```bash
streamlit run main.py
```
2. Open the web interface at `http://localhost:8501` (Streamlit's default local address)
3. Choose your scraping option:
   - **Subreddit Posts**: Enter the subreddit name, post limit, and time filter
   - **Specific Post**: Enter the Reddit post URL
4. Click "Scrape" and download the results as CSV
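The two scraping options above could be sketched with PRAW roughly as follows. This is a minimal sketch and not the app's actual code: the helper names, the choice of the `top` listing, and the CSV columns are all assumptions. It uses the standard-library `csv` module to stay dependency-free, whereas the project lists Pandas for data handling. The PRAW calls themselves (`reddit.subreddit(...).top(time_filter=..., limit=...)`, `reddit.submission(url=...)`, `comments.replace_more(limit=0)`, `comments.list()`) are real.

```python
import csv
import io
from datetime import datetime, timezone

# Values PRAW accepts for time_filter on listings such as Subreddit.top().
TIME_FILTERS = {"all", "day", "hour", "month", "week", "year"}


def fetch_top_posts(reddit, subreddit_name, limit=25, time_filter="week"):
    """Return an iterator of submissions; `reddit` is a praw.Reddit instance."""
    if time_filter not in TIME_FILTERS:
        raise ValueError(f"Unsupported time filter: {time_filter!r}")
    return reddit.subreddit(subreddit_name).top(time_filter=time_filter, limit=limit)


def fetch_comments(reddit, post_url):
    """Return a flat list of the comments on one post, given its URL."""
    submission = reddit.submission(url=post_url)
    submission.comments.replace_more(limit=0)  # drop "load more comments" stubs
    return submission.comments.list()


def post_to_row(post):
    """Convert a submission into a flat dict (column choice is an assumption)."""
    created = datetime.fromtimestamp(post.created_utc, tz=timezone.utc)
    return {
        "id": post.id,
        "title": post.title,
        "author": str(post.author) if post.author else "[deleted]",
        "score": post.score,
        "num_comments": post.num_comments,
        "created_utc": created.isoformat(),
        "url": post.url,
    }


def rows_to_csv(rows):
    """Serialize a list of row dicts to CSV text with a header line."""
    rows = list(rows)
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

With a configured client, `rows_to_csv(post_to_row(p) for p in fetch_top_posts(reddit, "python", limit=10))` would produce text suitable for a download button.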
## 🌐 Deployment
### Streamlit Cloud
1. Push your code to GitHub
2. Visit [share.streamlit.io](https://share.streamlit.io)
3. Connect your repository
4. Add your Reddit API credentials in Streamlit secrets
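One way an app might read credentials from Streamlit secrets when deployed, and from environment variables locally, is sketched below. `get_setting` is a hypothetical helper, not part of this repository or of Streamlit; it accepts any mapping, so in a Streamlit app `st.secrets` could be passed as `secrets`.

```python
import os


def get_setting(name, secrets=None):
    """Look up `name` in a secrets mapping first, then in the environment.

    Hypothetical helper: `secrets` can be any mapping, for example
    `st.secrets` inside a Streamlit app or a plain dict in tests.
    """
    if secrets is not None and name in secrets:
        return secrets[name]
    value = os.environ.get(name)
    if value is None:
        raise KeyError(f"{name} is not set in secrets or the environment")
    return value
```

Note that accessing `st.secrets` without any secrets file configured can raise an error, so for local runs it may be simpler to call `get_setting(name)` without the `secrets` argument and rely on `.env`.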
### Heroku
1. Create a Heroku app:
```bash
heroku create your-app-name
```
2. Set environment variables:
```bash
heroku config:set REDDIT_CLIENT_ID=your_client_id
heroku config:set REDDIT_CLIENT_SECRET=your_client_secret
heroku config:set REDDIT_USER_AGENT=your_user_agent
```
3. Deploy:
```bash
git push heroku main
```
## 📝 Configuration
- `requirements.txt` - Project dependencies
- `.env` - Local environment variables
- `Procfile` - Heroku deployment configuration
- `runtime.txt` - Python runtime specification
## 🔒 Security
- Never commit your `.env` file or `.streamlit/secrets.toml`
- Use environment variables for sensitive data
- Keep your Reddit API credentials secure
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Open a Pull Request
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 👏 Acknowledgments
- [PRAW Documentation](https://praw.readthedocs.io/)
- [Streamlit Documentation](https://docs.streamlit.io/)
- [Reddit API Documentation](https://www.reddit.com/dev/api/)
## 📧 Contact
Maintainer: [@pakagronglb](https://twitter.com/pakagronglb)
Project Link: [https://github.com/pakagronglb/reddit-scraper](https://github.com/pakagronglb/reddit-scraper)