https://github.com/ashmit-kumar/redditmindmap
RedditPersonaExtractor is a Python tool that scrapes a Reddit user's posts and comments, then uses an LLM to generate a detailed, citation-backed user persona.
- Host: GitHub
- URL: https://github.com/ashmit-kumar/redditmindmap
- Owner: Ashmit-Kumar
- Created: 2025-07-15T15:09:57.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-07-25T11:20:34.000Z (11 months ago)
- Last Synced: 2025-07-25T17:38:36.238Z (11 months ago)
- Topics: generative-ai, llm, persona-generation, web-scraping
- Language: Python
- Homepage:
- Size: 48.8 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Reddit User Persona Generator
This is a Python-based project that generates a detailed **user persona** by analyzing a Reddit user's **posts** and **comments** using **LLM-based natural language analysis**. The output includes characteristics like interests, personality traits, writing tone, and more, along with proper citations from Reddit.
This project was built as part of the Generative AI Internship Assignment for **BeyondChats**.
---
## Features
- Scrapes recent Reddit posts and comments of a user
- Generates an LLM-backed psychological and stylistic persona
- Cites exact URLs for every inferred trait
- Clean command-line interface
- Fully PEP-8 compliant code structure
---
## Technologies Used
- Python 3.10+
- [PRAW](https://praw.readthedocs.io/en/stable/): Reddit API wrapper
- [Google Gemini API](https://ai.google.dev/)
- `python-dotenv` for secure environment variable management
---
## Setup Instructions
### 1. Clone this repository
```bash
git clone https://github.com/ashmit-kumar/redditmindmap.git
cd redditmindmap
```
### 2. Install dependencies
We recommend using a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
---
### 3. Set up the `.env` file
Create a `.env` file in the project root directory with the following content:
```env
# Reddit API (Create from https://www.reddit.com/prefs/apps)
REDDIT_CLIENT_ID=your_reddit_client_id
REDDIT_CLIENT_SECRET=your_reddit_client_secret
# Gemini API (Get from https://ai.google.dev/)
GOOGLE_API_KEY=your_google_gemini_api_key
```
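
Before running the tool, it can help to confirm that all three keys are actually set. The following is a minimal sketch, not the project's own code: the helper names `missing_env_vars` and `load_env_file` are hypothetical, and only the variable names come from the `.env` template above.

```python
import os

# Variable names taken from the .env template above.
REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "GOOGLE_API_KEY")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def load_env_file():
    """Copy values from .env into os.environ if python-dotenv is installed."""
    try:
        from dotenv import load_dotenv  # provided by python-dotenv
    except ImportError:
        return False
    return load_dotenv()
```

Calling `load_env_file()` and then `missing_env_vars()` at startup gives a readable error before any API request is made.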
---
## How to Run
You can run the script in two ways:
### **Option 1: With a Reddit Profile URL as an Argument**
```bash
python main.py https://www.reddit.com/user/kojied/
```
### **Option 2: Run Without Arguments (Interactive Mode)**
```bash
python main.py
```
You will be prompted to enter a Reddit profile URL or just the username.
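
Since both a profile URL and a bare username are accepted, the input has to be normalized first. Here is one way that could look; `extract_username` is a hypothetical helper, and the 3 to 20 character rule (letters, digits, `_`, `-`) is an assumption about Reddit's username format rather than something this project documents.

```python
import re

# Assumed username format: 3-20 letters, digits, underscores, or hyphens.
_USERNAME_RE = re.compile(r"^[A-Za-z0-9_-]{3,20}$")
# Matches profile URLs such as https://www.reddit.com/user/<name>/
_PROFILE_RE = re.compile(r"reddit\.com/(?:user|u)/([^/?#\s]+)", re.IGNORECASE)


def extract_username(text):
    """Return the username from a profile URL, 'u/name', or a bare name."""
    text = text.strip()
    match = _PROFILE_RE.search(text)
    if match:
        candidate = match.group(1)
    else:
        # Drop an optional leading "u/" or "/u/" prefix.
        candidate = re.sub(r"^/?u/", "", text)
    if not _USERNAME_RE.match(candidate):
        raise ValueError(f"Not a valid Reddit username or profile URL: {text!r}")
    return candidate
```

Raising `ValueError` on bad input lets the interactive mode re-prompt instead of sending a malformed name to the Reddit API.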
---
### What Happens When You Run It:
1. Extracts the username from the provided URL (or direct input).
2. Scrapes up to **30 posts** and **30 comments** using the Reddit API.
3. Uses **Google Gemini LLM** to generate a structured persona.
4. Saves the output in two formats:
   * `kojied_persona.txt` (text-based for terminal and evaluation)
   * `kojied_persona.md` (Markdown-formatted for GitHub)
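
The scraping step (step 2) can be sketched with PRAW's redditor listings. This is an illustration under assumptions, not the code in `persona_utils.py`: the function name `fetch_recent_activity` and the dictionary layout are hypothetical, while `redditor(...)`, `submissions.new(limit=...)`, and `comments.new(limit=...)` are standard PRAW calls.

```python
REDDIT_BASE = "https://www.reddit.com"


def fetch_recent_activity(reddit, username, limit=30):
    """Collect up to `limit` posts and `limit` comments, each with its URL.

    `reddit` is expected to be a praw.Reddit instance, for example:
    praw.Reddit(client_id=..., client_secret=..., user_agent="persona-script")
    """
    redditor = reddit.redditor(username)
    posts = [
        {
            "kind": "post",
            "text": f"{s.title}\n{s.selftext}".strip(),
            # PRAW permalinks are site-relative paths, so prefix the host.
            "url": REDDIT_BASE + s.permalink,
        }
        for s in redditor.submissions.new(limit=limit)
    ]
    comments = [
        {"kind": "comment", "text": c.body, "url": REDDIT_BASE + c.permalink}
        for c in redditor.comments.new(limit=limit)
    ]
    return posts + comments
```

Keeping the full URL next to each text snippet is what makes it possible to cite a source for every inferred trait later on.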
---
## Example Output
Each persona file (`.txt` and `.md`) includes:
* **Interests**
* **Personality traits**
* **Tone of writing**
* **Possible profession or education**
* **Language style or humor**
* **Political/social leanings (if any)**
* **Limitations**
* **Citations** for each trait (Reddit post or comment URL)
* (Optional) **Representative quote**
* (Optional) **Goals and needs**
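
One possible way to represent these sections in code is a small record per trait that refuses to exist without a citation. The sketch below is hypothetical: `Trait`, `render_markdown`, and `save_persona` are illustrative names, and writing identical text to both files is a simplification of the two output formats described above.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Trait:
    section: str                 # e.g. "Interests" or "Tone of writing"
    claim: str                   # the inferred statement
    citations: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Every inferred trait must point at a Reddit post or comment URL.
        if not self.citations:
            raise ValueError(f"Trait {self.section!r} has no citation")


def render_markdown(username, traits):
    """Render traits as Markdown, one heading per section."""
    lines = [f"# Persona for u/{username}", ""]
    for trait in traits:
        lines.append(f"## {trait.section}")
        lines.append(trait.claim)
        lines.extend(f"- Source: {url}" for url in trait.citations)
        lines.append("")
    return "\n".join(lines)


def save_persona(username, text, out_dir="."):
    """Write the text to <username>_persona.txt and .md; return both paths."""
    paths = []
    for ext in ("txt", "md"):
        path = Path(out_dir) / f"{username}_persona.{ext}"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths
```

Rejecting uncited traits at construction time keeps the "citation for every trait" guarantee independent of what the LLM returns.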
---
## File Structure
```plaintext
reddit-user-persona-generator/
│
├── main.py                        # Entry point for CLI or prompt-based input
├── persona_utils.py               # All scraping, LLM generation, saving logic
├── requirements.txt               # Python dependencies
├── .env                           # Stores API keys (excluded from Git)
├── kojied_persona.txt             # Sample output (text format)
├── kojied_persona.md              # Sample output (Markdown format)
├── Hungry-Move-6603_persona.txt
├── Hungry-Move-6603_persona.md
└── README.md                      # README file
```
---
## PEP-8 Compliant
All code follows PEP-8 standards for style and formatting. Verified using:
```bash
flake8 main.py persona_utils.py
```
---
## Notes
* Ensure your Reddit app is created as a **script app**, not a web app or an installed app.
* If the Gemini API returns a quota error, try reducing the post/comment limit or switching to a smaller model.
* Only public Reddit data is used; no login or upvote activity is tracked.
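
The quota advice above can be automated by retrying with fewer items. The following is a rough sketch under assumptions: `generate_with_fallback` is a hypothetical helper, `generate` stands for whatever function sends the scraped items to the LLM, and real code should pass the SDK's specific quota exception as `retry_on` rather than the broad default.

```python
def generate_with_fallback(generate, items, min_items=5, retry_on=(Exception,)):
    """Call generate(items); on failure, retry with half as many items.

    Gives up and re-raises once the input is already at `min_items` or fewer.
    """
    while True:
        try:
            return generate(items)
        except retry_on:
            if len(items) <= min_items:
                raise
            # Keep the first (most recent) items, halving down to the floor.
            items = items[: max(min_items, len(items) // 2)]
```

Halving keeps the number of retries small: a 60-item input reaches the floor of 5 in at most four reductions.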