{"id":28331169,"url":"https://github.com/masurii/fbscrapeideas","last_synced_at":"2026-04-28T18:02:56.638Z","repository":{"id":295377062,"uuid":"989904244","full_name":"MasuRii/FBScrapeIdeas","owner":"MasuRii","description":"Modern CLI tool for scraping \u0026 analyzing Facebook groups using Playwright \u0026 Gemini AI. Features self-healing selectors, session security, and local offline analysis.","archived":false,"fork":false,"pushed_at":"2025-12-30T19:12:15.000Z","size":1229,"stargazers_count":15,"open_issues_count":0,"forks_count":5,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-01-01T14:51:03.791Z","etag":null,"topics":["academic-research","ai","cli","data-extraction","data-mining","facebook-scraper","gemini-api","idea-generation","nlp","python","selenium","text-analysis"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MasuRii.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-05-25T04:43:38.000Z","updated_at":"2025-12-30T13:49:32.000Z","dependencies_parsed_at":"2025-12-30T20:04:09.091Z","dependency_job_id":null,"html_url":"https://github.com/MasuRii/FBScrapeIdeas","commit_stats":null,"previous_names":["masurii/fbscrapeideas"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/MasuRii/FBScrapeIdeas","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MasuRii%2FFBScrapeIde
as","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MasuRii%2FFBScrapeIdeas/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MasuRii%2FFBScrapeIdeas/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MasuRii%2FFBScrapeIdeas/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MasuRii","download_url":"https://codeload.github.com/MasuRii/FBScrapeIdeas/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MasuRii%2FFBScrapeIdeas/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32392304,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-28T14:34:11.604Z","status":"ssl_error","status_checked_at":"2026-04-28T14:32:37.009Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["academic-research","ai","cli","data-extraction","data-mining","facebook-scraper","gemini-api","idea-generation","nlp","python","selenium","text-analysis"],"created_at":"2025-05-26T18:28:29.905Z","updated_at":"2026-04-28T18:02:56.632Z","avatar_url":"https://github.com/MasuRii.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FB Scrape Ideas\n\n[![Python 
Version][python-shield]][python-url]\n[![License][license-shield]][license-url]\n[![Issues][issues-shield]][issues-url]\n[![Forks][forks-shield]][forks-url]\n[![Stars][stars-shield]][stars-url]\n[![Contributors][contributors-shield]][contributors-url]\n\nA CLI-driven application to scrape and analyze Facebook group posts for insights using Selenium and Google Gemini AI.\n\nThis tool helps users identify potential capstone/thesis ideas, student problems, or other valuable insights from university Facebook group discussions by automating data collection (including posts and comments) and AI-powered categorization.\n\n![CLI Screenshot](assets/CLIScreenshot_2.png)\n\n\n## 📖 Table of Contents\n\n- [FB Scrape Ideas](#fb-scrape-ideas)\n  - [📖 Table of Contents](#-table-of-contents)\n  - [✨ Features](#-features)\n  - [📝 Scraped Data Fields](#-scraped-data-fields)\n    - [🗨️ Posts](#️-posts)\n    - [💬 Comments](#-comments)\n    - [🔍 AI Analysis Fields](#-ai-analysis-fields)\n  - [🛠️ Tech Stack](#️-tech-stack)\n  - [📋 Prerequisites](#-prerequisites)\n  - [🚀 Getting Started](#-getting-started)\n    - [📦 Option 1: Binary Release (Easiest)](#-option-1-binary-release-easiest)\n    - [🛠️ Option 2: Running from Source (For Developers)](#️-option-2-running-from-source-for-developers)\n    - [Configuration (Manual)](#configuration-manual)\n  - [🧠 AI Provider Configuration](#-ai-provider-configuration)\n  - [⚙️ Usage](#️-usage)\n  - [⚠️ Important Notice](#️-important-notice)\n\n## ✨ Features\n\n*   **🔒 Authenticated Facebook Group Scraping:** Securely logs into Facebook to scrape posts and comments from private or public groups.\n*   **🤖 Flexible AI Analysis:** \n    *   Support for **Google Gemini** (default) and **OpenAI-compatible** providers (OpenAI, Ollama, LM Studio, etc.)\n    *   Configurable models (e.g., switch between Gemini 2.5 Pro, 2.0 Flash, or local LLMs)\n    *   **Customizable Prompts:** Override default AI prompts via JSON configuration\n*   **💾 Local Database Storage:** Stores scraped data and AI insights in a local SQLite database.\n*   **📊 Data Export \u0026 Statistics:** Export data to CSV/JSON formats and view detailed statistics.\n*   **💻 Advanced CLI:**\n    *   **Dynamic 
Filtering:** Filter posts by category, author, or potential ideas\n    *   **Pagination:** Limit results with the `--limit` option\n    *   **Interactive Menus:** User-friendly command selection\n*   **⚡ Performance Optimizations:**\n    *   Parallel processing for faster scraping\n    *   Asynchronous AI batch processing\n    *   Incremental data saving during scraping\n*   **📤 Enhanced Export Capabilities:**\n    *   Flexible output paths\n    *   Multiple export formats (CSV/JSON)\n    *   Automatic directory creation\n\n## 📝 Scraped Data Fields\n\nThe application collects the following data from Facebook group posts and comments:\n\n### 🗨️ Posts\n- Post content\n- Post URL\n- Post timestamp\n- Author name\n- Author profile picture URL\n\n### 💬 Comments\n- Comment content\n- Comment timestamp\n- Author name\n- Author profile picture URL\n- Facebook comment ID\n\n### 🔍 AI Analysis Fields\n- Category (e.g., \"Project Idea\", \"Problem Statement\")\n- Sub-category\n- Keywords\n- Summary\n- Potential idea flag\n- Sentiment analysis (for comments)\n\n## 🛠️ Tech Stack\n\n*   **Language:** `Python`\n*   **Web Scraping:**\n    *   `Selenium`\n    *   `webdriver-manager`\n    *   `BeautifulSoup4`\n*   **AI \u0026 Machine Learning:**\n    *   `google-generativeai`\n*   **Database:**\n    *   `SQLite`\n*   **CLI:**\n    *   `click`\n*   **Utilities:**\n    *   `python-dotenv`\n    *   `getpass`\n\n\n## 📋 Prerequisites\n\nBefore you begin, ensure you have the following:\n*   Python 3.9+\n*   Git\n*   A modern web browser (e.g., Chrome, Firefox)\n*   A Gemini API key for the default provider, or an API key or local endpoint for an OpenAI-compatible provider (OpenAI, Ollama, LM Studio, etc.)\n\n\n## 🚀 Getting Started\n\n### 📦 Option 1: Binary Release (Easiest)\nFor most users, we recommend using the pre-compiled binaries:\n1.  **Download** the latest version for your platform from the [Releases](https://github.com/MasuRii/FBScrapeIdeas/releases) page.\n2.  
**Run the application:**\n    - **Windows:** Double-click `FBScrapeIdeas-windows-x64.exe`.\n    - **macOS/Linux:** Open a terminal, make the file executable (`chmod +x FBScrapeIdeas-*`), and run it.\n3.  **Interactive Setup:** On the first launch, the application guides you through an interactive wizard to configure your API keys and credentials. **No manual `.env` file creation is required!**\n\n---\n\n### 🛠️ Option 2: Running from Source (For Developers)\n\n1.  **Clone the repository:**\n    ```bash\n    git clone https://github.com/MasuRii/FBScrapeIdeas.git\n    cd FBScrapeIdeas\n    ```\n2.  **Create and activate a virtual environment:**\n    ```bash\n    # For Linux/macOS\n    python3 -m venv venv\n    source venv/bin/activate\n    \n    # For Windows (Command Prompt)\n    python -m venv venv\n    venv\\Scripts\\activate.bat\n    ```\n3.  **Install dependencies:**\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n### Configuration (Manual)\n\nIf you prefer to configure the application manually (e.g., for automated environments):\n\n1.  **Set up Environment Variables:**\n    Create a `.env` file in the project root:\n    ```dotenv\n    # .env\n    \n    # Provider Selection (gemini or openai)\n    AI_PROVIDER=gemini\n    \n    # Gemini Configuration\n    GOOGLE_API_KEY=YOUR_GEMINI_API_KEY_HERE\n    GEMINI_MODEL=models/gemini-2.5-flash\n    ```\n    (See [AI Provider Configuration](#-ai-provider-configuration) for more details.)\n    \u003e Note: Facebook credentials are entered securely during scraping or saved during the first-run interactive session.\n\n2.  
**WebDriver Setup:**\n    No manual setup is needed: `webdriver-manager` downloads a matching browser driver automatically on the first run.\n\n\n## 🧠 AI Provider Configuration\n\nFB Scrape Ideas supports multiple AI providers, allowing you to choose between Google's Gemini models, OpenAI's official API, or local LLMs running via tools like Ollama or LM Studio.\n\nYou can configure these settings via the `.env` file or the CLI menu.\n\n### 🔹 Using Google Gemini (Default)\n\nThis is the default provider. You only need a Google API key.\n\n**Configuration (`.env`):**\n```dotenv\nAI_PROVIDER=gemini\nGOOGLE_API_KEY=your_google_api_key\nGEMINI_MODEL=models/gemini-2.0-flash  # Optional: Change model\n```\n\n**Available Gemini Models:**\n- `models/gemini-2.0-flash` (Fast, efficient)\n- `models/gemini-1.5-flash`\n- `models/gemini-1.5-pro` (Higher reasoning capability)\n\n### 🔹 Using OpenAI-Compatible Providers\n\nYou can connect to any service that exposes an OpenAI-compatible API, including local LLM servers.\n\n#### 1. Official OpenAI\n```dotenv\nAI_PROVIDER=openai\nOPENAI_API_KEY=sk-...\nOPENAI_MODEL=gpt-4o\n```\n\n#### 2. Ollama (Local LLM)\nRun Ollama locally (`ollama serve`) and use the following config:\n```dotenv\nAI_PROVIDER=openai\nOPENAI_BASE_URL=http://localhost:11434/v1\nOPENAI_API_KEY=ollama  # Value doesn't matter for Ollama, but must be present\nOPENAI_MODEL=llama3    # Or any model you have pulled\n```\n\n#### 3. LM Studio (Local LLM)\nStart the local server in LM Studio and use:\n```dotenv\nAI_PROVIDER=openai\nOPENAI_BASE_URL=http://localhost:1234/v1\nOPENAI_API_KEY=lm-studio\nOPENAI_MODEL=model-identifier\n```\n\n#### 4. OpenRouter / Together AI / Groq\nPoint `OPENAI_BASE_URL` at the provider's endpoint (OpenRouter is shown here):\n```dotenv\nAI_PROVIDER=openai\nOPENAI_BASE_URL=https://openrouter.ai/api/v1\nOPENAI_API_KEY=your_openrouter_key\nOPENAI_MODEL=anthropic/claude-3-opus\n```\n\n### 🔹 Custom Prompts\n\nYou can customize the instructions given to the AI by creating a `custom_prompts.json` file in the root directory. 
This allows you to tailor the categorization logic or sentiment analysis to your specific needs.\n\n**To use:**\n1. Copy `custom_prompts.example.json` to `custom_prompts.json`.\n2. Edit the prompts in `custom_prompts.json`.\n\n**Example Structure:**\n```json\n{\n  \"post_categorization\": \"You are an expert post categorizer. Analyze the following...\",\n  \"comment_analysis\": \"You are an expert comment analyzer...\"\n}\n```\n\n## ⚙️ Usage\n\nThe application is run via the CLI:\n\n```bash\npython main.py \u003ccommand\u003e [options]\n```\n\n**Available Commands:**\n\n*   `scrape`: Scrapes posts and comments from a Facebook group.\n    ```bash\n    python main.py scrape --group-url \"GROUP_URL\" [--num-posts 50] [--headless]\n    ```\n    \u003e You'll be prompted securely for your Facebook credentials.\n    \n*   `process-ai`: Processes scraped posts and comments with the configured AI provider.\n    ```bash\n    python main.py process-ai\n    ```\n    \n*   `view`: Displays categorized posts and comments, with filtering options:\n    ```bash\n    python main.py view [--category CATEGORY] [--author AUTHOR] [--limit N]\n    ```\n    *   Interactive field and value selection\n    *   Pagination support\n    \n*   `export`: Exports data to CSV or JSON format:\n    ```bash\n    python main.py export --format csv|json [--output-path PATH] [--category CATEGORY]\n    ```\n    *   Handles both posts and comments\n    *   Automatic directory creation\n    \n*   `stats`: Shows comprehensive statistics about the collected data:\n    ```bash\n    python main.py stats\n    ```\n\n\n## ⚠️ Important Notice\n\n**This tool is provided for educational purposes only. Users must:**\n- Comply with Facebook's Terms of Service\n- Respect privacy and data protection laws\n- Not use scraped data for commercial purposes\n- Use the tool responsibly and ethically\n\nThe developers assume no liability for misuse of this tool. 
Scraping may violate Facebook's Terms of Service; use it at your own risk.\n\n\u003c!-- Shields.io links --\u003e\n[python-shield]: https://img.shields.io/badge/Python-3.9%2B-blue.svg\n[python-url]: https://www.python.org/downloads/\n[license-shield]: https://img.shields.io/github/license/MasuRii/FBScrapeIdeas\n[license-url]: https://github.com/MasuRii/FBScrapeIdeas/blob/master/LICENSE\n[issues-shield]: https://img.shields.io/github/issues/MasuRii/FBScrapeIdeas\n[issues-url]: https://github.com/MasuRii/FBScrapeIdeas/issues\n[forks-shield]: https://img.shields.io/github/forks/MasuRii/FBScrapeIdeas\n[forks-url]: https://github.com/MasuRii/FBScrapeIdeas/network/members\n[stars-shield]: https://img.shields.io/github/stars/MasuRii/FBScrapeIdeas\n[stars-url]: https://github.com/MasuRii/FBScrapeIdeas/stargazers\n[contributors-shield]: https://img.shields.io/github/contributors/MasuRii/FBScrapeIdeas\n[contributors-url]: https://github.com/MasuRii/FBScrapeIdeas/graphs/contributors\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmasurii%2Ffbscrapeideas","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmasurii%2Ffbscrapeideas","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmasurii%2Ffbscrapeideas/lists"}