{"id":20516415,"url":"https://github.com/messi10tom/insightextractor","last_synced_at":"2026-01-27T07:33:12.790Z","repository":{"id":261905492,"uuid":"885470243","full_name":"messi10tom/InsightExtractor","owner":"messi10tom","description":"InsightExtractor: An AI-powered tool for automated data retrieval. Connect CSVs or Google Sheets, define queries, and extract structured insights via web search and LLMs. Features include customizable prompts, API integration, and an intuitive dashboard for data export","archived":false,"fork":false,"pushed_at":"2024-11-19T16:04:00.000Z","size":569,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-08T17:54:28.509Z","etag":null,"topics":["api","llm","sheets","webscraping"],"latest_commit_sha":null,"homepage":"https://insightextractor.streamlit.app/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/messi10tom.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-08T16:40:31.000Z","updated_at":"2025-01-15T21:39:51.000Z","dependencies_parsed_at":"2024-11-09T06:31:20.235Z","dependency_job_id":"aad53cb9-5cd0-40e1-8b67-e39519b34357","html_url":"https://github.com/messi10tom/InsightExtractor","commit_stats":null,"previous_names":["messi10tom/insightextractor"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/messi10tom/InsightExtractor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/messi10tom%2FInsightExtractor","tags_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/messi10tom%2FInsightExtractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/messi10tom%2FInsightExtractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/messi10tom%2FInsightExtractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/messi10tom","download_url":"https://codeload.github.com/messi10tom/InsightExtractor/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/messi10tom%2FInsightExtractor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28808021,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-27T07:14:39.408Z","status":"ssl_error","status_checked_at":"2026-01-27T07:14:39.098Z","response_time":168,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","llm","sheets","webscraping"],"created_at":"2024-11-15T21:28:50.474Z","updated_at":"2026-01-27T07:33:12.776Z","avatar_url":"https://github.com/messi10tom.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# InsightExtractor\n[![GitHub issues](https://img.shields.io/github/issues/messi10tom/InsightExtractor)](https://github.com/messi10tom/InsightExtractor/issues)\n[![GitHub 
forks](https://img.shields.io/github/forks/messi10tom/InsightExtractor)](https://github.com/messi10tom/InsightExtractor/network)\n[![GitHub stars](https://img.shields.io/github/stars/messi10tom/InsightExtractor)](https://github.com/messi10tom/InsightExtractor/stargazers)\n[![GitHub license](https://img.shields.io/github/license/messi10tom/InsightExtractor)](https://github.com/messi10tom/InsightExtractor/blob/main/LICENSE)\n[![GitHub contributors](https://img.shields.io/github/contributors/messi10tom/InsightExtractor)](https://github.com/messi10tom/InsightExtractor/graphs/contributors)\n[![GitHub last commit](https://img.shields.io/github/last-commit/messi10tom/InsightExtractor)](https://github.com/messi10tom/InsightExtractor/commits/main)\n\n![InsightExtractor Banner](./doc/banner.png)\n\nInsightExtractor is an AI-powered tool for automated data retrieval. Connect CSVs or Google Sheets, define queries, and extract structured insights via web search and LLMs. Features include customizable prompts, API integration, and an intuitive dashboard for data export.\n\n## Video Tutorial\n\nFor a detailed walkthrough of how to use InsightExtractor, check out our [YouTube video tutorial](https://youtu.be/ATr7Y5CtE1E).\n\n## Supported Models\n\nInsightExtractor supports multiple AI models for data extraction and processing. 
Users can choose from the following models:\n\n- **Gemini**\n- **ChatGPT**\n- **Ollama**\n\nChoose the model that best fits your data extraction task. Gemini and ChatGPT are hosted models that require an API key (see step 3 of the installation), while Ollama runs models locally on your machine.\n\n## Features\n\n- **Automated Data Retrieval**: Connect a CSV file or Google Sheet and extract data from the webpages it links to.\n- **Customizable Prompts**: Define queries and prompts to tailor the data extraction process.\n- **Web Scraping**: Scrape web data and integrate it with your structured data.\n- **LLM Integration**: Use large language models (LLMs) to process the scraped content and extract insights from it.\n- **Intuitive Dashboard**: Manage your data and export results through a user-friendly interface.\n- **API Integration**: Integrate with other tools and services via APIs.\n\n## Installation\n\nTo install InsightExtractor, follow these steps:\n\n1. **Clone the Repository and Create a Virtual Environment**:\n    ```sh\n    git clone https://github.com/messi10tom/InsightExtractor.git\n    cd InsightExtractor\n    ```\n    Then create and activate a virtual environment named `IE`.\n\n    **Windows**\n    ```\n    python -m venv IE\n    IE\\Scripts\\activate\n    ```\n    **macOS and Linux**\n    ```sh\n    python3 -m venv IE\n    source IE/bin/activate\n    ```\n\n2. **Install Dependencies** (with the virtual environment active):\n    ```sh\n    pip install -r requirements.txt\n    ```\n\n3. 
**Set Up Environment Variables**:\n- **Create BD_AUTH Token**:\n    - Visit [Bright Data](https://brightdata.com/) and open the dashboard.\n    - Choose \"Scraping Browser\" from the \"Add\" dropdown menu.\n    - Name your scraping browser (e.g., \"InsightExtractor\") and create it.\n    - Open \"Playground\" in your Scraping Browser and switch to \"Code Examples\".\n    - Select \"Python, Selenium\" and copy the AUTH key from the example script.\n    - Paste the AUTH key into the `BD_AUTH` field in your `.env` file.\n\n- **Create Google Application Credentials**:\n    - Visit [Google Cloud Console](https://console.cloud.google.com/) and sign in with your Google account.\n    - Create a new project and select it.\n    - Navigate to \"APIs \u0026 Services\" and enable the Google Sheets API.\n    - Create a service account under \"Credentials\", assign it the Editor role, and generate a JSON key for it.\n    - Download the JSON key file and move it into the project directory.\n    - Copy the file's path and paste it into the `GOOGLE_APPLICATION_CREDENTIALS` field in your `.env` file.\n\n- **Create Google API Key**:\n    - Visit [Google AI Studio](https://aistudio.google.com/welcome) and sign in with your Google account.\n    - Click \"Get API key\".\n    - Click \"Create API Key\" to generate a new key.\n    - Copy the generated key and paste it into the `GOOGLE_API_KEY` field in your `.env` file.\n\n- **Set Up OpenAI (ChatGPT) API Key**:\n\n    1. **Create an Account**:\n        - Sign up for a free OpenAI account [here](https://chat.openai.com/).\n\n    2. **Generate an API Key**:\n        - Log in, go to \"API Keys\", click \"+ Create new secret key\", name your key, and copy it.\n\n    3. **Set Up Billing**:\n        - Go to \"Billing\", choose your user type, and add your payment details.\n\n    4. **Set Usage Limits**:\n        - Go to \"Limits\", set hard and soft usage caps, and click \"Save\".\n\n    5. 
**Save API Key in `.env` File**:\n        - Add the following line to your `.env` file:\n            ```env\n            OPENAI_API_KEY=your_api_key_here\n            ```\n\n- **Set Up Ollama**:\n    - Visit [Ollama GitHub](https://github.com/ollama/ollama) and download the version for your operating system.\n    - After installing, open a terminal and run the following command. It downloads the Llama 3.2 model (if it is not already present) and starts an interactive session with it:\n        ```sh\n        ollama run llama3.2\n        ```\n- **Set Up Streamlit Secrets**:\n    - Create `.streamlit/secrets.toml` in the project root and add all the API keys in the following format:\n    ```toml\n    BD_AUTH=\"\"\n    GOOGLE_APPLICATION_CREDENTIALS=\"path/to/your/credentials.json\"\n    GOOGLE_API_KEY=\"\"\n    OPENAI_API_KEY=\"\"\n    ```\n\n4. **Run the Application**:\n    ```sh\n    streamlit run src/main.py\n    ```\n\n## Usage\n\n1. **Upload a CSV File or Connect a Google Sheet**:\n    - Choose to upload a CSV file or connect to a Google Sheet.\n    - Ensure the CSV file contains a column named \"Links\" with the URLs of the webpages you want to scrape.\n\n2. **Define Your Query**:\n    - Enter a prompt describing the data you want to extract.\n\n3. **Extract Data**:\n    - The tool scrapes the linked webpages, processes their content with the selected LLM, and displays the extracted insights.\n\n4. **Export Results**:\n    - Download the results as a CSV file for further analysis.\n\n## Example\n\nHere is an example of how to use InsightExtractor:\n\n1. **Sample CSV File**:\n    ```csv\n    Links,company\n    example1.com,company_1\n    example2.com,company_2\n    ```\n\n2. **User Prompt**:\n    ```\n    Extract the names, emails, and companies of the professionals mentioned in the text {professional}.\n    ```\n\n## Contributing\n\nWe welcome contributions to InsightExtractor! If you have ideas, suggestions, or bug reports, please open an issue or submit a pull request.\n\n## License\n\nThis project is licensed under the MIT License. 
See the [LICENSE](LICENSE) file for details.\n\n## Acknowledgements\n\n- [Streamlit](https://streamlit.io/)\n- [Pandas](https://pandas.pydata.org/)\n- [Selenium](https://www.selenium.dev/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmessi10tom%2Finsightextractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmessi10tom%2Finsightextractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmessi10tom%2Finsightextractor/lists"}