{"id":21125464,"url":"https://github.com/afnanksalal/ai-powered-data-extraction-agent","last_synced_at":"2026-05-20T03:36:13.845Z","repository":{"id":262777295,"uuid":"888315775","full_name":"Afnanksalal/AI-Powered-Data-Extraction-Agent","owner":"Afnanksalal","description":"This project demonstrates an AI-powered data extraction agent that uses SerpAPI for web searching and Groq's LLM for information extraction. It provides a Streamlit-based interface for users to upload CSV data, specify a query prompt, and extract targeted information from the web.","archived":false,"fork":false,"pushed_at":"2024-11-14T16:02:38.000Z","size":49,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-21T05:28:23.548Z","etag":null,"topics":["agent","data-extraction","groq","llm","serpapi","streamlit"],"latest_commit_sha":null,"homepage":"https://ai-powered-data-extraction-agent.koyeb.app/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Afnanksalal.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-14T07:28:50.000Z","updated_at":"2024-11-14T16:02:42.000Z","dependencies_parsed_at":"2024-11-14T08:36:57.469Z","dependency_job_id":null,"html_url":"https://github.com/Afnanksalal/AI-Powered-Data-Extraction-Agent","commit_stats":null,"previous_names":["afnanksalal/info-scrapper"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Afnanksalal%2FAI-Powered-Data-Extraction-Agent","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Afnanksalal%2FAI-Powered-Data-Extraction-Agent/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Afnanksalal%2FAI-Powered-Data-Extraction-Agent/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Afnanksalal%2FAI-Powered-Data-Extraction-Agent/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Afnanksalal","download_url":"https://codeload.github.com/Afnanksalal/AI-Powered-Data-Extraction-Agent/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243573163,"owners_count":20312879,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","data-extraction","groq","llm","serpapi","streamlit"],"created_at":"2024-11-20T04:34:03.783Z","updated_at":"2026-05-20T03:36:13.820Z","avatar_url":"https://github.com/Afnanksalal.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AI-Powered Data Extraction Agent\n\nThis project demonstrates an AI-powered data extraction agent that uses SerpAPI for web searching and Groq's LLM for information extraction. It provides a Streamlit-based interface for users to upload CSV data, specify a query prompt, and extract targeted information from the web.\n\n## Features\n\n* **CSV Data Upload:** Users can upload their CSV data through a user-friendly interface.\n* **Dynamic Query Prompts:** The application allows users to define flexible query prompts using placeholders that correspond to column names in the CSV.\n* **SerpAPI Integration:** Leverages SerpAPI for performing web searches based on user queries.\n* **Groq LLM Integration:** Uses Groq's Large Language Model (LLM) to extract specific information from search results.\n* **Concise and Plaintext Output:** The LLM is instructed to provide concise, plain text answers without extra formatting or explanations.\n* **Error Handling:** Robust error handling ensures that issues with API calls or data processing are caught and displayed to the user.\n* **Streamlit Interface:** Provides an interactive and easy-to-use web application for data extraction.\n\n## Installation\n\n1. **Clone the Repository:**\n   ```bash\n   git clone https://github.com/Afnanksalal/AI-Powered-Data-Extraction-Agent\n   cd AI-Powered-Data-Extraction-Agent\n   ```\n\n2. **Create and Activate a Virtual Environment:** (Recommended)\n   ```bash\n   python3 -m venv venv\n   source venv/bin/activate  # On Linux/macOS\n   venv\\Scripts\\activate  # On Windows\n   ```\n\n3. **Install Requirements:**\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n4. **Configure API Keys:**  \n   You can set up your API keys using either a `config.yaml` file or Koyeb environment variables:\n   \n   - **Using `config.yaml`:**  \n     Replace values in `config.yaml` in the project's root directory with your API keys:\n     ```yaml\n     serpapi_api_key: YOUR_SERPAPI_API_KEY\n     groq_api_key: YOUR_GROQ_API_KEY\n     ```\n   \n   - **Using Koyeb Environment Variables:**  \n     Set `serpapi_api_key` and `groq_api_key` as environment variables directly in Koyeb. These variables will be prioritized over `config.yaml`.\n\n## Usage\n\n1. **Run the Streamlit App:**\n   ```bash\n   streamlit run main.py\n   ```\n\n2. **Upload CSV Data:** Use the file uploader in the Streamlit app to upload your CSV file.\n\n3. **Select Column:** Choose the column from your CSV that you want to use for the queries.\n\n4. **Enter Query Prompt:** Enter a query prompt. Use curly braces `{}` as placeholders for the values from the selected column. For example:\n   - If your column is named \"company\": `Give me contact details of {company}`\n\n5. **Run Extraction:** Click the \"Run Extraction\" button to start the data extraction process.\n\n## Deploy to Koyeb\n\nClick the button below to deploy this project on Koyeb. Be sure to replace `\"changeme\"` with your actual API keys for `serpapi_api_key` and `groq_api_key` in the Koyeb configuration.\n\n[![Deploy to Koyeb](https://www.koyeb.com/static/images/deploy/button.svg)](https://app.koyeb.com/deploy?name=info-scrapper\u0026type=git\u0026repository=Afnanksalal%2Finfo-scrapper\u0026branch=main\u0026builder=buildpack\u0026env%5Bgroq_api_key%5D=changeme\u0026env%5Bserpapi_api_key%5D=changeme\u0026ports=8000%3Bhttp%3B%2F)\n\n## Example CSV Files\n\nYou can find an example CSV file (`example.csv`) in the repository to test the application.\n\n## Project Structure\n\n* `main.py`: The main Streamlit application file.\n* `search_and_extract.py`: Handles web searches using SerpAPI.\n* `llm_integration.py`: Handles interaction with the Groq LLM.\n* `dashboard.py`: Creates the Streamlit dashboard components.\n* `utils.py`: Contains utility functions (e.g., loading configuration).\n* `data_processing.py`: Handles data loading.\n* `config.yaml`: Stores API keys and configuration.\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## Contributing\n\nContributions are welcome! Please open an issue or submit a pull request.\n\n## Acknowledgements\n\n* **SerpAPI:** Used for web search functionality. [https://serpapi.com/](https://serpapi.com/)\n* **Groq:** Used for LLM-powered information extraction. [https://groq.com/](https://groq.com/)\n* **Streamlit:** Used for creating the interactive web application. [https://streamlit.io/](https://streamlit.io/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fafnanksalal%2Fai-powered-data-extraction-agent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fafnanksalal%2Fai-powered-data-extraction-agent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fafnanksalal%2Fai-powered-data-extraction-agent/lists"}