https://github.com/afnanksalal/ai-powered-data-extraction-agent
This project demonstrates an AI-powered data extraction agent that uses SerpAPI for web searching and Groq's LLM for information extraction. It provides a Streamlit-based interface for users to upload CSV data, specify a query prompt, and extract targeted information from the web.
- Host: GitHub
- URL: https://github.com/afnanksalal/ai-powered-data-extraction-agent
- Owner: Afnanksalal
- License: mit
- Created: 2024-11-14T07:28:50.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-11-14T16:02:38.000Z (7 months ago)
- Last Synced: 2025-01-21T05:28:23.548Z (5 months ago)
- Topics: agent, data-extraction, groq, llm, serpapi, streamlit
- Language: Python
- Homepage: https://ai-powered-data-extraction-agent.koyeb.app/
- Size: 47.9 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# AI-Powered Data Extraction Agent
This project demonstrates an AI-powered data extraction agent that uses SerpAPI for web searching and Groq's LLM for information extraction. It provides a Streamlit-based interface for users to upload CSV data, specify a query prompt, and extract targeted information from the web.
## Features
* **CSV Data Upload:** Users can upload their CSV data through a user-friendly interface.
* **Dynamic Query Prompts:** The application allows users to define flexible query prompts using placeholders that correspond to column names in the CSV.
* **SerpAPI Integration:** Leverages SerpAPI for performing web searches based on user queries.
* **Groq LLM Integration:** Uses Groq's Large Language Model (LLM) to extract specific information from search results.
* **Concise and Plaintext Output:** The LLM is instructed to provide concise, plain text answers without extra formatting or explanations.
* **Error Handling:** Robust error handling ensures that issues with API calls or data processing are caught and displayed to the user.
* **Streamlit Interface:** Provides an interactive and easy-to-use web application for data extraction.

## Installation
1. **Clone the Repository:**
```bash
git clone https://github.com/Afnanksalal/AI-Powered-Data-Extraction-Agent
cd AI-Powered-Data-Extraction-Agent
```

2. **Create and Activate a Virtual Environment:** (Recommended)
```bash
python3 -m venv venv
source venv/bin/activate # On Linux/macOS
venv\Scripts\activate # On Windows
```

3. **Install Requirements:**
```bash
pip install -r requirements.txt
```

4. **Configure API Keys:**
You can set up your API keys using either a `config.yaml` file or Koyeb environment variables:
- **Using `config.yaml`:**
Replace values in `config.yaml` in the project's root directory with your API keys:
```yaml
serpapi_api_key: YOUR_SERPAPI_API_KEY
groq_api_key: YOUR_GROQ_API_KEY
```
- **Using Koyeb Environment Variables:**
Set `serpapi_api_key` and `groq_api_key` as environment variables directly in Koyeb. These variables take priority over `config.yaml`.

## Usage
1. **Run the Streamlit App:**
```bash
streamlit run main.py
```

2. **Upload CSV Data:** Use the file uploader in the Streamlit app to upload your CSV file.
3. **Select Column:** Choose the column from your CSV that you want to use for the queries.
4. **Enter Query Prompt:** Enter a query prompt, using the selected column's name in curly braces as a placeholder for each row's value. For example:
   - If your column is named "company": `Give me contact details of {company}`
5. **Run Extraction:** Click the "Run Extraction" button to start the data extraction process.
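The per-row placeholder substitution described in step 4 can be sketched in plain Python. This is a minimal illustration, not the app's actual implementation; the `build_queries` helper is hypothetical:

```python
import csv

def build_queries(csv_path: str, prompt_template: str) -> list[str]:
    """Fill {column_name} placeholders in the prompt with each row's values."""
    queries = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # format_map replaces {company} with row["company"], and so on,
            # for every placeholder that matches a CSV column header.
            queries.append(prompt_template.format_map(row))
    return queries
```

Each generated query would then be sent to SerpAPI, and the search results passed to the Groq LLM for extraction.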
## Deploy to Koyeb
Click the button below to deploy this project on Koyeb. Be sure to replace `"changeme"` with your actual API keys for `serpapi_api_key` and `groq_api_key` in the Koyeb configuration.
[](https://app.koyeb.com/deploy?name=info-scrapper&type=git&repository=Afnanksalal%2Finfo-scrapper&branch=main&builder=buildpack&env%5Bgroq_api_key%5D=changeme&env%5Bserpapi_api_key%5D=changeme&ports=8000%3Bhttp%3B%2F)
## Example CSV Files
You can find an example CSV file (`example.csv`) in the repository to test the application.
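The contents of the repository's `example.csv` are not reproduced here, but a minimal CSV compatible with the `Give me contact details of {company}` prompt shown above would look like this (the rows are illustrative):

```csv
company
Acme Corporation
Globex Inc
```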
## Project Structure
* `main.py`: The main Streamlit application file.
* `search_and_extract.py`: Handles web searches using SerpAPI.
* `llm_integration.py`: Handles interaction with the Groq LLM.
* `dashboard.py`: Creates the Streamlit dashboard components.
* `utils.py`: Contains utility functions (e.g., loading configuration).
* `data_processing.py`: Handles data loading.
* `config.yaml`: Stores API keys and configuration.

## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Contributing
Contributions are welcome! Please open an issue or submit a pull request.
## Acknowledgements
* **SerpAPI:** Used for web search functionality. [https://serpapi.com/](https://serpapi.com/)
* **Groq:** Used for LLM-powered information extraction. [https://groq.com/](https://groq.com/)
* **Streamlit:** Used for creating the interactive web application. [https://streamlit.io/](https://streamlit.io/)