https://github.com/lucereal/autoresearcher
AI Agent to gather information on specific topics
ai-agents beautifulsoup google-search-api openai playwright python web-scraping
- Host: GitHub
- URL: https://github.com/lucereal/autoresearcher
- Owner: lucereal
- Created: 2024-10-12T21:36:20.000Z (24 days ago)
- Default Branch: main
- Last Pushed: 2024-10-17T22:45:45.000Z (19 days ago)
- Last Synced: 2024-10-18T04:09:45.936Z (19 days ago)
- Topics: ai-agents, beautifulsoup, google-search-api, openai, playwright, python, web-scraping
- Language: Python
- Homepage:
- Size: 232 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# AutoResearcher :robot:
Automated Data Collection with GPT, Google Custom Search, and Web Scraping

This Python project automates data collection on user-provided topics.
## Features
- **Query Generation**: GPT creates relevant search queries.
- **Data Collection**: Uses NewsAPI, YouTube, and Google Search to collect data on specific topics.
- **Data Scraping**: Uses Playwright and BeautifulSoup to fetch and extract web content, MoviePy to extract audio from videos, and the OpenAI Whisper model to create audio transcriptions.
- **Summarization**: GPT summarizes scraped data into concise reports.
- **End-to-End Automation**: Fully automated from input to summarized output.
## Technologies
- **Python**, **OpenAI API**, **Google Custom Search API**
- **Playwright**, **BeautifulSoup**
## Setup
1. **Clone the repo**:
```bash
git clone https://github.com/lucereal/autoresearcher.git
cd autoresearcher
```
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
3. **Set up API keys**:
Create a `.env` file with:
```bash
OPENAI_API_KEY=your_openai_api_key
GOOGLE_CUSTOM_SEARCH_API_KEY=your_google_custom_search_api_key
GOOGLE_CX=your_google_cse_id
```
4. **Run the script**:
```bash
python src/researcher/main.py --topic "Your topic here"
```
5. **Check Results Folder**:
Find the result JSON file in the `results` folder.
## Usage
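
Before the steps below run, the script needs the three keys from the Setup `.env` file. The following is a minimal sketch for checking them up front, assuming the file holds plain `KEY=value` lines. `parse_env_file` and `missing_keys` are hypothetical helpers written for illustration; they are not part of the project, which may load the file differently (for example with a dotenv library).

```python
import os
from pathlib import Path

# Key names taken from the Setup section of this README.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "GOOGLE_CUSTOM_SEARCH_API_KEY",
    "GOOGLE_CX",
)


def parse_env_file(path):
    """Parse simple KEY=value lines (hypothetical helper, not project code)."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    file_values = parse_env_file(env_path) if env_path.exists() else {}
    # Values already exported in the shell take precedence over the file.
    exported = {k: v for k, v in os.environ.items() if k in REQUIRED_KEYS}
    missing = missing_keys({**file_values, **exported})
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

Running this from the repository root before `main.py` reports which keys, if any, still need to be set.
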
1. Input a topic.
2. GPT generates search queries.
3. Google Custom Search retrieves results.
4. Playwright & BeautifulSoup scrape web pages.
5. GPT summarizes the scraped content.
## Future Enhancements
- Support for more search engines.
- Advanced query generation and filtering.
- Customizable summarization options.
- Video summaries.
- A UI for visualizing results.
- Subscription features so users can receive updates on a schedule.
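
The five steps listed under Usage can be sketched as a single function. This is an illustration under assumptions, not the project's actual code: `run_pipeline`, its four callable parameters, and the shape of the returned dictionary are all hypothetical, and the stubs in the demo stand in for GPT, Google Custom Search, and the Playwright and BeautifulSoup scraper.

```python
import json


def run_pipeline(topic, generate_queries, search, scrape, summarize):
    """Chain the Usage steps; every callable is a hypothetical stand-in."""
    # Step 2: turn the topic into search queries.
    queries = generate_queries(topic)

    # Step 3: collect result URLs, dropping duplicates but keeping order.
    urls = []
    for query in queries:
        for url in search(query):
            if url not in urls:
                urls.append(url)

    # Step 4: fetch and extract the text of each page.
    pages = {url: scrape(url) for url in urls}

    # Step 5: condense the scraped text into one summary.
    summary = summarize(topic, list(pages.values()))

    # Assumed result layout; the real JSON file may be structured differently.
    return {
        "topic": topic,
        "queries": queries,
        "sources": urls,
        "summary": summary,
    }


if __name__ == "__main__":
    # Offline stubs so the flow can be run without API keys or a browser.
    result = run_pipeline(
        "solar power",
        generate_queries=lambda topic: [f"{topic} news", f"{topic} overview"],
        search=lambda query: [f"https://example.com/{query.replace(' ', '-')}"],
        scrape=lambda url: f"text from {url}",
        summarize=lambda topic, texts: f"{len(texts)} pages about {topic}",
    )
    print(json.dumps(result, indent=2))
```

Passing each stage in as a callable keeps the flow testable without API keys or network access.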