{"id":27888808,"url":"https://github.com/torchstack-ai/agent-scraper","last_synced_at":"2026-04-13T00:39:32.288Z","repository":{"id":280531977,"uuid":"942317592","full_name":"torchstack-ai/agent-scraper","owner":"torchstack-ai","description":"A simple AI agent that uses the ReAct pattern to search for and download web content about specific topics. Built with LangChain and OpenAI","archived":false,"fork":false,"pushed_at":"2025-03-03T23:21:02.000Z","size":4,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-10T09:18:03.992Z","etag":null,"topics":["agentic-ai","langchain","openai","react"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/torchstack-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-03T23:20:02.000Z","updated_at":"2025-06-19T22:22:14.000Z","dependencies_parsed_at":"2025-03-04T00:34:31.733Z","dependency_job_id":null,"html_url":"https://github.com/torchstack-ai/agent-scraper","commit_stats":null,"previous_names":["torchstack-ai/agent-scraper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/torchstack-ai/agent-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/torchstack-ai%2Fagent-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/torchstack-ai%2Fagent-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/torchstack-ai%2Fagent-scraper/releases","man
ifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/torchstack-ai%2Fagent-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/torchstack-ai","download_url":"https://codeload.github.com/torchstack-ai/agent-scraper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/torchstack-ai%2Fagent-scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31735541,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-12T22:19:12.206Z","status":"ssl_error","status_checked_at":"2026-04-12T22:18:33.088Z","response_time":58,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agentic-ai","langchain","openai","react"],"created_at":"2025-05-05T09:28:35.203Z","updated_at":"2026-04-13T00:39:32.272Z","avatar_url":"https://github.com/torchstack-ai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Agent Scraper\n\nA simple AI agent that uses the ReAct pattern to search for and download web content about specific topics. 
Built with LangChain and OpenAI.\n\n## Features\n\n- **Topic-based Web Search**: Finds relevant web pages for any given topic\n- **Automatic Content Download**: Saves web pages as HTML files\n- **ReAct Pattern Implementation**: Uses reasoning and acting to complete tasks\n- **LangSmith Integration**: Uses LangSmith for prompt management\n- **Rate Limiting**: Built-in delays to avoid search API issues\n\n## Prerequisites\n\n- Python 3.8+\n- OpenAI API key\n- LangSmith API key\n\n## Installation\n\n1. Clone the repository:\n```bash\ngit clone \u003crepository-url\u003e\ncd agent-scraper\n```\n\n2. Create and activate a virtual environment:\n```bash\npython -m venv .venv\nsource .venv/bin/activate  # On Windows: .venv\\Scripts\\activate\n```\n\n3. Install dependencies:\n```bash\npip install -r requirements.txt\n```\n\n4. Create a `.env` file in the project root with your API keys:\n```\nOPENAI_API_KEY=your_openai_api_key\nLANGSMITH_API_KEY=your_langsmith_api_key\n```\n\n## Usage\n\nRun the script:\n```bash\npython react.py\n```\n\nThe script will:\n1. Search for relevant web pages about the specified topic\n2. Download the found pages as HTML files\n3. Save the files in the `downloads` directory\n\n## Project Structure\n\n```\nagent-scraper/\n├── react.py              # Main script with agent implementation\n├── requirements.txt      # Project dependencies\n├── .env                 # Environment variables (create this)\n└── downloads/           # Directory for downloaded web pages\n```\n\n## How It Works\n\nThe agent uses the ReAct (Reasoning and Acting) pattern:\n1. **Reasoning**: The agent thinks about how to research the topic\n2. **Acting**: The agent performs actions (searching and downloading)\n3. **Observing**: The agent analyzes the results\n4. **Repeating**: The process continues until sufficient information is gathered\n\n## Tools\n\nThe agent has access to two main tools:\n1. **SearchTopics**: Finds relevant URLs for a given topic\n2. 
**DownloadPage**: Downloads and saves web pages as HTML files\n\n## Contributing\n\nFeel free to submit issues and enhancement requests!\n\n## License\n\nThis project is licensed under the MIT License - see the LICENSE file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftorchstack-ai%2Fagent-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftorchstack-ai%2Fagent-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftorchstack-ai%2Fagent-scraper/lists"}