{"id":23908507,"url":"https://github.com/davidyen1124/ai-crawler","last_synced_at":"2026-05-04T10:31:26.288Z","repository":{"id":250482227,"uuid":"834585457","full_name":"davidyen1124/ai-crawler","owner":"davidyen1124","description":"AI web scraper using GPT to dynamically optimize CSS selectors for reliable data extraction.","archived":false,"fork":false,"pushed_at":"2024-07-27T18:09:55.000Z","size":8,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-05T04:38:23.971Z","etag":null,"topics":["ai","automation","css-selector","gpt","nodejs","openai","playwright","scraping"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/davidyen1124.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-27T18:05:41.000Z","updated_at":"2025-01-01T07:17:47.000Z","dependencies_parsed_at":"2024-07-27T19:27:45.032Z","dependency_job_id":"ac355039-24cb-4041-97d2-d1a909fa90ef","html_url":"https://github.com/davidyen1124/ai-crawler","commit_stats":null,"previous_names":["davidyen1124/ai-crawler"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidyen1124%2Fai-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidyen1124%2Fai-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidyen1124%2Fai-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/davidyen1124%2Fai-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/davidyen1124","download_url":"https://codeload.github.com/davidyen1124/ai-crawler/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240335049,"owners_count":19785320,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","automation","css-selector","gpt","nodejs","openai","playwright","scraping"],"created_at":"2025-01-05T04:38:30.744Z","updated_at":"2026-05-04T10:31:26.258Z","avatar_url":"https://github.com/davidyen1124.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AI-Powered Web Scraper\n\nThis project is an AI-powered web scraper that uses OpenAI's GPT model to dynamically analyze and optimize CSS selectors for reliable web scraping.\n\n## Features\n\n- Dynamic CSS selector optimization using AI\n- Visual feedback with highlighted elements in the browser\n- Automatic screenshot capture for AI analysis\n- Simplified DOM tree structure analysis\n- Configurable scraping goals\n\n## Prerequisites\n\n- Node.js (v14 or later recommended)\n- An OpenAI API key\n\n## Installation\n\n1. Clone the repository:\n\n   ```\n   git clone https://github.com/davidyen1124/ai-crawler.git\n   cd ai-crawler\n   ```\n\n2. Install dependencies:\n\n   ```\n   npm install\n   ```\n\n3. 
Create a `config.js` file in the root directory with your OpenAI API key:\n   ```javascript\n   module.exports = {\n     OPENAI_API_KEY: 'your-api-key-here',\n     MODEL: 'gpt-4o-mini'\n   }\n   ```\n\n## Usage\n\nTo start the web scraper, run:\n\n```\nnode crawler.js\n```\n\nYou can modify the `scrapingGoal` and target URL in the `crawler.js` file to customize the scraping task.\n\n## How it Works\n\n1. The scraper starts with an initial CSS selector and loads the target webpage.\n2. It captures a screenshot and analyzes the DOM structure.\n3. The AI model analyzes the current selector, screenshot, and DOM structure to suggest optimizations.\n4. The process repeats until the AI determines the selector is optimal or no further improvements can be made.\n5. Finally, the scraper extracts the desired information using the optimized selector.\n\n## Files\n\n- `crawler.js`: Main script that controls the web scraping process.\n- `openai.js`: Handles interactions with the OpenAI API for selector analysis.\n- `config.js`: Contains configuration settings (API key, model name).\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n## License\n\nThis project is licensed under the MIT License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidyen1124%2Fai-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavidyen1124%2Fai-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidyen1124%2Fai-crawler/lists"}