{"id":24989783,"url":"https://github.com/yaser-123/infozap","last_synced_at":"2026-05-10T19:15:58.707Z","repository":{"id":275252903,"uuid":"925553131","full_name":"Yaser-123/Infozap","owner":"Yaser-123","description":"A modern Python-based web scraper powered by Selenium, designed to extract specific content from websites. It intelligently handles CAPTCHA, waits for full page load, and uses AI integration to retrieve and parse targeted information efficiently, making web scraping smarter and faster.","archived":false,"fork":false,"pushed_at":"2025-02-16T03:48:30.000Z","size":75192,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-16T04:21:55.584Z","etag":null,"topics":["beautifulsoup","chromedriver","llama3","ollama","python","selenium","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Yaser-123.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-01T06:12:43.000Z","updated_at":"2025-02-16T03:48:33.000Z","dependencies_parsed_at":"2025-02-01T07:31:37.600Z","dependency_job_id":null,"html_url":"https://github.com/Yaser-123/Infozap","commit_stats":null,"previous_names":["yaser-123/infozap"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yaser-123%2FInfozap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yaser-123%2FInfozap/tags","releases_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/Yaser-123%2FInfozap/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Yaser-123%2FInfozap/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Yaser-123","download_url":"https://codeload.github.com/Yaser-123/Infozap/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246180923,"owners_count":20736460,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup","chromedriver","llama3","ollama","python","selenium","webscraping"],"created_at":"2025-02-04T13:03:30.158Z","updated_at":"2026-05-10T19:15:58.454Z","avatar_url":"https://github.com/Yaser-123.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AI Web Scraper with Streamlit\n\nThis project is a web scraping tool powered by AI, built using Streamlit. 
It allows users to scrape the content of a website, clean and extract the relevant DOM content, and then parse the content using an AI model (Ollama) to answer specific questions or extract specific information.\n\n## Features\n\n- **Web Scraping**: Scrape the DOM content of any website by providing its URL.\n- **Content Cleaning**: Extract and clean the body content of the website for better readability.\n- **AI-Powered Parsing**: Use an AI model (Ollama) to parse the scraped content and answer user-defined questions or extract specific information.\n- **Streamlit UI**: A user-friendly interface to interact with the scraper and parser.\n\n## Demo\n\nHere are some screenshots of the application in action:\n\n---\n![Dashboard Overview](Images/img-1.png)\n---\n![Dashboard Overview](Images/img-2.png)\n---\n![Dashboard Overview](Images/img-3.png)\n---\n![Dashboard Overview](Images/img-4.png)\n---\n![Dashboard Overview](Images/img-5.png)\n---\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"Images/img-6.png\"/\u003e\n\u003c/p\u003e\n\n## Installation\n\n1. **Clone the Repository**:\n   ```bash\n   git clone https://github.com/Yaser-123/Infozap.git\n   cd Infozap\n   ```\n2. **Set Up a Virtual Environment (Optional but Recommended)**:\n   ```bash\n   python -m venv venv\n   source venv/bin/activate  # On Windows, use `venv\\Scripts\\activate`\n   ```\n3. **Install Dependencies**:\n   ```bash\n   pip install -r requirements.txt\n   ```\n4. **Run the Streamlit App**:\n   ```bash\n   streamlit run app.py\n   ```\n5. **Access the App**: Open your browser and navigate to http://localhost:8501 to use the AI Web Scraper.\n\n## Usage\n\n1. **Enter a Website URL**:\n   - Input the URL of the website you want to scrape in the provided text box.\n\n2. **Scrape the Website**:\n   - Click the \"Scrape Website\" button to extract and clean the DOM content of the website.\n\n3. **View the Scraped Content**:\n   - Once the scraping is complete, you can view the cleaned DOM content in an expandable text box.\n\n4. 
**Parse the Content**:\n   - Describe what you want to parse or extract from the scraped content in the text area provided.\n   - Click the \"Parse Content\" button to let the AI model process the content and provide the results.\n\n## Dependencies\n\n- **Streamlit**: For building the web interface.\n- **BeautifulSoup**: For parsing and cleaning the scraped HTML.\n- **Ollama**: For parsing and extracting information using AI.\n\n## Project Structure\n\n```plaintext\nInfozap/\n├── app.py                # Main Streamlit application\n├── scrape.py             # Functions for scraping and cleaning website content\n├── parse.py              # Functions for parsing content using Ollama\n├── requirements.txt      # List of dependencies\n└── README.md             # This file\n```\n\n## Contributing\n\nContributions are welcome! If you'd like to contribute, please follow these steps:\n\n1. Fork the repository.\n2. Create a new branch for your feature or bugfix.\n3. Commit your changes.\n4. Push your branch and submit a pull request.\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyaser-123%2Finfozap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyaser-123%2Finfozap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyaser-123%2Finfozap/lists"}