{"id":27017899,"url":"https://github.com/rogendo/web-scraping","last_synced_at":"2026-05-05T07:31:38.247Z","repository":{"id":283610460,"uuid":"934230200","full_name":"Rogendo/Web-Scraping","owner":"Rogendo","description":"This repository contains automation dockerized and undockerized scripts  that scrape various websites.","archived":false,"fork":false,"pushed_at":"2025-04-16T08:11:37.000Z","size":6531,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-16T10:53:50.099Z","etag":null,"topics":["beautifulsoup4","scrapy","selenium","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Rogendo.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-17T13:43:14.000Z","updated_at":"2025-04-16T08:11:43.000Z","dependencies_parsed_at":null,"dependency_job_id":"54ce43a1-c7f5-468e-9aa5-3ef53919ed58","html_url":"https://github.com/Rogendo/Web-Scraping","commit_stats":null,"previous_names":["rogendo/web-scraping"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Rogendo/Web-Scraping","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rogendo%2FWeb-Scraping","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rogendo%2FWeb-Scraping/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rogendo%2FWeb-Scraping/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rogen
do%2FWeb-Scraping/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Rogendo","download_url":"https://codeload.github.com/Rogendo/Web-Scraping/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rogendo%2FWeb-Scraping/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32640533,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-04T10:08:07.713Z","status":"online","status_checked_at":"2026-05-05T02:00:06.033Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup4","scrapy","selenium","webscraping"],"created_at":"2025-04-04T16:35:26.793Z","updated_at":"2026-05-05T07:31:38.241Z","avatar_url":"https://github.com/Rogendo.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Web Scraping\r\n\r\nWelcome to the Web Scraping repository! This repository contains various web scraping scripts written in Python to extract data from different websites.\r\n\r\n## Table of Contents\r\n\r\n- [Introduction](#introduction)\r\n- [Installation](#installation)\r\n- [Usage](#usage)\r\n- [Directories](#directories)\r\n- [Contributing](#contributing)\r\n- [License](#license)\r\n\r\n## Introduction\r\n\r\nThis repository contains a collection of web scraping scripts that can be used to extract data from various websites. 
Each script is designed to scrape specific data and save it in a structured format such as CSV or JSON.\r\n\r\n## Installation\r\n\r\nTo get started, clone the repository and set up a virtual environment:\r\n\r\n```sh\r\ngit clone https://github.com/Rogendo/Web-Scraping.git\r\ncd Web-Scraping\r\npython -m venv venv\r\nsource venv/bin/activate  # On Windows, use `venv\\Scripts\\activate`\r\npip install -r requirements.txt\r\n```\r\n\r\n## Directories\r\n - Notebooks\r\n - Docker\r\n - Scripts\r\n\r\n### Notebooks\r\nThe Notebooks directory contains Jupyter notebooks that implement various web scraping tasks. Notebooks are great workspaces for developing and testing scraping code.\r\n\r\n- Directory: Notebooks\r\n- Description: Contains Jupyter notebooks for various scraping tasks.\r\n- Usage: Open the notebooks using Jupyter and run the cells to execute the scraping tasks.\r\n\r\n### Docker\r\n\r\nThe Docker directory contains Dockerized scripts that automate the scraping tasks and run them in Docker containers. Docker containers provide isolated environments, ensuring that your web scraping scripts run consistently across different machines without conflicts from varying system configurations.\r\n\r\nDocker also helps with:\r\n\r\n- Dependency Management: Docker containers encapsulate all dependencies, libraries, and tools required for your web scraping scripts, eliminating the need to install and configure them on each machine.\r\n\r\n- Cross-Platform Compatibility: Docker containers can run on any system that supports Docker, making it easy to move your web scraping setup between development, testing, and production environments.\r\n\r\n- Cloud Deployment: Docker containers can be deployed on cloud platforms, allowing you to scale your web scraping operations based on demand.\r\n\r\n- Scalability: Horizontal Scaling: Docker makes it easy to scale your web scraping operations by adding more containers to handle increased load. 
This is particularly useful for large-scale scraping projects.\r\n- Load Balancing: Docker Swarm and Kubernetes can be used to manage and distribute the load across multiple containers, ensuring efficient resource utilization.\r\n\r\n- Quick Setup: Docker allows for rapid setup of development environments, enabling you to start scraping quickly without spending time on environment configuration.\r\n- Version Control: Docker images can be versioned, making it easy to track changes and roll back to previous versions if necessary.\r\n- Resource Efficiency: Lightweight: Docker containers are lightweight compared to virtual machines, leading to faster startup times and lower resource consumption.\r\n- Optimized Resource Usage: Containers share the host system's kernel, making them more efficient in terms of CPU and memory usage.\r\n- Sandboxing: Docker containers provide a level of security by isolating applications from the host system and other containers.\r\n- Reproducible Builds: Docker images can be built and shared, ensuring that anyone can reproduce the same environment and run the web scraping scripts without issues.\r\n\r\n#### Usage\r\nBuild and run the Docker containers using the provided Dockerfiles.\r\n\r\n\r\n### Scripts\r\nThe Scripts directory contains standalone Python scripts that extract data from various websites.\r\n\r\n#### Usage\r\nRun the scripts directly using Python.\r\n```sh\r\n    python -m venv venv\r\n    source venv/bin/activate  # On Windows, use `venv\\Scripts\\activate`\r\n    pip install -r requirements.txt\r\n\r\n    python script_name.py\r\n```\r\n\r\n## Contributing\r\nContributions are welcome! If you have any improvements or new scripts to add, please open a pull request. Make sure to follow the coding standards and include a detailed description of your changes.\r\n\r\n## License\r\nThis project is licensed under the MIT License. 
See the LICENSE file for more details.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frogendo%2Fweb-scraping","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frogendo%2Fweb-scraping","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frogendo%2Fweb-scraping/lists"}