{"id":23661156,"url":"https://github.com/chouaib-629/webscraping","last_synced_at":"2026-05-08T04:45:42.854Z","repository":{"id":268024133,"uuid":"903056225","full_name":"chouaib-629/WebScraping","owner":"chouaib-629","description":"A collection of web scraping projects using Beautiful Soup, Selenium, and mixed approaches. Each project includes Python scripts and CSV files of the scraped data. Perfect for learning and experimenting with static and dynamic web scraping techniques.","archived":false,"fork":false,"pushed_at":"2024-12-20T17:05:31.000Z","size":109,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-29T04:56:06.368Z","etag":null,"topics":["automation","beautifulsoup","beautifulsoup4","browser-automation","csv","datacollection","dataextraction","dynamicwebsite","html-parser","jupyter-notebook","python","python-script","python3","selenium","staticwebsite","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chouaib-629.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-13T20:51:36.000Z","updated_at":"2024-12-20T20:20:38.000Z","dependencies_parsed_at":"2024-12-13T21:39:55.711Z","dependency_job_id":null,"html_url":"https://github.com/chouaib-629/WebScraping","commit_stats":null,"previous_names":["chouaib-629/webscraping"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chouaib-629%2FWebScraping","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chouaib-629%2FWebScraping/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chouaib-629%2FWebScraping/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chouaib-629%2FWebScraping/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chouaib-629","download_url":"https://codeload.github.com/chouaib-629/WebScraping/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239655162,"owners_count":19675364,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","beautifulsoup","beautifulsoup4","browser-automation","csv","datacollection","dataextraction","dynamicwebsite","html-parser","jupyter-notebook","python","python-script","python3","selenium","staticwebsite","webscraping"],"created_at":"2024-12-29T04:56:10.849Z","updated_at":"2025-12-04T07:30:18.662Z","avatar_url":"https://github.com/chouaib-629.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Web Scraping Projects\n\nThis repository contains a collection of web scraping projects that I built for learning and experimentation. The projects are organized into subfolders, one per website. Each website was scraped with the technique its content called for: Beautiful Soup (bs4) for static content, Selenium for dynamic content, and a combination of both where needed.\n\n## Folder Structure\n\n- **Main Folder**: Contains subfolders, each representing a scraped website.\n- **Subfolders**: Named after the website they were scraped from. Each subfolder contains:\n  - The Python scraping code in two formats: a script (`.py`) and a Jupyter notebook (`.ipynb`).\n  - The CSV file containing the scraped data.\n\n## Scraping Techniques\n\n1. **Static Websites**:\n   - Scraped using **Beautiful Soup (bs4)**.\n   - These websites deliver their content in the initial HTML response, so it can be fetched directly (e.g., with Requests) and parsed.\n\n2. **Dynamic Websites**:\n   - Scraped using **Selenium**.\n   - These websites load data dynamically through JavaScript, so a real browser, driven by Selenium, is needed to execute the scripts and render the content.\n\n3. 
**Mixed Approach**:\n   - Some websites required a combination of **Selenium** and **bs4**.\n   - Selenium was used to render the dynamic content, and Beautiful Soup was used to parse the resulting HTML.\n\n## Classification of Websites\n\nThe websites are grouped below by the scraping technique used:\n\n### Beautiful Soup (bs4)\n\n- [yallakora](https://www.yallakora.com/match-center/%d9%85%d8%b1%d9%83%d8%b2-%d8%a7%d9%84%d9%85%d8%a8%d8%a7%d8%b1%d9%8a%d8%a7%d8%aa#nav-menu)\n- [wuzzuf](https://wuzzuf.net/jobs/egypt)\n- [wikipedia](https://en.wikipedia.org/wiki/List_of_largest_companies_in_the_United_States_by_revenue)\n\n### Selenium\n\n- [cookieClicker](https://orteil.dashnet.org/cookieclicker/)\n\n### Mixed Approach\n\n- [coinMarketCap](https://coinmarketcap.com/)\n\n## Requirements\n\nTo run the scraping scripts, you need the following Python libraries:\n\n- **Beautiful Soup**: installed as `beautifulsoup4`, imported as `bs4`\n- **Selenium**\n- **Requests**\n- **lxml**: a fast parser backend for Beautiful Soup\n- **html.parser**: Python's built-in HTML parser (part of the standard library, so no installation is needed)\n\nMake sure Python 3 is installed. The third-party libraries can be installed with `pip install beautifulsoup4 selenium requests lxml`. Selenium also needs a driver that matches your browser (e.g., ChromeDriver for Google Chrome); Selenium 4.6 and later can download it automatically through Selenium Manager, while older versions require downloading it manually.\n\n## Getting Started\n\n1. Clone this repository to your local machine:\n\n   ```bash\n   git clone https://github.com/chouaib-629/WebScraping.git\n   ```\n\n2. Open the subfolder for the website you are interested in to inspect its scripts and scraped data.\n\n## Notes\n\n- The data scraped from these websites is for educational purposes only. 
Please review and respect each website's terms of service before scraping it.\n- The scripts and data are provided \"as is\" without warranty of any kind.\n\n## Author\n\nThis project is maintained by a data science enthusiast and full-stack developer experimenting with web scraping techniques.\n\n## Contact Information\n\nFor questions or support, please contact me by [email](mailto:chouaiba629@gmail.com).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchouaib-629%2Fwebscraping","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchouaib-629%2Fwebscraping","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchouaib-629%2Fwebscraping/lists"}