{"id":18244175,"url":"https://github.com/nouraalgohary/web-scraping","last_synced_at":"2026-04-15T15:33:30.496Z","repository":{"id":213399760,"uuid":"734041864","full_name":"NouraAlgohary/Web-Scraping","owner":"NouraAlgohary","description":null,"archived":false,"fork":false,"pushed_at":"2023-12-21T07:26:41.000Z","size":58,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-19T04:07:39.060Z","etag":null,"topics":["pandas","selenium","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NouraAlgohary.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-20T18:10:05.000Z","updated_at":"2023-12-21T07:23:35.000Z","dependencies_parsed_at":"2023-12-20T19:44:44.412Z","dependency_job_id":"78a22c86-9c1b-44e7-af51-b861c10ea2bc","html_url":"https://github.com/NouraAlgohary/Web-Scraping","commit_stats":null,"previous_names":["nouraalgohary/books-to-scrape-"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/NouraAlgohary/Web-Scraping","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NouraAlgohary%2FWeb-Scraping","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NouraAlgohary%2FWeb-Scraping/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NouraAlgohary%2FWeb-Scraping/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NouraAlgohary%2FWeb-Scraping/manifests","owner_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NouraAlgohary","download_url":"https://codeload.github.com/NouraAlgohary/Web-Scraping/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NouraAlgohary%2FWeb-Scraping/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270642169,"owners_count":24621322,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-15T02:00:12.559Z","response_time":110,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pandas","selenium","webscraping"],"created_at":"2024-11-05T09:15:41.711Z","updated_at":"2026-04-15T15:33:30.425Z","avatar_url":"https://github.com/NouraAlgohary.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Toscrape\n\n🛠️ Web Scraping Exploration with Selenium\n\nTake a gentle dive into the basics of web scraping with this repository! Using Selenium, the project walks you through extracting data from books and quotes websites. \nIt's a simple yet effective exercise to get hands-on experience with web scraping techniques. The data collected is neatly organized into a CSV file, offering a practical glimpse into data processing. \nWhether you're new to web scraping or just looking for a straightforward example, this repository provides a humble starting point for your exploration. Happy coding!\n\n\n\n## [1. 
Books to Scrape](http://books.toscrape.com/)\n![image](https://github.com/NouraAlgohary/Web-Scraping/assets/103903785/7a5c0b19-e620-4531-8714-6cc1c8b9fe55)\n\n## [2. Quotes to Scrape](https://quotes.toscrape.com/)\n![image](https://github.com/NouraAlgohary/Web-Scraping/assets/103903785/d34bbf5d-5799-47ec-8309-2f2f3911e199)\n\n## Files\n- [booksToScrape.csv](https://github.com/NouraAlgohary/Web-Scraping/blob/main/booksToScrape.csv) Books data as a CSV file\n- [QuotesToScrape.csv](https://github.com/NouraAlgohary/Web-Scraping/blob/main/QuotesToScrape.csv) Quotes data as a CSV file\n- [books_web_scraping.py](https://github.com/NouraAlgohary/Web-Scraping/blob/main/books_web_scraping.py) Books website web scraping code\n- [quotes_web_scraping.py](https://github.com/NouraAlgohary/Web-Scraping/blob/main/quotes_web_scraping.py) Quotes website web scraping code\n\n## Steps\n### Setting Up Libraries\nSelenium is a browser automation library with Python bindings, widely used for web scraping and testing.\u003c/br\u003e\n```pip install selenium```\u003c/br\u003e\nPandas is a data manipulation library for Python, commonly used for data analysis and for saving data to formats such as CSV.\u003c/br\u003e\n```pip install pandas```\n\n### Getting Started\n1. Create a webdriver instance\u003c/br\u003e\n```\nfrom selenium import webdriver\n\n# Launch a Chrome session and open the site\ndriver = webdriver.Chrome()\nurl = \"http://books.toscrape.com/\"\ndriver.get(url)\n```\n2. 
Chrome should open and display the message\u003c/br\u003e\n```Chrome is being controlled by automated test software.```\n### Explicit Waits\nUse explicit waits so that the script interacts with elements only once they are ready:\u003c/br\u003e\n```\nfrom selenium.webdriver.support.ui import WebDriverWait\nfrom selenium.webdriver.support import expected_conditions as EC\n\n# next_page_button_locator is a (By, value) tuple defined elsewhere in the script\ntry:\n    # Explicitly wait for the next page button to be present\n    WebDriverWait(driver, 20).until(EC.presence_of_element_located(next_page_button_locator))\n\n    # Explicitly wait for the next page button to be clickable\n    WebDriverWait(driver, 20).until(EC.element_to_be_clickable(next_page_button_locator))\n\n    # Find the next page button and click it\n    next_page_button = driver.find_element(*next_page_button_locator)\n    next_page_button.click()\nexcept Exception as e:\n    # On any failure, report it and reload the page so the click can be retried\n    print(f\"Exception: {type(e).__name__} - {e}. Refreshing the page and retrying click.\")\n    driver.refresh()\n```\n\n### Data Extraction\nUse the locator strategies provided by ```By``` to identify elements:\u003c/br\u003e\n```\nfrom selenium.webdriver.common.by import By\n```\n- ```find_element(By.CSS_SELECTOR, some_string)``` Finds an element using a CSS selector. It replaces the older 
```find_element_by_css_selector```\n- ```find_element(By.XPATH, some_string)``` Finds an element by XPath, replacing the older ```find_element_by_xpath```\n- ```find_element(By.CLASS_NAME, some_string)``` Finds an element by class name, replacing the older ```find_element_by_class_name```\n\nEach of these methods returns a ```WebElement``` instance.\n\n#### WebElement\n- ```element.click()``` Clicks the element\n- ```element.get_attribute('class')``` Reads an attribute such as class or title\n- ```element.text``` Reads the visible text of the element\n\n### Store data\nSave a list of lists as a data frame using Pandas\u003c/br\u003e\n```\nimport pandas as pd\n\n# books_list is a list of rows, one inner list per book\ndf = pd.DataFrame(books_list)\n```\nSave the data frame to a CSV file for further use\u003c/br\u003e\n```\ndf.to_csv('path-to-folder/booksToScrape.csv', index=True)\n```\n\n### Finally\nClose the browser\n```\ndriver.quit()\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnouraalgohary%2Fweb-scraping","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnouraalgohary%2Fweb-scraping","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnouraalgohary%2Fweb-scraping/lists"}