# Web Scraping with Python Selenium: Tutorial for Beginners

[![Oxylabs promo code](https://raw.githubusercontent.com/oxylabs/product-integrations/refs/heads/master/Affiliate-Universal-1090x275.png)](https://oxylabs.go2cloud.org/aff_c?offer_id=7&aff_id=877&url_id=112)

[![](https://dcbadge.vercel.app/api/server/eWsVUJrnG5)](https://discord.gg/GbxmdGhZjq)

[Python](https://github.com/topics/python) [Selenium](https://github.com/topics/selenium) [Web Scraping](https://github.com/topics/web-scraping)

- [Installing Selenium](#installing-selenium)
- [Testing](#testing)
- [Scraping with Selenium](#scraping-with-selenium)

In this article, we give an overview of web scraping with Selenium using a real-life example.

For a detailed tutorial on Selenium, see [our blog](https://oxylabs.io/blog/selenium-web-scraping).

## Installing Selenium

1. Create a virtual environment:

```sh
python3 -m venv .env
```

2. Install Selenium using pip:

```sh
pip install selenium
```

3. Install Selenium Web Driver. See [this page](https://www.selenium.dev/documentation/webdriver/getting_started/install_drivers/) for details.

## Testing

With the virtual environment activated, start the Python interactive shell by typing `python3`. Enter the following command at the prompt:

```python
>>> from selenium.webdriver import Chrome

```

If there are no errors, move on to the next step. If there is an error, ensure that `chromedriver` is added to the PATH.
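
If the driver is not on the PATH, you can also point Selenium to it explicitly. A minimal sketch, assuming a hypothetical driver location (recent Selenium releases can also resolve a matching driver automatically via Selenium Manager):

```python
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.service import Service

# "/path/to/chromedriver" is a placeholder; replace it with your actual driver path.
service = Service(executable_path="/path/to/chromedriver")
driver = Chrome(service=service)
driver.quit()
```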

## Scraping with Selenium

Import required modules as follows:

```python
from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.common.by import By
```

Add the skeleton of the script as follows:

```python
def get_data(url) -> list:
    ...


def main():
    ...


if __name__ == '__main__':
    main()
```

Create a `ChromeOptions` object and enable headless mode. Recent Selenium releases deprecate the old `headless` attribute in favor of Chrome's `--headless=new` argument. Use the options object to create an instance of `Chrome`:

```python
browser_options = ChromeOptions()
# Enable headless mode; this replaces the deprecated `browser_options.headless = True`.
browser_options.add_argument("--headless=new")

driver = Chrome(options=browser_options)
```

Call the `driver.get` method to load the URL. Then locate the link to the Humor section by its link text and click it:

```python
driver.get(url)

element = driver.find_element(By.LINK_TEXT, "Humor")
element.click()
```
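
The click triggers a regular page navigation. If the target page is slow to render, an explicit wait (not part of the original script) makes the next step more robust; a minimal sketch using Selenium's `WebDriverWait`:

```python
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for at least one product card to appear.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".product_pod"))
)
```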

Create a CSS selector to find all books on this page. Then loop over the books and extract each book's title, price, and stock availability. Store each book's information in a dictionary and append these dictionaries to a list. See the code below:

```python
books = driver.find_elements(By.CSS_SELECTOR, ".product_pod")
data = []
for book in books:
    title = book.find_element(By.CSS_SELECTOR, "h3 > a")
    price = book.find_element(By.CSS_SELECTOR, ".price_color")
    stock = book.find_element(By.CSS_SELECTOR, ".instock.availability")
    book_item = {
        'title': title.get_attribute("title"),
        'price': price.text,
        'stock': stock.text
    }
    data.append(book_item)
```

Lastly, return the `data` list from this function.
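
Put together with the skeleton from earlier, the script could look like the sketch below. The URL is an assumption based on the selectors used in this walkthrough (https://books.toscrape.com); the repository's [main.py](src/main.py) is the authoritative version:

```python
def get_data(url) -> list:
    browser_options = ChromeOptions()
    browser_options.add_argument("--headless=new")
    driver = Chrome(options=browser_options)

    driver.get(url)
    driver.find_element(By.LINK_TEXT, "Humor").click()

    books = driver.find_elements(By.CSS_SELECTOR, ".product_pod")
    data = []
    for book in books:
        title = book.find_element(By.CSS_SELECTOR, "h3 > a")
        price = book.find_element(By.CSS_SELECTOR, ".price_color")
        stock = book.find_element(By.CSS_SELECTOR, ".instock.availability")
        data.append({
            'title': title.get_attribute("title"),
            'price': price.text,
            'stock': stock.text,
        })

    driver.quit()
    return data


def main():
    # Example URL is an assumption; see src/main.py for the exact target.
    print(get_data("https://books.toscrape.com/"))
```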

For the complete code, see [main.py](src/main.py).

For a detailed tutorial on Selenium, see [our blog](https://oxylabs.io/blog/selenium-web-scraping).