# Integrating Oxylabs' Residential Proxies with AIOHTTP

[![Oxylabs promo code](https://raw.githubusercontent.com/oxylabs/product-integrations/refs/heads/master/Affiliate-Universal-1090x275.png)](https://oxylabs.go2cloud.org/aff_c?offer_id=7&aff_id=877&url_id=112)

[![](https://dcbadge.vercel.app/api/server/eWsVUJrnG5)](https://discord.gg/GbxmdGhZjq)

[python](https://github.com/topics/python)
[web-scraping](https://github.com/topics/web-scraping)
[residential-proxy](https://github.com/topics/residential-proxy)
[aiohttp](https://github.com/topics/aiohttp)
[asyncio](https://github.com/topics/asyncio)

## Requirements for the Integration

For the integration to work, you'll need Python 3.6 or higher, the `aiohttp`
library, and Oxylabs Residential Proxies.
If you don't have `aiohttp` installed yet,
you can install it with `pip`:

```bash
pip install aiohttp
```

You can get Residential Proxies here: https://oxy.yt/arWH

## Proxy Authentication

There are two ways to authenticate proxies with `aiohttp`.

The first is to create an `aiohttp.BasicAuth` object with your credentials and pass
it alongside the proxy URL:

```python
import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
            "https://ip.oxylabs.io/location",
            proxy=f"http://{END_POINT}",
            proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())
```

The second is to embed the authentication credentials directly in the proxy URL:

```python
import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "https://ip.oxylabs.io/location",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())
```
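
In both snippets `fetch()` is a coroutine, so it still needs to be run on an event
loop. A minimal way to do that (assuming Python 3.7+ and that you keep the function
name `fetch`) is with `asyncio.run`:

```python
import asyncio

# Start an event loop and run the coroutine defined above.
asyncio.run(fetch())
```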

To use your own proxies, replace the `USER` and `PASSWORD` values with your
Oxylabs account credentials.
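
If you'd rather not hard-code credentials, one option (not part of the original
snippets) is to read them from environment variables. The variable names below are
hypothetical; use whatever fits your setup:

```python
import os

# Hypothetical environment variable names; set them to your Oxylabs credentials.
USER = os.environ["OXYLABS_USER"]
PASSWORD = os.environ["OXYLABS_PASSWORD"]
END_POINT = "pr.oxylabs.io:7777"
```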

## Testing Proxies

To see if the proxy is working, try visiting https://ip.oxylabs.io/location.
If everything is working correctly, it will return the IP address of the proxy
you're currently using.
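
A quick way to run the same check from the command line is with `curl`, routing the
request through the proxy endpoint (substitute your own credentials for `user:pass`):

```bash
curl -x "http://user:pass@pr.oxylabs.io:7777" "https://ip.oxylabs.io/location"
```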

## Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be used for asynchronous data
extraction, we wrote a sample project that scrapes product listing data and saves
the output to a `CSV` file. Proxy rotation lets us send many requests concurrently
while greatly reducing the risk of CAPTCHAs and IP blocks, which makes the web
scraping process fast and efficient: you can extract data from thousands of
products in a matter of seconds.

```python
import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]

async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")

async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)

async def create_jobs(results_list):
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )

if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # A different EventLoopPolicy must be set on Windows.
    # This helps to avoid the "Event loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info.minor >= 8:
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")
```

If you want to test the project's script yourself, you'll need to install a few
additional packages. To do that, download the `requirements.txt` file and use the
`pip` command:

```bash
pip install -r requirements.txt
```
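
If you don't have the `requirements.txt` file at hand, the script above only imports
`aiohttp`, `pandas`, and BeautifulSoup (plus `lxml`, which is used as the parser), so
a minimal file along these lines should work (version pins omitted):

```text
aiohttp
beautifulsoup4
lxml
pandas
```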

If you're having any trouble integrating proxies with `aiohttp` and this guide
didn't help you, feel free to contact Oxylabs customer support at [email protected].