Python tutorial for implementing Residential Proxies with AIOHTTP
- Host: GitHub
- URL: https://github.com/oxylabs/aiohttp-proxy-integration
- Owner: oxylabs
- Created: 2021-10-01T05:21:44.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2025-02-11T12:29:34.000Z (3 months ago)
- Last Synced: 2025-04-23T04:52:44.091Z (28 days ago)
- Topics: aiohttp, asyncio, beautifulsoup, bs4, github-python, proxy-generator, proxy-list, proxy-list-github, proxy-rotator, proxy-site, python, python3, requests, residential-proxy, rotating-proxy, scraping, webproxy
- Language: Python
- Homepage:
- Size: 68.4 KB
- Stars: 9
- Watchers: 0
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Integrating Oxylabs' Residential Proxies with AIOHTTP
[Oxylabs promo](https://oxylabs.go2cloud.org/aff_c?offer_id=7&aff_id=877&url_id=112) | [Discord](https://discord.gg/GbxmdGhZjq) | [python](https://github.com/topics/python) | [web-scraping](https://github.com/topics/web-scraping) | [residential-proxy](https://github.com/topics/residential-proxy) | [aiohttp](https://github.com/topics/aiohttp) | [asyncio](https://github.com/topics/asyncio)
## Requirements for the Integration
For the integration to work, you'll need to install the `aiohttp` library, use Python 3.6
or higher, and have access to Residential Proxies.
If you don't have the `aiohttp` library yet, you can install it using the `pip` command:

```bash
pip install aiohttp
```

You can get Residential Proxies here: https://oxy.yt/arWH
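To confirm your environment meets these requirements, you can run a quick sanity check. The snippet below is only an illustration, not part of the original tutorial:

```python
# Verify the Python version and that aiohttp is importable.
import sys

import aiohttp

assert sys.version_info >= (3, 6), "Python 3.6 or higher is required"
print(f"Python {sys.version.split()[0]}, aiohttp {aiohttp.__version__}")
```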
## Proxy Authentication
There are two ways to authenticate proxies with `aiohttp`.

The first is to pass the credentials along with the proxy URL using `aiohttp.BasicAuth`:

```python
import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
            "https://ip.oxylabs.io/location",
            proxy=f"http://{END_POINT}",
            proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())
```

The second way is to pass the authentication credentials directly in the proxy URL:

```python
import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "https://ip.oxylabs.io/location",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())
```

To use your own proxies, replace the `user` and `pass` values with your Oxylabs
account credentials.
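Note that if your password contains characters such as `@`, `:`, or `/`, they need to be percent-encoded before being embedded in the proxy URL (the `aiohttp.BasicAuth` approach accepts the raw values). A minimal sketch using the standard library, with a made-up password for illustration:

```python
from urllib.parse import quote

USER = "user"
PASSWORD = "p@ss:word"  # Hypothetical password containing special characters.
END_POINT = "pr.oxylabs.io:7777"

# Percent-encode the credentials so the proxy URL stays valid.
proxy_url = f"http://{quote(USER, safe='')}:{quote(PASSWORD, safe='')}@{END_POINT}"
print(proxy_url)  # http://user:p%40ss%3Aword@pr.oxylabs.io:7777
```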
## Testing Proxies

To check whether the proxy is working, try visiting https://ip.oxylabs.io/location.
If everything is working correctly, the response will contain the IP address of the
proxy that you're currently using.
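As a quick way to confirm that the proxy is actually being used, the sketch below (an illustration built on the earlier examples, not part of the original project) requests the endpoint twice, once directly and once through the proxy, so the two returned IP addresses can be compared:

```python
import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"
TEST_URL = "https://ip.oxylabs.io/location"

async def check():
    async with aiohttp.ClientSession() as session:
        # Direct request: shows your own IP address.
        async with session.get(TEST_URL) as resp:
            print("Direct:   ", await resp.text())
        # Proxied request: should show the proxy's IP address instead.
        async with session.get(
            TEST_URL, proxy=f"http://{USER}:{PASSWORD}@{END_POINT}"
        ) as resp:
            print("Via proxy:", await resp.text())

asyncio.run(check())
```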
## Sample Project: Extracting Data From Multiple Pages

To better understand how Residential Proxies can be used for asynchronous data
extraction, we wrote a sample project that scrapes product listing data and saves
the output to a CSV file. Proxy rotation lets us send multiple requests at once
with a much lower risk of hitting CAPTCHAs or getting blocked. This makes the web
scraping process fast and efficient: data from thousands of products can be
extracted in a matter of seconds.

```python
import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    # The semaphore caps how many requests run concurrently.
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # A different EventLoopPolicy must be set if you're using Windows.
    # This helps to avoid the "Event loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info.minor >= 8:
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )

    df = pd.DataFrame(results)
    # Rebuild absolute product URLs from the relative hrefs.
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")
```

If you want to test the project's script yourself, you'll need to install a few
additional packages. To do that, simply download the `requirements.txt` file and
use the `pip` command:

```bash
pip install -r requirements.txt
```
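The `requirements.txt` shipped with the repository is the source of truth; judging purely from the script's imports, it needs to cover at least the following packages (a sketch, not the exact file):

```
aiohttp
beautifulsoup4
lxml
pandas
```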
If you're having any trouble integrating proxies with `aiohttp` and this guide
didn't help, feel free to contact Oxylabs customer support at [email protected].