https://github.com/luminati-io/manage-failed-python-requests
Handle failed HTTP requests in Python using retry strategies with HTTPAdapter, Tenacity, and custom logic to improve web scraping reliability.
headless-browser http python requests scraping-browser status-codes tenacity web-scraping web-unblocker
- Host: GitHub
- URL: https://github.com/luminati-io/manage-failed-python-requests
- Owner: luminati-io
- Created: 2025-02-11T09:52:09.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-02-18T13:22:41.000Z (11 months ago)
- Last Synced: 2025-03-22T07:02:02.095Z (10 months ago)
- Topics: headless-browser, http, python, requests, scraping-browser, status-codes, tenacity, web-scraping, web-unblocker
- Homepage: https://brightdata.com/blog/web-data/retry-failed-requests-python
- Size: 22.5 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Managing Failed Requests in Python
This guide explains how to handle failed HTTP requests in Python with retry strategies and custom logic.
- [What Are Status Codes?](#what-are-status-codes)
- [Retry Strategies](#retry-strategies)
- [HTTPAdapter](#httpadapter)
- [Tenacity](#tenacity)
- [Building a Custom Retry Mechanism](#building-a-custom-retry-mechanism)
- [Conclusion](#conclusion)
## What Are Status Codes?
Status codes are standardized three-digit numbers used in various protocols to indicate the result of a request. According to [Mozilla](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status), HTTP status codes can be broken down into the following categories:
- **100-199**: Informational responses
- **200-299**: Successful responses
- **300-399**: Redirection messages
- **400-499**: Client error messages
- **500-599**: Server error messages
When developing client-side applications like web scrapers, it's crucial to pay attention to status codes in the 400 and 500 ranges. Codes in the 400s typically indicate client-side errors, such as authentication failures, rate limiting, timeouts, or the well-known _404: Not Found error_. Meanwhile, status codes in the 500s signal server-side issues that may require retries or alternative handling strategies.
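As a minimal sketch of the ranges above, the category of a status code can be derived from its first digit. The function name `classify_status` and the category labels are illustrative choices, not part of any library.

```python
def classify_status(code: int) -> str:
    """Return the category name for an HTTP status code."""
    categories = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    # Integer division by 100 keeps only the leading digit (e.g. 404 -> 4)
    return categories[code // 100]


for code in (200, 404, 503):
    print(code, classify_status(code))
```

A scraper would typically treat the last two categories as failures and decide per code whether a retry makes sense.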
Here is a list of common error codes (taken from Mozilla’s [official documentation](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#client_error_responses)) you will encounter when performing web scraping:
| **Status Code** | **Meaning** | **Description** |
| --- | --- | --- |
| 400 | Bad Request | Check your request format |
| [401](https://brightdata.com/faqs/proxy-errors/error-401-how-to-avoid) | Unauthorized | Check your API key |
| [403](https://brightdata.com/faqs/proxy-errors/403-status-error-how-to-avoid) | Forbidden | You cannot access this data |
| 404 | Not Found | Site/Endpoint doesn’t exist |
| [408](https://brightdata.com/faqs/proxy-errors/error-408-how-to-avoid) | Request Timeout | Request timed out, try again |
| [429](https://brightdata.com/faqs/proxy-errors/429-error-how-to-avoid) | Too Many Requests | Slow down your requests |
| 500 | Internal Server Error | Generic server error, retry request |
| 501 | Not Implemented | Server doesn’t support this yet |
| [502](https://brightdata.com/faqs/proxy-errors/502-error-how-to-avoid) | Bad Gateway | Failed response from an upstream server |
| [503](https://brightdata.com/faqs/proxy-errors/503-error-how-to-avoid) | Service Unavailable | Server is temporarily down, retry later |
| [504](https://brightdata.com/faqs/proxy-errors/504-error-how-to-avoid) | Gateway Timeout | Timed out waiting for an upstream server |
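The table can be turned into a small lookup, sketched below with the standard library's `http.HTTPStatus` for the official phrases. The `RETRYABLE` set is an assumption drawn from the descriptions above (codes that suggest trying again or slowing down), and `describe` is a hypothetical helper name.

```python
from http import HTTPStatus

# Assumption: codes whose description in the table suggests a retry may help
RETRYABLE = {408, 429, 500, 502, 503, 504}


def describe(code: int) -> str:
    """Return the standard phrase for a code plus a suggested action."""
    phrase = HTTPStatus(code).phrase
    action = "retry with backoff" if code in RETRYABLE else "do not retry"
    return f"{code} {phrase}: {action}"


for code in (404, 429, 503):
    print(describe(code))
```

Which codes belong in the retryable set depends on the target site, so treat this list as a starting point rather than a rule.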
## Retry Strategies
When implementing a retry mechanism in Python, you can leverage pre-built tools such as `HTTPAdapter` (part of the `requests` library) and the `Tenacity` library. Alternatively, you may choose to develop custom retry logic based on your specific needs.
A well-designed retry strategy should include both a retry limit and a backoff mechanism. The retry limit prevents infinite loops, ensuring that failed requests don’t continue indefinitely. A backoff strategy, which gradually increases the delay between retries, helps prevent excessive requests that could lead to being blocked or overloading the server.
- **Retry Limits**: It’s essential to define a retry limit. After a specified number of attempts (X), the scraper should stop retrying to avoid infinite loops.
- **Backoff Algorithm**: A gradual increase in wait time between retries helps prevent overwhelming the server. Start with a small delay, such as 0.3 seconds, then double it on each retry: 0.6 seconds, 1.2 seconds, and so forth.
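The two rules above can be sketched as a function that produces the wait times for a bounded number of retries. The name `backoff_delays` and its parameters are illustrative; the defaults mirror the 0.3-second example.

```python
def backoff_delays(initial: float = 0.3, factor: float = 2.0, retries: int = 3) -> list[float]:
    """Return the delay before each retry, multiplying by `factor` every time."""
    # `retries` caps the schedule, which is what prevents an infinite loop
    return [initial * factor**attempt for attempt in range(retries)]


print(backoff_delays())  # [0.3, 0.6, 1.2]
```

Many real-world schedulers also add a maximum delay or random jitter; both are left out here to keep the sketch minimal.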
## HTTPAdapter
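With `HTTPAdapter`, we need to configure three things: `total`, `backoff_factor`, and `status_forcelist`. `allowed_methods` isn’t a requirement per se, but it defines which HTTP methods may be retried and thus makes our retry conditions explicit. By default, urllib3 retries only idempotent methods, so adding `POST` is a deliberate choice that is safe only when repeating the request has no side effects. In the code below, we use [httpbin](https://httpbin.org/) to automatically force an error and trigger the retry logic.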
```python
import logging
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

# Create a session
session = requests.Session()

# Configure retry settings
retry = Retry(
    total=3,  # Maximum number of retries
    backoff_factor=0.3,  # Base factor for the exponentially growing delay between retries
    status_forcelist=(429, 500, 502, 503, 504),  # Status codes that trigger a retry
    allowed_methods={"GET", "POST"},  # Allow retries for GET and POST
    raise_on_status=False,  # Return the last response instead of raising once retries run out
)

# Mount the adapter with our custom settings
adapter = HTTPAdapter(max_retries=retry)
session.mount("http://", adapter)
session.mount("https://", adapter)

# Function to make a request and test retry logic
def make_request(url, method="GET"):
    try:
        logger.info("Making a %s request to %s with retry logic...", method, url)
        if method == "GET":
            response = session.get(url)
        elif method == "POST":
            response = session.post(url)
        else:
            logger.error("Unsupported HTTP method: %s", method)
            return
        response.raise_for_status()
        logger.info("✅ Request successful: %s", response.status_code)
    except requests.exceptions.RequestException as e:
        logger.error("❌ Request failed after retries: %s", e)
        # e.response is None for connection-level failures; otherwise urllib3
        # attaches its Retry state, whose history lists the retried attempts
        retry_state = getattr(getattr(e.response, "raw", None), "retries", None)
        logger.info("Retries attempted: %d", len(retry_state.history) if retry_state is not None else 0)

# Test Cases
make_request("https://httpbin.org/status/200")  # ✅ Should succeed without retries
make_request("https://httpbin.org/status/500")  # ❌ Should retry 3 times and fail
make_request("https://httpbin.org/status/404")  # ❌ Should fail immediately (no retries)
make_request("https://httpbin.org/status/500", method="POST")  # ❌ Should retry 3 times and fail
```
Once you have created a `Session` object, do the following:
- Create a `Retry` object and define:
  - `total`: The maximum number of retries for a request.
  - `backoff_factor`: The base factor for the delay between retries. The delay grows exponentially as retries accumulate.
  - `status_forcelist`: A list of bad status codes. Any code in this list automatically triggers a retry.
- Create an `HTTPAdapter` object with our `retry` variable: `adapter = HTTPAdapter(max_retries=retry)`.
- Once you’ve created the `adapter`, mount it for the `http://` and `https://` URL prefixes using `session.mount()`.
When you run this code, each request that returns a status code from `status_forcelist` is retried up to three times (`total=3`), and you get the following output.
```
2024-06-10 12:00:00 - INFO - Making a GET request to https://httpbin.org/status/200 with retry logic...
2024-06-10 12:00:00 - INFO - ✅ Request successful: 200
2024-06-10 12:00:01 - INFO - Making a GET request to https://httpbin.org/status/500 with retry logic...
2024-06-10 12:00:02 - ERROR - ❌ Request failed after retries: 500 Server Error: INTERNAL SERVER ERROR for url: ...
2024-06-10 12:00:02 - INFO - Retries attempted: 3
2024-06-10 12:00:03 - INFO - Making a GET request to https://httpbin.org/status/404 with retry logic...
2024-06-10 12:00:03 - ERROR - ❌ Request failed after retries: 404 Client Error: NOT FOUND for url: ...
2024-06-10 12:00:03 - INFO - Retries attempted: 0
2024-06-10 12:00:04 - INFO - Making a POST request to https://httpbin.org/status/500 with retry logic...
2024-06-10 12:00:05 - ERROR - ❌ Request failed after retries: 500 Server Error: INTERNAL SERVER ERROR for url: ...
2024-06-10 12:00:05 - INFO - Retries attempted: 3
```
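The setup steps described in this section can be packaged into a reusable helper. This is a sketch rather than part of the original example: `build_retry_session` is a hypothetical name, and it leaves `allowed_methods` at urllib3's default so that only idempotent methods are retried. It makes no network calls; it only builds the session and inspects the mounted policy.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_retry_session(total=3, backoff_factor=0.3,
                        status_forcelist=(429, 500, 502, 503, 504)):
    """Return a Session whose HTTP and HTTPS requests share one retry policy."""
    retry = Retry(
        total=total,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Mount the same adapter for both URL prefixes
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


session = build_retry_session()
# get_adapter() returns the adapter that would handle this URL
policy = session.get_adapter("https://httpbin.org").max_retries
print(policy.total, policy.backoff_factor)
```

Keeping the configuration in one function makes it easier to reuse the same policy across several scrapers.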
## Tenacity
You can also use [`Tenacity`](https://tenacity.readthedocs.io/en/latest/), a popular open source retry library for Python. It’s not limited to HTTP, but it gives you an expressive way to implement retries.
Start by installing `Tenacity`:
```bash
pip install tenacity
```
Once installed, create a _decorator_ and use it to wrap a requests function. With the `@retry` decorator, add the `stop`, `wait`, and `retry` arguments.
```python
import logging
import requests
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type, retry_if_result, RetryError

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

# Log the upcoming delay before each retry
def log_retry(retry_state):
    logger.warning("Retrying after %.1f seconds...", retry_state.next_action.sleep)

# Define a retry strategy
@retry(
    stop=stop_after_attempt(3),  # Give up after 3 attempts in total
    wait=wait_exponential(multiplier=0.3),  # Exponential backoff
    retry=(
        retry_if_exception_type(requests.exceptions.RequestException) |  # Retry on request failures
        retry_if_result(lambda r: r.status_code in {500, 502, 503, 504})  # Retry on specific HTTP status codes
    ),
    before_sleep=log_retry,  # Called before tenacity sleeps between attempts
)
def make_request(url):
    logger.info("Making a request with retry logic to %s...", url)
    response = requests.get(url)
    response.raise_for_status()
    logger.info("✅ Request successful: %s", response.status_code)
    return response

# Attempt to make the request
try:
    make_request("https://httpbin.org/status/500")  # Test with a failing status code
except RetryError as e:
    logger.error("❌ Request failed after all retries: %s", e)
```
The logic and settings here are very similar to the first example with `HTTPAdapter`:
- `stop=stop_after_attempt(3)` tells `tenacity` to give up after 3 attempts in total (the first call plus two retries).
- `wait=wait_exponential(multiplier=0.3)` uses the same 0.3 starting value as before and backs off exponentially.
- `retry=...` combines two conditions with `|`. `retry_if_exception_type(requests.exceptions.RequestException)` retries whenever a `RequestException` occurs, and `retry_if_result(...)` retries when the returned response has one of the listed status codes. Because `raise_for_status()` raises an `HTTPError` (a `RequestException` subclass) for any 4xx or 5xx response, error responses are retried through the first condition.
- `make_request()` makes a request to our error endpoint. It receives all of the traits from the decorator you created above it.
When you run this code, you get a similar output:
```
2024-06-10 12:00:00 - INFO - Making a request with retry logic to https://httpbin.org/status/500...
2024-06-10 12:00:01 - WARNING - Retrying after 0.3 seconds...
2024-06-10 12:00:01 - INFO - Making a request with retry logic to https://httpbin.org/status/500...
2024-06-10 12:00:02 - WARNING - Retrying after 0.6 seconds...
2024-06-10 12:00:02 - INFO - Making a request with retry logic to https://httpbin.org/status/500...
2024-06-10 12:00:03 - ERROR - ❌ Request failed after all retries: RetryError[...]
```
## Building a Custom Retry Mechanism
You can also create a custom retry mechanism, which is often the best approach when working with specialized code. With a relatively small amount of code, you can achieve the same functionality provided by existing libraries while tailoring it to your specific needs.
The code below imports `sleep` for the exponential backoff, sets the configuration (`TOTAL_RETRIES`, `INITIAL_BACKOFF`, and `BAD_CODES`), and uses a `while` loop to hold the retry logic: while you still have tries left and haven’t succeeded, attempt the request.
```python
import logging
import requests
from time import sleep

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

# Create a session
session = requests.Session()

# Define retry settings
TOTAL_RETRIES = 3
INITIAL_BACKOFF = 0.3
BAD_CODES = {429, 500, 502, 503, 504}

def make_request(url):
    current_tries = 0
    backoff = INITIAL_BACKOFF
    success = False
    while current_tries < TOTAL_RETRIES and not success:
        try:
            logger.info("Making a request with retry logic to %s...", url)
            response = session.get(url)
            if response.status_code in BAD_CODES:
                raise requests.exceptions.HTTPError(f"Received {response.status_code}, triggering retry")
            response.raise_for_status()
            logger.info("✅ Request successful: %s", response.status_code)
            success = True
            return response
        except requests.exceptions.RequestException as e:
            logger.error("❌ Request failed: %s, retries left: %d", e, TOTAL_RETRIES - current_tries - 1)
            if current_tries < TOTAL_RETRIES - 1:
                logger.info("⏳ Retrying in %.1f seconds...", backoff)
                sleep(backoff)
                backoff *= 2  # Exponential backoff
            current_tries += 1
    logger.error("🚨 Request failed after all retries.")
    return None

# Test Cases
make_request("https://httpbin.org/status/500")  # ❌ Should fail after 3 attempts
make_request("https://httpbin.org/status/200")  # ✅ Should succeed without retries
```
The actual logic here is handled by a simple `while` loop.
- If `response.status_code` is in `BAD_CODES`, the script raises an exception. Any other 4xx or 5xx response raises through `raise_for_status()` and is handled the same way.
- If a request fails, the script:
  - Logs an error message to the console.
  - Calls `sleep(backoff)` to wait before sending the next request, unless it was the last attempt.
  - Doubles `backoff` (`backoff *= 2`) for the next try.
  - Increments `current_tries` so it doesn’t stay in the loop indefinitely.
Here’s the output from the custom retry code.
```
2024-06-10 12:00:00 - INFO - Making a request with retry logic to https://httpbin.org/status/500...
2024-06-10 12:00:01 - ERROR - ❌ Request failed: Received 500, triggering retry, retries left: 2
2024-06-10 12:00:01 - INFO - ⏳ Retrying in 0.3 seconds...
2024-06-10 12:00:02 - INFO - Making a request with retry logic to https://httpbin.org/status/500...
2024-06-10 12:00:03 - ERROR - ❌ Request failed: Received 500, triggering retry, retries left: 1
2024-06-10 12:00:03 - INFO - ⏳ Retrying in 0.6 seconds...
2024-06-10 12:00:04 - INFO - Making a request with retry logic to https://httpbin.org/status/500...
2024-06-10 12:00:05 - ERROR - ❌ Request failed: Received 500, triggering retry, retries left: 0
2024-06-10 12:00:05 - ERROR - 🚨 Request failed after all retries.
```
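If several functions need the same behavior, the loop can be wrapped in a decorator. The following is a minimal sketch that uses only the standard library. `with_retries` and its parameters are hypothetical names, and the `sleep` argument is injectable so the demo can record delays instead of actually waiting.

```python
import functools
import time


def with_retries(total=3, initial_backoff=0.3, exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped function up to `total` attempts with a doubling delay."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            backoff = initial_backoff
            for attempt in range(1, total + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == total:
                        raise  # out of attempts: surface the last error
                    sleep(backoff)
                    backoff *= 2  # exponential backoff
        return wrapper
    return decorator


# Demo: a function that fails twice, then succeeds; delays are recorded, not slept
waits = []
calls = {"count": 0}


@with_retries(total=3, exceptions=(ConnectionError,), sleep=waits.append)
def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated failure")
    return "ok"


print(flaky(), waits)  # ok [0.3, 0.6]
```

In a scraper, the decorated function would perform the request and raise for the status codes you consider retryable.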
## Conclusion
To avoid all kinds of failed requests, we’ve developed products like the [Web Unlocker API](https://brightdata.com/products/web-unlocker) and [Scraping Browser](https://brightdata.com/products/scraping-browser). These tools automatically handle anti-bot measures, CAPTCHA challenges, and IP blocks, ensuring seamless and efficient web scraping for even the most challenging websites.
Sign up now and start your free trial today.