https://github.com/scrapingant/scrapingant-client-python
ScrapingAnt API client for Python.
- Host: GitHub
- URL: https://github.com/scrapingant/scrapingant-client-python
- Owner: ScrapingAnt
- Created: 2021-02-16T09:07:30.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-07-16T19:14:41.000Z (almost 2 years ago)
- Last Synced: 2025-01-07T19:11:58.464Z (over 1 year ago)
- Topics: crawler, scraper, scraping, scrapingant, scrapy, webscraping
- Language: Python
- Homepage: https://pypi.org/project/scrapingant-client/
- Size: 60.5 KB
- Stars: 36
- Watchers: 4
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
# ScrapingAnt API client for Python
`scrapingant-client` is the official library for accessing the [ScrapingAnt API](https://docs.scrapingant.com) from your Python
applications. It handles details such as parameter encoding for you, which makes ScrapingAnt easier to use. Requires
Python 3.6+.
- [Quick Start](#quick-start)
- [API token](#api-token)
- [API Reference](#api-reference)
- [Exceptions](#exceptions)
- [Examples](#examples)
- [Useful links](#useful-links)
## Quick Start
```python3
from scrapingant_client import ScrapingAntClient
client = ScrapingAntClient(token='')
# Scrape the example.com site
result = client.general_request('https://example.com')
print(result.content)
```
## Install
```shell
pip install scrapingant-client
```
If you need async support:
```shell
pip install scrapingant-client[async]
```
## API token
To get an API token, register at [ScrapingAnt Service](https://app.scrapingant.com).
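Hard-coding the token in source files makes it easy to leak. Below is a minimal sketch of reading it from the environment instead. The helper `load_token` and the variable name `SCRAPINGANT_TOKEN` are assumptions made for illustration; neither is part of the library.

```python
import os


def load_token(env=os.environ, var_name='SCRAPINGANT_TOKEN'):
    """Return the API token stored in an environment variable."""
    # SCRAPINGANT_TOKEN is a hypothetical name; use whatever fits your setup
    token = env.get(var_name, '').strip()
    if not token:
        raise RuntimeError(f'Set the {var_name} environment variable to your API token')
    return token


# Usage, assuming the variable is set:
# client = ScrapingAntClient(token=load_token())
```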
## API Reference
All public classes, methods and their parameters can be inspected in this API reference.
#### ScrapingAntClient(token)
Main class of this library.
| Param | Type |
|-------|---------------------|
| token | string |
* * *
#### Common arguments
The arguments below are shared by the following methods:
- ScrapingAntClient.general_request
- ScrapingAntClient.general_request_async
- ScrapingAntClient.markdown_request
- ScrapingAntClient.markdown_request_async

Each argument is described in the API documentation: https://docs.scrapingant.com/request-response-format#available-parameters
| Param | Type | Default |
|---------------------|----------------------------------------------------------------------------------------------------------------------------|------------|
| url | string | |
| method | string | GET |
| cookies | List[Cookie] | None |
| headers | Dict[str, str] | None |
| js_snippet | string | None |
| proxy_type | ProxyType | datacenter |
| proxy_country | str | None |
| wait_for_selector | str | None |
| browser | boolean | True |
| return_page_source | boolean | False |
| data | same as [requests param 'data'](https://requests.readthedocs.io/en/latest/user/quickstart/#more-complicated-post-requests) | None |
| json | same as [requests param 'json'](https://requests.readthedocs.io/en/latest/user/quickstart/#more-complicated-post-requests) | None |
**IMPORTANT NOTE:** js_snippet will be encoded to Base64 automatically by the ScrapingAnt client library.
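To illustrate what that note means in practice, here is a small standard-library sketch of Base64-encoding a snippet. It shows the general technique only; the exact calls the client makes internally are an assumption. You pass the plain JavaScript string as `js_snippet` and never need to do this yourself.

```python
import base64

js_snippet = "document.title = 'Hello, world!';"

# Roughly what happens to js_snippet before the request is sent (assumed, not taken from the library)
encoded = base64.b64encode(js_snippet.encode('utf-8')).decode('ascii')

# Decoding restores the original snippet unchanged
decoded = base64.b64decode(encoded).decode('utf-8')
assert decoded == js_snippet
```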
* * *
#### Cookie
Class defining a cookie. Currently it supports only `name` and `value`.
| Param | Type |
|-------|---------------------|
| name | string |
| value | string |
* * *
#### Response
Class defining response from API.
| Param | Type |
|-------------|----------------------------|
| content | string |
| cookies | List[Cookie] |
| status_code | int |
| text | string |
## Exceptions
`ScrapingantClientException` is the base exception class for all errors raised by the client.
| Exception | Reason |
|--------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
| ScrapingantInvalidTokenException | The API token is wrong or you have exceeded the API calls request limit |
| ScrapingantInvalidInputException | Invalid value provided. Please, look into error message for more info |
| ScrapingantInternalException | Something went wrong with the server side code. Try again later or contact ScrapingAnt support |
| ScrapingantSiteNotReachableException | The requested URL is not reachable. Please, check it locally |
| ScrapingantDetectedException | The anti-bot detection system has detected the request. Please, retry or change the request settings. |
| ScrapingantTimeoutException | Got timeout while communicating with Scrapingant servers. Check your network connection. Please try later or contact support |
* * *
## Examples
### Sending custom cookies
```python3
from scrapingant_client import ScrapingAntClient
from scrapingant_client import Cookie
client = ScrapingAntClient(token='')
result = client.general_request(
    'https://httpbin.org/cookies',
    cookies=[
        Cookie(name='cookieName1', value='cookieVal1'),
        Cookie(name='cookieName2', value='cookieVal2'),
    ]
)
print(result.content)
# Response cookies are a list of Cookie objects
# They can be reused in subsequent requests
response_cookies = result.cookies
```
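Reusing response cookies across requests can be sketched as below. The helper `scrape_in_sequence` is hypothetical and not part of the library; it relies only on `general_request(url, cookies=...)` and `result.cookies` as shown in the example above.

```python
def scrape_in_sequence(client, urls):
    """Request URLs one by one, passing cookies from each response to the next request."""
    cookies = None  # the first request sends no custom cookies
    pages = []
    for url in urls:
        result = client.general_request(url, cookies=cookies)
        pages.append(result.content)
        # result.cookies is a list of Cookie objects; keep the previous ones if none came back
        cookies = result.cookies or cookies
    return pages
```

Call it with an existing client, for example `scrape_in_sequence(client, ['https://httpbin.org/cookies'])`.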
### Executing custom JS snippet
```python
from scrapingant_client import ScrapingAntClient
client = ScrapingAntClient(token='')
customJsSnippet = """
var str = 'Hello, world!';
var htmlElement = document.getElementsByTagName('html')[0];
htmlElement.innerHTML = str;
"""
result = client.general_request(
    'https://example.com',
    js_snippet=customJsSnippet,
)
print(result.content)
```
### Exception handling and retries
```python
from scrapingant_client import ScrapingAntClient, ScrapingantClientException, ScrapingantInvalidInputException
client = ScrapingAntClient(token='')
RETRIES_COUNT = 3

def parse_html(html: str):
    ...  # Implement your data extraction here


parsed_data = None
for retry_number in range(RETRIES_COUNT):
    try:
        scrapingant_response = client.general_request(
            'https://example.com',
        )
    except ScrapingantInvalidInputException as e:
        print(f'Got invalid input exception: {repr(e)}')
        break  # We are not retrying if request params are not valid
    except ScrapingantClientException as e:
        print(f'Got ScrapingAnt exception {repr(e)}')
    except Exception as e:
        print(f'Got unexpected exception {repr(e)}')  # please report this kind of exception by creating a new issue
    else:
        try:
            parsed_data = parse_html(scrapingant_response.content)
            break  # Data is parsed successfully, so we don't need to retry
        except Exception as e:
            print(f'Got exception while parsing data {repr(e)}')

if parsed_data is None:
    print(f'Failed to retrieve and parse data after {RETRIES_COUNT} tries')
    # Can sleep and retry later, or stop the script execution, and research the reason
else:
    print(f'Successfully parsed data: {parsed_data}')
```
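If you prefer to wait between attempts, the loop above can be wrapped in a small backoff helper. This is a sketch under stated assumptions: `retry_with_backoff` and its parameters are hypothetical names, not part of the library, and exponential backoff is one possible policy among many.

```python
import time


def retry_with_backoff(func, retries=3, base_delay=1.0, fatal=(), sleep=time.sleep):
    """Call func() until it succeeds, sleeping base_delay * 2**attempt after each failure."""
    if retries < 1:
        raise ValueError('retries must be at least 1')
    last_error = None
    for attempt in range(retries):
        try:
            return func()
        except fatal:
            raise  # e.g. invalid input: retrying cannot help
        except Exception as e:
            last_error = e
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)  # no sleep after the final attempt
    raise last_error
```

With the client, this could be called as `retry_with_backoff(lambda: client.general_request('https://example.com'), fatal=(ScrapingantInvalidInputException,))`.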
### Sending custom headers
```python3
from scrapingant_client import ScrapingAntClient
client = ScrapingAntClient(token='')
result = client.general_request(
    'https://httpbin.org/headers',
    headers={
        'test-header': 'test-value'
    }
)
print(result.content)
# HTTP basic auth example
result = client.general_request(
    'https://jigsaw.w3.org/HTTP/Basic/',
    headers={'Authorization': 'Basic Z3Vlc3Q6Z3Vlc3Q='}
)
print(result.content)
```
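The `Authorization` value in the basic auth example is the Base64 encoding of `guest:guest`. A minimal sketch of building such a header yourself, using only the standard library; `basic_auth_header` is a hypothetical helper, not part of the client:

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header from a username and password."""
    credentials = f'{username}:{password}'.encode('utf-8')
    return {'Authorization': 'Basic ' + base64.b64encode(credentials).decode('ascii')}


print(basic_auth_header('guest', 'guest'))
# prints {'Authorization': 'Basic Z3Vlc3Q6Z3Vlc3Q='}
```

The returned dict can be passed directly as the `headers` argument.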
### Simple async example
```python3
import asyncio
from scrapingant_client import ScrapingAntClient
client = ScrapingAntClient(token='')


async def main():
    # Scrape the example.com site
    result = await client.general_request_async('https://example.com')
    print(result.content)


asyncio.run(main())
```
### Sending POST request
```python3
from scrapingant_client import ScrapingAntClient
client = ScrapingAntClient(token='')
# Sending POST request with json data
result = client.general_request(
    url="https://httpbin.org/post",
    method="POST",
    json={"test": "test"},
)
print(result.content)
# Sending POST request with bytes data
result = client.general_request(
    url="https://httpbin.org/post",
    method="POST",
    data=b'test_bytes',
)
print(result.content)
```
### Receiving markdown
```python3
from scrapingant_client import ScrapingAntClient
client = ScrapingAntClient(token='')
# Requesting the page content converted to markdown
result = client.markdown_request(
    url="https://example.com",
)
print(result.markdown)
```
## Useful links
- [ScrapingAnt API documentation](https://docs.scrapingant.com)
- [ScrapingAnt JS Client](https://github.com/scrapingant/scrapingant-client-js)