Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/scrapingant/scrapingant-client-python

ScrapingAnt API client for Python.
https://github.com/scrapingant/scrapingant-client-python

crawler scraper scraping scrapingant scrapy webscraping

Last synced: 3 days ago
JSON representation

ScrapingAnt API client for Python.

Awesome Lists containing this project

README

        

# ScrapingAnt API client for Python

[![PyPI version](https://badge.fury.io/py/scrapingant-client.svg)](https://badge.fury.io/py/scrapingant-client)

`scrapingant-client` is the official library to access [ScrapingAnt API](https://docs.scrapingant.com) from your Python
applications. It provides useful features like parameters encoding to improve the ScrapingAnt usage experience. Requires
python 3.6+.

- [Quick Start](#quick-start)
- [API token](#api-token)
- [API Reference](#api-reference)
- [Exceptions](#exceptions)
- [Examples](#examples)
- [Useful links](#useful-links)

## Quick Start

```python3
from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='')
# Scrape the example.com site
result = client.general_request('https://example.com')
print(result.content)
```

## Install

```shell
pip install scrapingant-client
```

If you need async support:

```shell
pip install scrapingant-client[async]
```

## API token

In order to get API token you'll need to register at [ScrapingAnt Service](https://app.scrapingant.com)

## API Reference

All public classes, methods and their parameters can be inspected in this API reference.

#### ScrapingAntClient(token)

Main class of this library.

| Param | Type |
|-------|---------------------|
| token | string |

* * *

#### Common arguments
- ScrapingAntClient.general_request
- ScrapingAntClient.general_request_async
- ScrapingAntClient.markdown_request
- ScrapingAntClient.markdown_request_async

https://docs.scrapingant.com/request-response-format#available-parameters

| Param | Type | Default |
|---------------------|----------------------------------------------------------------------------------------------------------------------------|------------|
| url | string | |
| method | string | GET |
| cookies | List[Cookie] | None |
| headers | List[Dict[str, str]] | None |
| js_snippet | string | None |
| proxy_type | ProxyType | datacenter |
| proxy_country | str | None |
| wait_for_selector | str | None |
| browser | boolean | True |
| return_page_source | boolean | False |
| data | same as [requests param 'data'](https://requests.readthedocs.io/en/latest/user/quickstart/#more-complicated-post-requests) | None |
| json | same as [requests param 'json'](https://requests.readthedocs.io/en/latest/user/quickstart/#more-complicated-post-requests) | None |

**IMPORTANT NOTE:** js_snippet will be encoded to Base64 automatically by the ScrapingAnt client library.

* * *

#### Cookie

Class defining cookie. Currently it supports only name and value

| Param | Type |
|-------|---------------------|
| name | string |
| value | string |

* * *

#### Response

Class defining response from API.
| Param | Type |
|-------------|----------------------------|
| content | string |
| cookies | List[Cookie] |
| status_code | int |
| text | string |

## Exceptions

`ScrapingantClientException` is base Exception class, used for all errors.

| Exception | Reason |
|--------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
| ScrapingantInvalidTokenException | The API token is wrong or you have exceeded the API calls request limit |
| ScrapingantInvalidInputException | Invalid value provided. Please, look into error message for more info |
| ScrapingantInternalException | Something went wrong with the server side code. Try again later or contact ScrapingAnt support |
| ScrapingantSiteNotReachableException | The requested URL is not reachable. Please, check it locally |
| ScrapingantDetectedException | The anti-bot detection system has detected the request. Please, retry or change the request settings. |
| ScrapingantTimeoutException | Got timeout while communicating with Scrapingant servers. Check your network connection. Please try later or contact support |

* * *

## Examples

### Sending custom cookies

```python3
from scrapingant_client import ScrapingAntClient
from scrapingant_client import Cookie

client = ScrapingAntClient(token='')

result = client.general_request(
'https://httpbin.org/cookies',
cookies=[
Cookie(name='cookieName1', value='cookieVal1'),
Cookie(name='cookieName2', value='cookieVal2'),
]
)
print(result.content)
# Response cookies is a list of Cookie objects
# They can be used in next requests
response_cookies = result.cookies
```

### Executing custom JS snippet

```python
from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='')

customJsSnippet = """
var str = 'Hello, world!';
var htmlElement = document.getElementsByTagName('html')[0];
htmlElement.innerHTML = str;
"""
result = client.general_request(
'https://example.com',
js_snippet=customJsSnippet,
)
print(result.content)
```

### Exception handling and retries

```python
from scrapingant_client import ScrapingAntClient, ScrapingantClientException, ScrapingantInvalidInputException

client = ScrapingAntClient(token='')

RETRIES_COUNT = 3

def parse_html(html: str):
... # Implement your data extraction here

parsed_data = None
for retry_number in range(RETRIES_COUNT):
try:
scrapingant_response = client.general_request(
'https://example.com',
)
except ScrapingantInvalidInputException as e:
print(f'Got invalid input exception: {{repr(e)}}')
break # We are not retrying if request params are not valid
except ScrapingantClientException as e:
print(f'Got ScrapingAnt exception {repr(e)}')
except Exception as e:
print(f'Got unexpected exception {repr(e)}') # please report this kind of exceptions by creating a new issue
else:
try:
parsed_data = parse_html(scrapingant_response.content)
break # Data is parsed successfully, so we dont need to retry
except Exception as e:
print(f'Got exception while parsing data {repr(e)}')

if parsed_data is None:
print(f'Failed to retrieve and parse data after {RETRIES_COUNT} tries')
# Can sleep and retry later, or stop the script execution, and research the reason
else:
print(f'Successfully parsed data: {parsed_data}')
```

### Sending custom headers

```python3
from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='')

result = client.general_request(
'https://httpbin.org/headers',
headers={
'test-header': 'test-value'
}
)
print(result.content)

# Http basic auth example
result = client.general_request(
'https://jigsaw.w3.org/HTTP/Basic/',
headers={'Authorization': 'Basic Z3Vlc3Q6Z3Vlc3Q='}
)
print(result.content)
```

### Simple async example

```python3
import asyncio

from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='')

async def main():
# Scrape the example.com site
result = await client.general_request_async('https://example.com')
print(result.content)

asyncio.run(main())
```

### Sending POST request

```python3
from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='')

# Sending POST request with json data
result = client.general_request(
url="https://httpbin.org/post",
method="POST",
json={"test": "test"},
)
print(result.content)

# Sending POST request with bytes data
result = client.general_request(
url="https://httpbin.org/post",
method="POST",
data=b'test_bytes',
)
print(result.content)
```

### Receiving markdown

```python3
from scrapingant_client import ScrapingAntClient

client = ScrapingAntClient(token='')

# Sending POST request with json data
result = client.markdown_request(
url="https://example.com",
)
print(result.markdown)
```

## Useful links

- [Scrapingant API doumentation](https://docs.scrapingant.com)
- [Scrapingant JS Client](https://github.com/scrapingant/scrapingant-client-js)