https://github.com/promptapi/scraper-py
Python package for Prompt API's Scraper API
api-marketplace api-wrapper css-selector css-selector-parser data-extraction image-scraper promptapi python3 scraper scraper-api web-scraper web-scraping
- Host: GitHub
- URL: https://github.com/promptapi/scraper-py
- Owner: promptapi
- License: MIT
- Created: 2020-08-31T18:54:12.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2020-10-06T04:45:46.000Z (over 5 years ago)
- Last Synced: 2025-08-20T00:56:50.189Z (6 months ago)
- Topics: api-marketplace, api-wrapper, css-selector, css-selector-parser, data-extraction, image-scraper, promptapi, python3, scraper, scraper-api, web-scraper, web-scraping
- Language: Python
- Homepage: https://promptapi.com/marketplace/description/scraper-api
- Size: 55.7 KB
- Stars: 5
- Watchers: 3
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Code of conduct: CODE_OF_CONDUCT.md
README


[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Build Status](https://travis-ci.org/promptapi/scraper-py.svg?branch=main)](https://travis-ci.org/promptapi/scraper-py)
# Prompt API - Scraper API - Python Package
`pa-scraper` is a Python wrapper for the [scraper api][scraper-api], with a
little extra cream and sugar.
## Requirements
1. You need to sign up for [Prompt API][promptapi-signup]
1. You need to subscribe to the [scraper api][scraper-api] (the test drive is **free!**)
1. You need to set the `PROMPTAPI_TOKEN` environment variable after subscribing.
Then:
```bash
$ pip install pa-scraper
```
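Since the wrapper relies on the `PROMPTAPI_TOKEN` environment variable, it can be useful to fail early with a clear message when it is missing. A minimal helper using only the standard library (this is a sketch, not part of the `pa-scraper` package):

```python
import os


def require_token(env='PROMPTAPI_TOKEN'):
    """Return the Prompt API token, or raise a clear error if it is not set."""
    token = os.environ.get(env)
    if not token:
        raise RuntimeError(f'{env} is not set; see the Requirements section above')
    return token
```

Call `require_token()` once at startup so a missing token surfaces immediately instead of as an authentication error deep inside a request.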
---
## Example Usage
Examples can be found [here][examples].
```python
# examples/fetch.py
from scraper import Scraper

url = 'https://pypi.org/classifiers/'
scraper = Scraper(url)
response = scraper.get()

if response.get('error', None):
    # response['error'] returns error message
    # response['status'] returns http status code
    # Example: {'error': 'Not Found', 'status': 404}
    print(response)  # noqa: T001
else:
    data = response['result']['data']
    headers = response['result']['headers']
    url = response['result']['url']
    status = response['status']

    # print(data)  # print fetched html, will be long :)

    print(headers)  # noqa: T001
    # {'Content-Length': '321322', 'Content-Type': 'text/html; charset=UTF-8', ... }

    print(status)  # noqa: T001
    # 200

    save_result = scraper.save('/tmp/my-data.html')  # noqa: S108
    if save_result.get('error', None):
        # save error occurred...
        # add your code here...
        pass

    print(save_result)  # noqa: T001
    # {'file': '/tmp/my-data.html', 'size': 321322}
```
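The error/result dict shape shown in the example lends itself to a small helper that either unwraps the payload or raises. This is a sketch assuming only the keys documented above, and is not part of the package:

```python
def unwrap(response):
    """Return (data, status) from a Scraper response dict, or raise on error.

    Assumes the shapes shown in the example:
    {'error': ..., 'status': ...} on failure,
    {'result': {'data': ..., 'headers': ..., 'url': ...}, 'status': ...} on success.
    """
    if response.get('error'):
        raise RuntimeError(f"{response['error']} (HTTP {response['status']})")
    return response['result']['data'], response['status']


# Illustrative dicts mirroring the example output:
ok = {'result': {'data': '<html>...</html>', 'headers': {}, 'url': ''}, 'status': 200}
data, status = unwrap(ok)
```

Centralizing the check this way keeps the happy path of your calling code free of repeated `response.get('error', None)` branches.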
You can add URL parameters for extra operations. Valid parameters are:
- `auth_password`: HTTP Realm auth password
- `auth_username`: HTTP Realm auth username
- `cookie`: URL-encoded cookie header.
- `country`: 2-character country code, if you wish to scrape from an IP address in a specific country.
- `referer`: HTTP referer header
- `selector`: CSS-style selector path such as `a.btn div li`. If `selector`
  is used, the returned result will be a collection of data and the saved
  file will be in `.json` format.
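A params dict combining several of these options could be built as follows (the values are made up for illustration; `urllib.parse.quote` from the standard library handles the cookie encoding):

```python
from urllib.parse import quote

fetch_params = {
    'country': 'DE',                                # 2-character country code
    'referer': 'https://example.com/',              # HTTP referer header
    'cookie': quote('session=abc123; theme=dark'),  # cookie header must be URL encoded
    'selector': 'a.btn div li',                     # CSS-style selector path
}
```

This dict would then be passed as `scraper.get(params=fetch_params)`, as in the example below.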
Here is an example using URL parameters and `selector`:
```python
# examples/fetch_with_params.py
from scraper import Scraper

url = 'https://pypi.org/classifiers/'
scraper = Scraper(url)

fetch_params = dict(country='EE', selector='ul li button[data-clipboard-text]')
response = scraper.get(params=fetch_params)

if response.get('error', None):
    # response['error'] returns error message
    # response['status'] returns http status code
    # Example: {'error': 'Not Found', 'status': 404}
    print(response)  # noqa: T001
else:
    data = response['result']['data']
    headers = response['result']['headers']
    url = response['result']['url']
    status = response['status']

    # print(data)  # noqa: T001
    # ['