An open API service indexing awesome lists of open source software.

https://github.com/promptapi/scraper-py

Python package for Prompt API's Scraper API
https://github.com/promptapi/scraper-py

api-marketplace api-wrapper css-selector css-selector-parser data-extraction image-scraper promptapi python3 scraper scraper-api web-scraper web-scraping

Last synced: 28 days ago
JSON representation

Python package for Prompt API's Scraper API

Awesome Lists containing this project

README

          

![Python](https://img.shields.io/badge/python-3.7.4-green.svg)
![Version](https://img.shields.io/badge/version-0.2.4-orange.svg)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Build Status](https://travis-ci.org/promptapi/scraper-py.svg?branch=main)](https://travis-ci.org/promptapi/scraper-py)

# Prompt API - Scraper API - Python Package

`pa-scraper` is a python wrapper for [scraper api][scraper-api] with few
more extra cream and sugar.

## Requirements

1. You need to signup for [Prompt API][promptapi-signup]
1. You need to subscribe [scraper api][scraper-api], test drive is **free!!!**
1. You need to set `PROMPTAPI_TOKEN` environment variable after subscription.

then;

```bash
$ pip install pa-scraper
```

---

## Example Usage

Examples can be found [here][examples].

```python
# examples/fetch.py

from scraper import Scraper

url = 'https://pypi.org/classifiers/'
scraper = Scraper(url)
response = scraper.get()

if response.get('error', None):
# response['error'] returns error message
# response['status'] returns http status code

# Example: {'error': 'Not Found', 'status': 404}
print(response) # noqa: T001
else:
data = response['result']['data']
headers = response['result']['headers']
url = response['result']['url']
status = response['status']

# print(data) # print fetched html, will be long :)

print(headers) # noqa: T001
# {'Content-Length': '321322', 'Content-Type': 'text/html; charset=UTF-8', ... }

print(status) # noqa: T001
# 200

save_result = scraper.save('/tmp/my-data.html') # noqa: S108

if save_result.get('error', None):
# save error occured...
# add you code here...
pass

print(save_result) # noqa: T001
# {'file': '/tmp/my-data.html', 'size': 321322}

```

You can add url parameters for extra operations. Valid parameters are:

- `auth_password`: for HTTP Realm auth password
- `auth_username`: for HTTP Realm auth username
- `cookie`: URL Encoded cookie header.
- `country`: 2 character country code. If you wish to scrape from an IP address of a specific country.
- `referer`: HTTP referer header
- `selector`: CSS style selector path such as `a.btn div li`. If `selector`
is enabled, returning result will be collection of data and saved file
will be in `.json` format.

Here is an example with using url parameters and `selector`:

```python
# examples/fetch_with_params.py

from scraper import Scraper

url = 'https://pypi.org/classifiers/'
scraper = Scraper(url)

fetch_params = dict(country='EE', selector='ul li button[data-clipboard-text]')
response = scraper.get(params=fetch_params)

if response.get('error', None):
# response['error'] returns error message
# response['status'] returns http status code

# Example: {'error': 'Not Found', 'status': 404}
print(response) # noqa: T001
else:
data = response['result']['data']
headers = response['result']['headers']
url = response['result']['url']
status = response['status']

# print(data) # noqa: T001
# ['