https://github.com/promptapi/scraper-py

Python package for Prompt API's Scraper API

Host: GitHub
URL: https://github.com/promptapi/scraper-py
Owner: promptapi
License: mit
Created: 2020-08-31T18:54:12.000Z (over 5 years ago)
Default Branch: main
Last Pushed: 2020-10-06T04:45:46.000Z (over 5 years ago)
Last Synced: 2025-08-20T00:56:50.189Z (7 months ago)
Topics: api-marketplace, api-wrapper, css-selector, css-selector-parser, data-extraction, image-scraper, promptapi, python3, scraper, scraper-api, web-scraper, web-scraping
Language: Python
Homepage: https://promptapi.com/marketplace/description/scraper-api
Size: 55.7 KB
Stars: 5
Watchers: 3
Forks: 3
Open Issues: 1
Metadata Files:
- Readme: README.md
- Code of conduct: CODE_OF_CONDUCT.md

README

![Python](https://img.shields.io/badge/python-3.7.4-green.svg)

![Version](https://img.shields.io/badge/version-0.2.4-orange.svg)

[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

[![Build Status](https://travis-ci.org/promptapi/scraper-py.svg?branch=main)](https://travis-ci.org/promptapi/scraper-py)

# Prompt API - Scraper API - Python Package

`pa-scraper` is a Python wrapper for the [Scraper API][scraper-api] with a
little extra cream and sugar.

## Requirements

1. You need to sign up for [Prompt API][promptapi-signup].

1. You need to subscribe to the [Scraper API][scraper-api]; the test drive is **free!!!**

1. You need to set the `PROMPTAPI_TOKEN` environment variable after subscribing.

Then install the package:

```bash
$ pip install pa-scraper
```
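
Since the package depends on the `PROMPTAPI_TOKEN` environment variable, it can help to fail early when the variable is missing. The following is a minimal sketch, not part of `pa-scraper`; the helper name `require_token` is hypothetical, and how the package itself reports a missing token is not covered here.

```python
import os


def require_token(env=os.environ):
    """Return the Prompt API token, or raise a clear error if it is missing.

    `require_token` is a hypothetical helper for illustration only;
    it is not provided by pa-scraper.
    """
    token = env.get('PROMPTAPI_TOKEN', '').strip()
    if not token:
        raise RuntimeError(
            'PROMPTAPI_TOKEN is not set; export it before using pa-scraper'
        )
    return token
```

Calling `require_token()` at the top of a script surfaces a configuration problem before any request is made.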

---

## Example Usage

Examples can be found [here][examples].

```python
# examples/fetch.py

from scraper import Scraper

url = 'https://pypi.org/classifiers/'
scraper = Scraper(url)
response = scraper.get()

if response.get('error', None):
    # response['error']  returns error message
    # response['status'] returns http status code
    # Example: {'error': 'Not Found', 'status': 404}
    print(response)  # noqa: T001
else:
    data = response['result']['data']
    headers = response['result']['headers']
    url = response['result']['url']
    status = response['status']

    # print(data) # print fetched html, will be long :)

    print(headers)  # noqa: T001
    # {'Content-Length': '321322', 'Content-Type': 'text/html; charset=UTF-8', ... }

    print(status)  # noqa: T001
    # 200

    save_result = scraper.save('/tmp/my-data.html')  # noqa: S108

    if save_result.get('error', None):
        # save error occurred...
        # add your code here...
        pass

    print(save_result)  # noqa: T001
    # {'file': '/tmp/my-data.html', 'size': 321322}
```
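
The error and success branches in the example above can be folded into one small function. This is only a sketch that assumes the response dictionaries have exactly the keys shown in the example (`error`, `status`, and `result` with `data`, `headers`, `url`); `unpack_response` is a hypothetical name and is not part of `pa-scraper`.

```python
def unpack_response(response):
    """Split a response dict into (ok, payload).

    Hypothetical helper; it relies only on the keys shown in the
    example above and does not call the Scraper API itself.
    """
    if response.get('error'):
        # Error responses carry a message and an HTTP status code
        return False, {
            'error': response['error'],
            'status': response.get('status'),
        }
    result = response['result']
    return True, {
        'status': response['status'],
        'url': result['url'],
        'headers': result['headers'],
        'data': result['data'],
    }


# Illustrative input shaped like the documented error response
ok, payload = unpack_response({'error': 'Not Found', 'status': 404})
print(ok, payload)  # noqa: T001
```

Returning a flag together with a flat dictionary keeps the calling code free of nested key lookups.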

You can add URL parameters for extra operations. Valid parameters are:

- `auth_password`: password for HTTP Realm authentication

- `auth_username`: username for HTTP Realm authentication

- `cookie`: URL-encoded cookie header

- `country`: two-character country code, used to scrape from an IP address in a specific country

- `referer`: HTTP referer header

- `selector`: CSS-style selector path such as `a.btn div li`. If `selector`
  is set, the returned result will be a collection of data and the saved file
  will be in `.json` format.
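
Because only the parameters listed above are valid, a caller may want to catch typos before a request is sent. The sketch below is an assumption-laden illustration: `build_fetch_params` and `VALID_PARAMS` are hypothetical names, and whether `pa-scraper` performs any validation of its own is not stated here.

```python
# Parameter names taken from the list above
VALID_PARAMS = frozenset(
    ['auth_password', 'auth_username', 'cookie', 'country', 'referer', 'selector']
)


def build_fetch_params(**params):
    """Return a params dict, rejecting names not in the documented list.

    Hypothetical helper for illustration; not part of pa-scraper.
    """
    unknown = sorted(set(params) - VALID_PARAMS)
    if unknown:
        raise ValueError('unsupported parameter(s): ' + ', '.join(unknown))
    country = params.get('country')
    if country is not None and len(country) != 2:
        # The list above describes `country` as a 2 character code
        raise ValueError('country must be a 2 character code')
    return dict(params)


fetch_params = build_fetch_params(country='EE', selector='a.btn div li')
```

The resulting dictionary has the same shape as the one passed to `scraper.get(params=...)` in the repository's own example.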

Here is an example that uses URL parameters and `selector`:

```python
# examples/fetch_with_params.py

from scraper import Scraper

url = 'https://pypi.org/classifiers/'
scraper = Scraper(url)

fetch_params = dict(country='EE', selector='ul li button[data-clipboard-text]')
response = scraper.get(params=fetch_params)

if response.get('error', None):
    # response['error']  returns error message
    # response['status'] returns http status code
    # Example: {'error': 'Not Found', 'status': 404}
    print(response)  # noqa: T001
else:
    data = response['result']['data']
    headers = response['result']['headers']
    url = response['result']['url']
    status = response['status']

    # print(data)  # noqa: T001
    # ['
```
