https://github.com/addono/cpas-crawler

A Scrapy crawler which crawls the literature database at http://cpas.antenna.nl/databases

Host: GitHub
URL: https://github.com/addono/cpas-crawler
Owner: Addono
Created: 2019-01-04T12:35:34.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2024-05-14T22:15:34.000Z (about 2 years ago)
Last Synced: 2024-12-27T00:27:26.744Z (over 1 year ago)
Language: Python
Homepage:
Size: 9.77 KB
Stars: 0
Watchers: 3
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.md

# CPAS Crawler
A Scrapy crawler to crawl the [CPAS Antenna database](http://cpas.antenna.nl/databases). A previously crawled example of the dataset can be found [here](https://app.scrapinghub.com/datasets/r5lDO9bYTqU).

## Usage
### Local
Install the project locally:
```
git clone https://github.com/Addono/CPAS-Crawler
cd CPAS-Crawler
pip install -r requirements.txt
```

Run the crawler:
```
scrapy crawl cpas-spider -o output.json -a start=1508 -a end=13000
```
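
The `-o output.json` option makes Scrapy write the crawled items to a JSON array. As a minimal sketch of how that file could be inspected afterwards, the hypothetical helper `load_items` below (not part of this repository) loads the array and checks its shape. It makes no assumptions about which fields each item contains.

```python
import json
from pathlib import Path


def load_items(path):
    """Load the items written by `scrapy crawl ... -o output.json`.

    Hypothetical helper for illustration; the crawler itself does not ship it.
    """
    with Path(path).open(encoding="utf-8") as handle:
        items = json.load(handle)
    # Scrapy's JSON feed export is a single array with one object per item.
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of items")
    return items
```

After a crawl has finished, `len(load_items("output.json"))` gives the number of items that were scraped.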

### Scrapinghub
1. Create a new project
1. Clone this repository as the source code of the project
1. Run the `cpas-spider` spider and set the arguments `start` and `end` to the IDs of the first and last elements to crawl.
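
Scrapy hands spider arguments (`-a start=... -a end=...` locally, or the job arguments on Scrapinghub) to the spider as strings. The sketch below shows one way such values could be turned into a range of element IDs. The helper `id_range` is hypothetical and is not taken from this repository, and treating `end` as inclusive is an assumption based on the description of `end` as the last crawled element.

```python
def id_range(start, end):
    """Return the range of element IDs from start to end.

    Hypothetical helper: spider arguments arrive as strings, so both
    values are converted to integers before use.
    """
    first, last = int(start), int(end)
    if first > last:
        raise ValueError(f"start ({first}) must not exceed end ({last})")
    # Assumption: `end` is inclusive, so one is added to the upper bound.
    return range(first, last + 1)


ids = id_range("1508", "13000")
print(len(ids))  # 11493 IDs when both bounds are included
```

Converting and validating the arguments up front means a mistyped value fails immediately instead of partway through a long crawl.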

## Troubleshooting
Windows users might need to install the following packages: `pypiwin32` and `pywin32`.
