https://github.com/addono/cpas-crawler
A Scrapy crawler which crawls the literature database at http://cpas.antenna.nl/databases
- Host: GitHub
- URL: https://github.com/addono/cpas-crawler
- Owner: Addono
- Created: 2019-01-04T12:35:34.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2024-05-14T22:15:34.000Z (over 1 year ago)
- Last Synced: 2024-12-27T00:27:26.744Z (about 1 year ago)
- Language: Python
- Homepage:
- Size: 9.77 KB
- Stars: 0
- Watchers: 3
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# CPAS Crawler
A Scrapy crawler for the [CPAS Antenna database](http://cpas.antenna.nl/databases). An example of a previously crawled dataset is available [on Scrapinghub](https://app.scrapinghub.com/datasets/r5lDO9bYTqU).
## Usage
### Local
Install the project locally:
```
git clone https://github.com/Addono/CPAS-Crawler
cd CPAS-Crawler
pip install -r requirements.txt
```
Run the crawler:
```
scrapy crawl cpas-spider -o output.json -a start=1508 -a end=13000
```
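Scrapy passes each `-a NAME=VALUE` pair to the spider as a keyword argument, and those values are always strings. The sketch below makes no claims about this repository's actual spider code. It shows one way `start` and `end` could be turned into an inclusive range of record IDs. `record_ids`, `record_urls` and `RECORD_URL_TEMPLATE` are hypothetical names, and the URL is a placeholder.

```python
from typing import List

# Placeholder template: the record URL that cpas-spider actually requests is
# not documented in this README, so this value is purely hypothetical.
RECORD_URL_TEMPLATE = "https://example.com/cpas-record/{id}"


def record_ids(start: str, end: str) -> range:
    """Turn the string values of `-a start=...` and `-a end=...` into IDs.

    Both bounds are inclusive, matching "first and last crawled element".
    """
    first, last = int(start), int(end)  # Scrapy spider arguments arrive as strings
    if first > last:
        raise ValueError(f"start ({first}) must not be greater than end ({last})")
    return range(first, last + 1)  # +1 so the `end` ID itself is included


def record_urls(start: str, end: str) -> List[str]:
    """Build one (hypothetical) record URL per ID in the inclusive range."""
    return [RECORD_URL_TEMPLATE.format(id=i) for i in record_ids(start, end)]


if __name__ == "__main__":
    ids = record_ids("1508", "13000")
    print(ids[0], ids[-1], len(ids))  # 1508 13000 11493
```

In a real `scrapy.Spider`, a `start_requests()` method could then yield one `scrapy.Request` per URL. Keeping the range logic in a plain function also makes it easy to test without running a crawl.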
### Scrapinghub
1. Create a new project
1. Clone this repository as the source code of the project
1. Run `cpas-spider`, setting the arguments `start` and `end` to the IDs of the first and last elements to crawl.
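Each run of `cpas-spider` on Scrapinghub is a separate job with its own `start` and `end` arguments. To cover a large range such as 1508 to 13000 with several smaller jobs, a helper like the hypothetical `split_range` below (not part of this repository) can compute consecutive, non-overlapping inclusive bounds.

```python
from typing import List, Tuple


def split_range(start: int, end: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [start, end] into inclusive (start, end) chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if start > end:
        raise ValueError("start must not be greater than end")
    chunks = []
    lo = start
    while lo <= end:
        hi = min(lo + chunk_size - 1, end)  # last chunk may be shorter
        chunks.append((lo, hi))
        lo = hi + 1  # next chunk starts right after this one, no overlap
    return chunks


if __name__ == "__main__":
    for first, last in split_range(1508, 13000, 5000):
        # Each pair would be entered as the `start` and `end` job arguments.
        print(f"start={first} end={last}")
```

With a chunk size of 5000 this prints `start=1508 end=6507`, `start=6508 end=11507` and `start=11508 end=13000`.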
## Troubleshooting
Windows users may need to install the `pypiwin32` and `pywin32` packages.
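To check whether these packages are in place, a small diagnostic can try to locate modules that pywin32 installs, such as `win32api` and `pywintypes`. The function name and module list below are assumptions for illustration, not part of this project.

```python
import importlib.util
import sys
from typing import List

# Modules installed by pywin32; this list is an illustrative assumption,
# not an exhaustive inventory of the package.
PYWIN32_MODULES = ["win32api", "pywintypes"]


def missing_pywin32_modules() -> List[str]:
    """Return the names from PYWIN32_MODULES that cannot be found.

    Off Windows, pywin32 is not applicable, so an empty list is returned.
    """
    if sys.platform != "win32":
        return []
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in PYWIN32_MODULES if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_pywin32_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("No missing pywin32 modules detected.")
```

If anything is reported missing, `pip install pywin32 pypiwin32` installs both packages named above.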