https://github.com/william1nguyen/carlist-crawler-python

A pipeline for extracting data from Carlist.my and loading it into Elasticsearch.

crawling-python elasticsearch etl-pipeline scrapy


## CRAWLY

Scrapy scripts for scraping data.

### Run Commands

* Clone the project and go to `crawly/`

```
$ git clone git@github.com:natalieconan/crawly.git
$ cd crawly
```

* (Optional) Install `pipenv` with `Homebrew`:
```
$ brew install pipenv
```

* Activate the Python virtual environment using `Pipenv` and install the packages
```
$ pipenv shell
$ pipenv install
```

* Finally, run the spider to start crawling
```
$ scrapy crawl ${spider_name}
```

In this case the spider name is `carlist`, so run this command to start crawling:

```
$ scrapy crawl carlist
```
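
If you prefer to launch the crawl from a Python script (for example from a scheduler), the command above can be wrapped in a small helper. This is only a minimal sketch and not part of the project: `build_crawl_command` and `run_spider` are hypothetical names, the output file is an assumption, and it presumes `scrapy` is on the `PATH` of the active Pipenv shell and that you run it from the project root.

```python
import shlex
import subprocess
from typing import List, Optional


def build_crawl_command(spider_name: str, output_path: Optional[str] = None) -> List[str]:
    """Build the argument list for `scrapy crawl <spider_name>` (hypothetical helper)."""
    if not spider_name:
        raise ValueError("spider_name must not be empty")
    command = ["scrapy", "crawl", spider_name]
    if output_path is not None:
        # `-o FILE` is Scrapy's feed-export option: scraped items are written to FILE.
        command += ["-o", output_path]
    return command


def run_spider(spider_name: str, output_path: Optional[str] = None) -> int:
    """Run the spider as a subprocess and return its exit code (hypothetical helper)."""
    command = build_crawl_command(spider_name, output_path)
    print("Running:", shlex.join(command))
    # check=False: the caller inspects the exit code instead of catching an exception.
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # Only print the command here; call run_spider("carlist") to actually start the crawl.
    print(shlex.join(build_crawl_command("carlist")))
```

Calling `run_spider("carlist")` inside the Pipenv shell should behave like typing `scrapy crawl carlist` by hand. Passing a second argument such as `"cars.json"` (an illustrative file name) additionally asks Scrapy to export the scraped items to that file.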