https://github.com/aquatiko/craigslist-spider
A Python spider to scrape the jobs list and job details from https://newyork.craigslist.org.
- Host: GitHub
- URL: https://github.com/aquatiko/craigslist-spider
- Owner: aquatiko
- Created: 2018-07-18T10:16:09.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2018-07-18T10:38:33.000Z (almost 7 years ago)
- Last Synced: 2025-01-21T00:32:31.218Z (4 months ago)
- Topics: craigslist, dynamic, jobseeker, python3, scrapy-spider
- Language: Python
- Homepage:
- Size: 121 KB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Craigslist-spider
A Python spider to scrape the jobs list and job details from Craigslist.

## Usage
In a terminal or CMD, navigate to the main Scrapy project folder and run the spider:
```
scrapy crawl jobs -o output.csv
```
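For context, a jobs spider like this one can be sketched roughly as below. This is an illustrative sketch only: the CSS selectors, start URL path, and output fields are assumptions and may not match the project's actual spider code.

```python
# Minimal sketch of a Craigslist jobs spider (selectors and fields are illustrative)
import scrapy


class JobsSpider(scrapy.Spider):
    name = "jobs"  # matches the `scrapy crawl jobs` command above
    allowed_domains = ["newyork.craigslist.org"]
    # Assumed search path for the jobs category; verify against the live site
    start_urls = ["https://newyork.craigslist.org/search/jjj"]

    def parse(self, response):
        # Follow each result link to its detail page (selector is an assumption)
        for href in response.css("a.result-title::attr(href)").getall():
            yield response.follow(href, callback=self.parse_job)

    def parse_job(self, response):
        # Yield a plain dict; `-o output.csv` exports these fields as CSV columns
        yield {
            "title": response.css("#titletextonly::text").get(),
            "posted": response.css("time.date::attr(datetime)").get(),
            "url": response.url,
        }
```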
### Settings
In `settings.py`, change these settings to make the spider site-friendly:

- `CONCURRENT_REQUESTS = 2` (or 5) sets the maximum number of concurrent requests the spider makes to the domain. A high limit might get the spider detected by the domain.
- `ROBOTSTXT_OBEY = False` allows scraping parts of the website that are not permitted by the domain. You can check those rules by visiting `www.site_name/robots.txt`.
- `DOWNLOAD_DELAY = 2` (in seconds) adds a gap between consecutive requests. This will make your spider slower, but it also lessens the chance of being detected by the domain.
You can also uncomment other settings in settings.py and set their values for a more customized spider.
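Put together, the relevant lines in `settings.py` might look like the sketch below. This is an illustrative excerpt, not the project's exact file; `BOT_NAME` and the comments are assumptions.

```python
# settings.py -- illustrative excerpt, not the project's exact configuration
BOT_NAME = "craigslist_spider"  # assumed project name, adjust to your own

# Keep concurrency low so the crawl is less likely to be flagged by the domain
CONCURRENT_REQUESTS = 2

# Ignore robots.txt rules (check www.<site_name>/robots.txt to see what is disallowed)
ROBOTSTXT_OBEY = False

# Wait 2 seconds between consecutive requests: slower, but gentler on the site
DOWNLOAD_DELAY = 2
```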