https://github.com/engageintellect/scrapers
A repository of web scrapers using Python & Scrapy
- Host: GitHub
- URL: https://github.com/engageintellect/scrapers
- Owner: engageintellect
- Created: 2024-03-28T01:56:45.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-03-28T02:03:47.000Z (10 months ago)
- Last Synced: 2024-10-26T18:46:31.077Z (3 months ago)
- Topics: crawler, python, scrapy, spider
- Language: Python
- Homepage:
- Size: 8.79 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# scrapers
## Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

### Prerequisites

Install the Python dependencies listed in `requirements.txt`:

```bash
pip install -r requirements.txt
```

### Running a Spider

To run a spider, first change into its project directory:

```bash
cd hackernews_scraper
```

Run the spider without writing the output to a file:

```bash
scrapy crawl hackernews
```

Run the spider and write the scraped items to a file:

```bash
scrapy crawl hackernews -o results.json
```

Using the Scrapy shell:

```bash
scrapy shell 'https://news.ycombinator.com/'
```

Inside the shell, try the selectors on the fetched `response`:

```python
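# Text of the first link matched inside a title cell
print(response.css('td.title a::text').get())
# href attribute of the first link matched inside a title cell
print(response.css('td.title a::attr(href)').get())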
```
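
### Inspecting the output file

After a crawl run with `-o results.json` as shown earlier, the exported items can be checked from plain Python. The following is a minimal sketch, not part of this repository: the helper names `load_items` and `summarize` are hypothetical, and it assumes only that the file holds a single JSON array of item objects. It makes no assumption about which fields the `hackernews` spider emits; it just reports the field names it finds.

```python
import json
from pathlib import Path


def load_items(path="results.json"):
    """Load the items exported by `scrapy crawl hackernews -o results.json`."""
    # Assumption: the file contains one JSON array of item objects.
    with Path(path).open(encoding="utf-8") as f:
        items = json.load(f)
    if not isinstance(items, list):
        raise ValueError(f"expected a JSON array in {path}")
    return items


def summarize(items):
    """Return the item count and the sorted field names seen across items."""
    fields = sorted({key for item in items for key in item})
    return len(items), fields


if __name__ == "__main__":
    if Path("results.json").exists():
        count, fields = summarize(load_items())
        print(f"{count} items with fields: {fields}")
    else:
        print("results.json not found; run the crawl with -o results.json first")
```

Run this from the `hackernews_scraper` directory, or pass the path to the file explicitly to `load_items`.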