https://github.com/engageintellect/scrapers
A repository of web scrapers using Python & Scrapy
- Host: GitHub
- URL: https://github.com/engageintellect/scrapers
- Owner: engageintellect
- Created: 2024-03-28T01:56:45.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-03-28T02:03:47.000Z (about 2 years ago)
- Last Synced: 2025-02-06T13:48:42.863Z (over 1 year ago)
- Topics: crawler, python, scrapy, spider
- Language: Python
- Homepage:
- Size: 8.79 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# scrapers
## Getting Started
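These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. Start by cloning the repository with `git clone https://github.com/engageintellect/scrapers` and changing into the `scrapers` directory.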
### Prerequisites
Install the project's Python dependencies from `requirements.txt` (the spiders need Scrapy):
```bash
pip install -r requirements.txt
```
### Running a Spider
Scrapy's `crawl` command only works from inside a Scrapy project directory, so first change into the `hackernews_scraper` project:
```bash
cd hackernews_scraper
```
To run the `hackernews` spider without saving its output (scraped items only appear in the log):
```bash
scrapy crawl hackernews
```
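To save the scraped items to a file instead, pass `-o`; Scrapy infers the export format from the file extension (for example `.json`, `.jsonl`, `.csv` or `.xml`). Because `-o` appends to an existing file, a second run leaves `results.json` as invalid JSON; use `-O` to overwrite the file instead: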
```bash
scrapy crawl hackernews -o results.json
```
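Once a crawl has written its items to disk, they can be loaded back with the standard library alone. The `load_items` helper below is hypothetical (it is not part of this repository) and assumes the file was produced by Scrapy's feed export: a single JSON array for `.json`, or one JSON object per line for the JSON Lines extensions `.jl`, `.jsonl` and `.jsonlines`.

```python
import json
from pathlib import Path

# File extensions that Scrapy maps to its JSON Lines exporter
JSON_LINES_SUFFIXES = {".jl", ".jsonl", ".jsonlines"}


def load_items(path):
    """Hypothetical helper: load items exported with `scrapy crawl ... -o <path>`."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if path.suffix in JSON_LINES_SUFFIXES:
        # JSON Lines: one JSON object per non-empty line
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # Otherwise assume a single JSON array, as written for a .json target
    return json.loads(text)
```

For example, `load_items("results.json")` returns the scraped items as a list of dicts. JSON Lines suits repeated runs better, since appending one object per line keeps the file valid.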
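To experiment with selectors interactively, open the Scrapy shell on the Hacker News front page, then run the Python statements below at its prompt: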
```bash
scrapy shell 'https://news.ycombinator.com/'
```
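```python
# `response` is predefined in the Scrapy shell and holds the fetched page
print(response.css('td.title a::text').get())        # text of the first matching link
print(response.css('td.title a::attr(href)').get())  # href of the first matching link
```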