https://github.com/yizhezhang-ervin/knowledge_scrapy
Scrapy
python scrapy
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/yizhezhang-ervin/knowledge_scrapy
- Owner: YizheZhang-Ervin
- Created: 2020-03-30T16:32:07.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2020-03-30T16:55:46.000Z (about 5 years ago)
- Last Synced: 2025-01-06T04:40:53.887Z (5 months ago)
- Topics: python, scrapy
- Language: Python
- Homepage:
- Size: 6.84 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# EZScrapy
Scrapy
## Install:
- `pip install scrapy`: install Scrapy
- `scrapy -h`: show the available commands and help
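## Components (5 + 2):
1) Spiders (entry point)
2) Downloader
3) Scheduler
4) Item Pipelines
5) Engine
1&5) Spider Middleware (between the Spiders and the Engine)
2&5) Downloader Middleware (between the Downloader and the Engine)
Sequence: 1-5-3-5-2-5-1-5-4 (Spiders -> Engine -> Scheduler -> Engine -> Downloader -> Engine -> Spiders -> Engine -> Item Pipelines)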
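To make the numbered sequence concrete, here is a toy model in plain Python (it does not import Scrapy) that walks one request through the components and records which one holds control at each hop. The function name `simulate_one_page` and the dict-shaped request, response, and item are hypothetical stand-ins, not Scrapy's internal classes; the real engine is asynchronous, and the two middleware layers are only marked in comments here.

```python
from collections import deque

# Component numbers as in the README's list (toy model, not Scrapy code)
SPIDERS, DOWNLOADER, SCHEDULER, PIPELINES, ENGINE = 1, 2, 3, 4, 5


def simulate_one_page(url):
    """Trace one request -> response -> item round trip through the components."""
    trace = []        # component numbers in the order control passes through them
    queue = deque()   # stands in for the Scheduler's request queue

    # 1 -> 5: the spider hands its start request to the engine
    trace += [SPIDERS, ENGINE]
    request = {"url": url}

    # 5 -> 3 -> 5: the engine schedules it, then asks the scheduler for the next request
    queue.append(request)
    trace += [SCHEDULER, ENGINE]
    next_request = queue.popleft()

    # 5 -> 2 -> 5: the engine sends the request to the downloader
    # (through the Downloader Middleware), which returns a response
    trace += [DOWNLOADER, ENGINE]
    response = {"url": next_request["url"], "body": "<html>...</html>"}

    # 5 -> 1 -> 5: the response goes to the spider (through the Spider Middleware),
    # and the spider yields an item back to the engine
    trace += [SPIDERS, ENGINE]
    item = {"url": response["url"]}

    # 5 -> 4: the engine passes the item to the item pipelines
    trace += [PIPELINES]
    return trace, item
```

For example, `"-".join(map(str, simulate_one_page("https://example.com")[0]))` evaluates to `"1-5-3-5-2-5-1-5-4"`, the same order as the sequence listed above.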
## Steps:
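- `scrapy <command> [options] [args]`: general form of every command
- `scrapy startproject <project_name> [project_dir]`: create a new project
- `scrapy genspider [options] <name> <domain>`: generate a spider from a template
- `scrapy crawl <spider_name>`: run a spider
- `scrapy settings [options]`: print setting values, e.g. `scrapy settings --get BOT_NAME`
- `scrapy list`: list the spiders in the current project
- `scrapy shell [url]`: open an interactive console for trying selectors on a page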
## Data Extraction Methods:
- BeautifulSoup
- lxml
- re (regular expressions)
- XPath selectors
- CSS selectors
## Settings:
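- `CONCURRENT_REQUESTS`: 16 by default; maximum number of concurrent requests the downloader performs overall
- `CONCURRENT_ITEMS`: 100 by default; maximum number of items processed in parallel in the item pipelines (per response)
- `CONCURRENT_REQUESTS_PER_DOMAIN`: 8 by default; maximum number of concurrent requests to any single domain
- `CONCURRENT_REQUESTS_PER_IP`: 0 by default; if non-zero, concurrency is limited per IP and the per-domain limit is ignored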