https://github.com/zhaipro/cdn
Crawl static websites
- Host: GitHub
- URL: https://github.com/zhaipro/cdn
- Owner: zhaipro
- License: MIT
- Created: 2018-09-27T15:25:24.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2021-03-08T14:28:29.000Z (over 5 years ago)
- Last Synced: 2025-01-28T03:50:01.997Z (over 1 year ago)
- Topics: cdn, spider
- Language: Python
- Size: 15.6 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# cdn
Crawl static websites
## Crawl results
```
$ make run
scrapy crawl spider -L INFO -o db.sqlite
2018-09-28 08:02:57 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: crawler)
...
2018-09-28 08:34:29 [scrapy.core.engine] INFO: Spider closed (finished)
$
$ du -B G db.sqlite
10G db.sqlite
$
$ sqlite3 db.sqlite 'select count(*) from page;'
39536
$
$ make runserver
export FLASK_DEBUG=1 FLASK_APP=app.py; flask run
...
127.0.0.1 - - [28/Sep/2018 08:35:26] "GET / HTTP/1.1" 200 -
...
```