Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/brownred/picture-crawling-with-scrapy-and-beautifulsoup4
- Host: GitHub
- URL: https://github.com/brownred/picture-crawling-with-scrapy-and-beautifulsoup4
- Owner: Brownred
- Created: 2023-02-11T07:14:00.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-02-13T04:48:15.000Z (almost 2 years ago)
- Last Synced: 2024-01-23T21:51:46.065Z (11 months ago)
- Language: Python
- Size: 793 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# scrapy-picture-spider
[![Open Source Love](https://badges.frapsoft.com/os/v1/open-source.svg?v=103)](https://github.com/ellerbrock/open-source-badge/)
This project is a spider that uses Scrapy and BeautifulSoup4 to crawl pictures.
## Dependencies
This project depends on BeautifulSoup4, lxml, Requests, Scrapy, and pybloom-live (a Bloom filter implementation). Install them with the following commands:
```
pip install beautifulsoup4
pip install lxml
pip install Scrapy
pip install pybloom-live
pip install requests
```
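To verify the installation, a quick import check works; note that BeautifulSoup4 is imported as `bs4` and pybloom-live as `pybloom_live`:
```
# Sanity check: all five dependencies should import cleanly.
# BeautifulSoup4 installs as the module "bs4", pybloom-live as "pybloom_live".
import bs4
import lxml
import pybloom_live
import requests
import scrapy

print('beautifulsoup4', bs4.__version__)
print('requests', requests.__version__)
print('scrapy', scrapy.__version__)
```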
## Proxies Module
This module dynamically crawls proxy IPs and supplies them to the other modules.
### Quick start
You can run the script `proxies_spider.py` directly; set your download path in its `download()` function (see the sketch below).
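The README does not show the spider's internals; the following is a minimal sketch, assuming a hypothetical proxy-listing page and table layout, of how such a crawler can combine Requests, BeautifulSoup4, and a pybloom-live Bloom filter for deduplication:
```
import requests
from bs4 import BeautifulSoup
from pybloom_live import BloomFilter

PROXY_LIST_URL = 'https://example.com/free-proxy-list'  # hypothetical source
seen = BloomFilter(capacity=100_000, error_rate=0.001)   # dedupe crawled proxies

def crawl_proxies():
    html = requests.get(PROXY_LIST_URL, timeout=10).text
    soup = BeautifulSoup(html, 'lxml')
    proxies = []
    for row in soup.select('table tr'):  # assumed page layout: ip and port cells
        cells = [td.get_text(strip=True) for td in row.find_all('td')]
        if len(cells) >= 2:
            proxy = f'{cells[0]}:{cells[1]}'
            if proxy not in seen:
                seen.add(proxy)
                proxies.append(proxy)
    return proxies

def download(proxies, path='proxies.txt'):
    # Set your own download path here, as the README suggests.
    with open(path, 'w') as f:
        f.write('\n'.join(proxies))

if __name__ == '__main__':
    download(crawl_proxies())
```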
## Deviant_art Module
This module dynamically crawls images from https://www.deviantart.com/ and downloads them to your computer.
![Example](doc/example.gif)
### Quick start
In the `deviant_art_spider` directory, run the following command to start the crawler:
```
scrapy crawl deviant_art_image_spider
```
You can adjust the configuration in `settings.py`:
```
# Directory where downloaded images are stored (defaults to the current directory)
IMAGES_STORE = '.'

# Scrapy closes once the number of downloaded images reaches this value
MAXIMUM_IMAGE_NUMBER = 10000
```
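The README does not show how `MAXIMUM_IMAGE_NUMBER` is enforced; one common approach, sketched here under the assumption that the project subclasses Scrapy's `ImagesPipeline`, is to count completed downloads in the pipeline and close the spider at the limit:
```
from scrapy.pipelines.images import ImagesPipeline

class LimitedImagesPipeline(ImagesPipeline):
    """Closes the spider once MAXIMUM_IMAGE_NUMBER images have been saved."""
    downloaded = 0

    def item_completed(self, results, item, info):
        # Count the images that downloaded successfully for this item.
        self.downloaded += sum(1 for ok, _ in results if ok)
        limit = info.spider.crawler.settings.getint('MAXIMUM_IMAGE_NUMBER', 10000)
        if self.downloaded >= limit:
            info.spider.crawler.engine.close_spider(info.spider, 'maximum image number reached')
        return super().item_completed(results, item, info)
```
A pipeline like this would be registered in `settings.py` via `ITEM_PIPELINES`; the module path below is hypothetical:
```
ITEM_PIPELINES = {'deviant_art_spider.pipelines.LimitedImagesPipeline': 1}
```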
## Pixiv
This module dynamically crawls images from https://www.pixiv.net/ and downloads them to your computer.
### Quick start
Set your Pixiv username and password in `settings.py`:
```
USERNAME = 'your pixiv username'
PASSWORD = 'your pixiv password'
```
Then, in the `pixiv_spider` directory, run the following command to start the crawler:
```
scrapy crawl pixiv_daily_spider
```
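The spider reads these credentials at startup. The project's actual login code is not shown in the README; the sketch below assumes the standard Scrapy pattern of logging in from `start_requests()` with a `FormRequest` (the login endpoint, form field names, and ranking URL are assumptions, and Pixiv's real login flow also involves CSRF tokens):
```
import scrapy

class PixivDailySpider(scrapy.Spider):
    # Hypothetical reconstruction of the spider's login flow.
    name = 'pixiv_daily_spider'

    def start_requests(self):
        # Credentials come from settings.py, as described above.
        yield scrapy.FormRequest(
            'https://accounts.pixiv.net/login',  # assumed login endpoint
            formdata={
                'pixiv_id': self.settings.get('USERNAME'),
                'password': self.settings.get('PASSWORD'),
            },
            callback=self.after_login,
        )

    def after_login(self, response):
        # Proceed to the daily ranking page once logged in (assumed URL).
        yield response.follow('https://www.pixiv.net/ranking.php?mode=daily',
                              callback=self.parse)

    def parse(self, response):
        # Extract image URLs here and hand them to the images pipeline.
        pass
```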