Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/brownred/picture-crawling-with-scrapy-and-beautifulsoup4


Host: GitHub
URL: https://github.com/brownred/picture-crawling-with-scrapy-and-beautifulsoup4
Owner: Brownred
Created: 2023-02-11T07:14:00.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2023-02-13T04:48:15.000Z (almost 2 years ago)
Last Synced: 2024-01-23T21:51:46.065Z (11 months ago)
Language: Python
Size: 793 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# scrapy-picture-spider

[![Open Source Love](https://badges.frapsoft.com/os/v1/open-source.svg?v=103)](https://github.com/ellerbrock/open-source-badge/)

This project is a set of spiders that use Scrapy and BeautifulSoup4 to crawl pictures.

## Dependencies
This project depends on BeautifulSoup4, Requests, Scrapy, and a Bloom filter library (pybloom-live). The commands below also install lxml.

Install them with the following commands:

```
pip install beautifulsoup4
pip install lxml
pip install Scrapy
pip install pybloom-live
pip install requests
```

## Proxies Module
This module dynamically crawls IP proxies and supplies them to the other modules.

### Quick start
You can run the `proxies_spider.py` script directly. Set your download path in its `download()` function first.
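
How the other modules consume the crawled proxies is not described here. As a rough sketch, assuming the proxies are available as a list of `ip:port` strings, one of them could be turned into the `proxies` mapping that Requests accepts. The helper names `to_requests_proxies` and `pick_proxy` are hypothetical and not part of this project.

```python
import random


def to_requests_proxies(address):
    """Build the mapping accepted by the `proxies=` argument of requests.get().

    `address` is assumed to be an "ip:port" string for a plain HTTP proxy.
    """
    url = "http://" + address
    return {"http": url, "https": url}


def pick_proxy(addresses, rng=random):
    """Pick one crawled proxy address at random (hypothetical helper)."""
    if not addresses:
        raise ValueError("no proxies available")
    return rng.choice(addresses)


if __name__ == "__main__":
    # Example addresses taken from the documentation-only range 203.0.113.0/24.
    crawled = ["203.0.113.10:8080", "203.0.113.11:3128"]
    proxies = to_requests_proxies(pick_proxy(crawled))
    print(proxies)
    # With Requests installed, the mapping could be passed along as:
    #   requests.get("https://www.deviantart.com/", proxies=proxies, timeout=10)
```

Picking at random is only one possible policy; rotating through the list or dropping proxies that fail would work with the same mapping.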

## Deviant_art Module
This module dynamically crawls images from https://www.deviantart.com/ and downloads them to your computer.

![Example](doc/example.gif)

### Quick start
In the `deviant_art_spider` directory, run the following command to start the crawler.
```
scrapy crawl deviant_art_image_spider
```

You can change the configuration in `settings.py`.

```
# Directory where downloaded images are stored
# Defaults to the current directory
IMAGES_STORE = '.'

# Scrapy stops once the number of downloaded images reaches this value
MAXIMUM_IMAGE_NUMBER = 10000
```
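
The mechanism this project uses to enforce `MAXIMUM_IMAGE_NUMBER` is not shown here. One way such a limit could be implemented in Scrapy is an item pipeline that counts items and asks the engine to close the spider. The sketch below is an assumption, not the project's code: the class name `ImageLimitPipeline` and the close reason string are hypothetical, and it treats every item reaching the pipeline as one downloaded image.

```python
class ImageLimitPipeline:
    """Hypothetical item pipeline that stops the crawl after a fixed number of items."""

    def __init__(self, maximum):
        self.maximum = maximum
        self.count = 0

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook with the running crawler, so the limit
        # can be read from settings.py (10000 is used if it is missing).
        return cls(crawler.settings.getint("MAXIMUM_IMAGE_NUMBER", 10000))

    def process_item(self, item, spider):
        # Assumption: each item seen here corresponds to one downloaded image.
        self.count += 1
        if self.count == self.maximum:
            # Ask the engine to shut the spider down gracefully, once.
            spider.crawler.engine.close_spider(spider, "image_limit_reached")
        return item
```

A pipeline like this would have to be registered in `ITEM_PIPELINES` in `settings.py` to take effect. Requests already in flight may still finish, so the final count can slightly exceed the limit.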

## Pixiv Module
This module dynamically crawls images from https://www.pixiv.net/ and downloads them to your computer.

### Quick start
Set your Pixiv username and password in `settings.py` so the crawler can log in.

```
USERNAME = 'your pixiv username'
PASSWORD = 'your pixiv password'
```
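
If you would rather not keep a password in a file that might be committed, the credentials could be read from environment variables instead. This is a minimal sketch, not something the project provides: the function `load_credentials` and the variable names `PIXIV_USERNAME` and `PIXIV_PASSWORD` are hypothetical.

```python
import os


def load_credentials(env=os.environ):
    """Read the Pixiv login from environment variables.

    PIXIV_USERNAME and PIXIV_PASSWORD are hypothetical names chosen
    for this example.
    """
    username = env.get("PIXIV_USERNAME", "")
    password = env.get("PIXIV_PASSWORD", "")
    if not username or not password:
        raise RuntimeError("set PIXIV_USERNAME and PIXIV_PASSWORD before crawling")
    return username, password


# In settings.py this could replace the two literal assignments:
#   USERNAME, PASSWORD = load_credentials()
```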

Then, in the `pixiv_spider` directory, run the following command to start the crawler.
```
scrapy crawl pixiv_daily_spider
```