https://github.com/dyslab/spy-sample
Scrapy Learning... 🕷🕸🕸🕷
https://github.com/dyslab/spy-sample
python3 samples scrapy-spider
Last synced: 6 months ago
JSON representation
Scrapy Learning... 🕷🕸🕸🕷
- Host: GitHub
- URL: https://github.com/dyslab/spy-sample
- Owner: dyslab
- License: mit
- Created: 2019-10-12T08:42:12.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2024-04-11T23:15:33.000Z (over 1 year ago)
- Last Synced: 2025-04-05T05:45:41.724Z (9 months ago)
- Topics: python3, samples, scrapy-spider
- Language: Python
- Homepage:
- Size: 968 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# spy-sample: Python Scrapy Learning Program
[](https://scrapy.org/) [](./LICENSE)
***NOTE: The project is ONLY FOR LEARNING, TEST and EDUCATIONAL PURPOSE. It is NOT dedicated to be used as a practical part for certain specific purpose.***
## Development framework:
- Python version: v3.7
- Scrapy version: v1.8 (Check out [HERE](https://docs.scrapy.org/en/1.8/topics/commands.html) for details about Scrapy v1.8)
### Install virtual enviroment:
```bash
python3 -m venv venv
```
### Activate venv and run:
```bash
# Activate venv mode
source venv/bin/activate
# Install packages before first run. This is a one time action
pip install -r requirements.txt
# Jump to work directory and run python script
cd [project_dir] # './spytest' or './spyimg'
scrapy [command ...] # See below content
# Deactivate venv mode
deactivate
```
### Packages info
Install packages by **pip** in virtual enviroment. All packages listed in [requirements.txt](requirements.txt).
```bash
# Check out `requirements.txt`
cat requirements.txt
# Export packages list to `requirements.txt` in virtual enviroment
pip freeze > requirements.txt
```
## Sample Scripts CLIs
```bash
# Jump into the project directory './spytest' or './spyimg'
cd ./spytest # or, cd ./spyimg
# List all spiders belong to the project
scrapy list
```
- **spytest**
- 🕷 [xmlsample](./spytest/spytest/spiders/xmlsample.py)
```bash
# Fetch data from default url.
scrapy crawl --nolog xmlsample -o xmlsample.csv
# Fetch data and output to a json file from 'https://www.feng.com/rss.xml' according to the list 'avaliable_sites' in 'xmlsample.py'
scrapy crawl xmlsample -a target=feng.com -o xmlsample.json
```
- 🕷 [csvsample](./spytest/spytest/spiders/csvsample.py)
```bash
scrapy crawl csvsample -o csvsample.json
```
- 🕷 [sitemapsample](./spytest/spytest/spiders/sitemapsample.py)
```bash
scrapy crawl sitemapsample -o sitemapsample.csv
```
- _Deprecated spiders 🕷: ~~cptrack, tttrack, uspstrack~~_
- **spyimg**
- 🕷 [feimgs_svgrepo](./spyimg/spyimg/spiders/feimgs_svgrepo.py) (See demos on _[./spyimg/feimgs_svgrepo_demos/README.md](./spyimg/feimgs_svgrepo_demos/README.md)_ )
```bash
scrapy crawl --nolog feimgs_svgrepo -a cat=wechat
```
- 🕷 [feimgs_pornpics](./spyimg/spyimg/spiders/feimgs_pornpics.py)
```bash
scrapy crawl --nolog feimgs_pornpics -a url=https://www.pornpics.com/galleries/met-art-diana-a-nika-b-35320148/
```
- 🕷 [feimgs_imagefap](./spyimg/spyimg/spiders/feimgs_imagefap.py) (Fit for the gallery which contains less than 10-page photos)
```bash
scrapy crawl --nolog feimgs_imagefap -a url=https://www.imagefap.com/pictures/11922724/les1506
```
- 🕷 [feimgs_imagefap2](./spyimg/spyimg/spiders/feimgs_imagefap2.py) (Fit for all galleries)
```bash
scrapy crawl --nolog feimgs_imagefap2 -a url=https://www.imagefap.com/gallery/11922185
```
- _Deprecated spiders 🕷: ~~feimgs_mtrtsy, feimgs_kkrtys, feimgs_ojbk~~_
---
*··· Last Modified on 26 January 2024 ···*
*··· Created on 12 October 2019 ···*