https://github.com/wapiti08/crawlerset
A collection of Python crawlers and Scrapy spiders for different platforms, extracting information and demonstrating several scraping methods
bs4 crawlspider parser python3 scrapy scrapy-redis selenium selenium-webdriver spider xml
- Host: GitHub
- URL: https://github.com/wapiti08/crawlerset
- Owner: Wapiti08
- Created: 2018-05-17T04:20:42.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2022-10-04T14:15:17.000Z (over 3 years ago)
- Last Synced: 2025-10-25T22:05:54.935Z (8 months ago)
- Topics: bs4, crawlspider, parser, python3, scrapy, scrapy-redis, selenium, selenium-webdriver, spider, xml
- Language: Python
- Homepage:
- Size: 1.16 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# spider-project
A collection of Python crawlers and Scrapy spiders that extract information from different platforms.
**1. Douyu**:
Uses the **Scrapy framework** to capture pictures from the mobile app.
**2. acfun spider**:
A **basic crawler** that extracts core information and comments about an opera.
It can also sort operas by popularity.
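As a rough illustration of the popularity sort described above, here is a minimal sketch. The record layout and the `views` field are assumptions made for this example; the project may rank by a different metric.

```python
def top_by_popularity(records, key="views", limit=3):
    """Return up to `limit` records, most popular first.

    `key` names the numeric popularity field (hypothetical here);
    records missing that field are treated as having a value of 0.
    """
    ranked = sorted(records, key=lambda rec: rec.get(key, 0), reverse=True)
    return ranked[:limit]


# Made-up sample data; a real crawler would fill this from scraped pages.
sample = [
    {"title": "A", "views": 120},
    {"title": "B", "views": 950},
    {"title": "C", "views": 430},
    {"title": "D"},  # no view count scraped
]

if __name__ == "__main__":
    for rec in top_by_popularity(sample, limit=2):
        print(rec["title"], rec.get("views", 0))
```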
**3. spider-for-movie**:
This project is similar to the "acfun spider". It also offers some basic ideas for **bypassing anti-crawler blocking** on websites.
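The README does not say which techniques the project relies on. Two common ones are sending browser-like headers and pausing for a random interval between requests. The sketch below shows both with the standard library only; the `USER_AGENTS` pool, `build_request`, and `polite_fetch` are hypothetical names for this example, not code from the repository.

```python
import random
import time
import urllib.request

# Hypothetical pool of browser-like User-Agent strings (illustrative values).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


def build_request(url, referer=None):
    """Build a request with a randomly chosen User-Agent and optional Referer."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def polite_fetch(url, min_delay=1.0, max_delay=3.0, timeout=10):
    """Sleep for a random interval, then fetch `url` and return the raw bytes."""
    time.sleep(random.uniform(min_delay, max_delay))
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

Random delays keep the request rate low and irregular. Check a site's terms of use and robots.txt before crawling it.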
**4. TenSpider**:
A crawler that captures job postings from Tencent, helping you find the positions you most want to apply for.
**5. Products_Clawer**:
The most advanced crawler in this collection. It grabs the information for all products on one page and **saves it in JSON format**.
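A minimal sketch of the "all products on one page, saved as JSON" idea, using only the standard library. The markup it expects (elements with a `product` class plus `data-name` and `data-price` attributes) is an assumption for this example; the real target pages and the project's own parser will differ.

```python
import json
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collect one dict per tag whose class list contains "product".

    Assumes hypothetical markup such as:
    <li class="product" data-name="Tea" data-price="3.50">
    """

    def __init__(self):
        super().__init__()
        self.products = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "product" in classes:
            self.products.append(
                {"name": attr_map.get("data-name"),
                 "price": attr_map.get("data-price")}
            )


def save_products(html, path):
    """Parse every product in `html`, write them to `path` as JSON, return them."""
    parser = ProductParser()
    parser.feed(html)
    with open(path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-ASCII product names readable in the file.
        json.dump(parser.products, fh, ensure_ascii=False, indent=2)
    return parser.products
```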
**6. Google_API_Query**:
A script that **searches Google for any keywords**, with a bypass capability.
**7. Go concurrency**:
```
go mod tidy
go run go_crawler.go {websites}
```
**Usage of the Scrapy framework**:
See note.txt for instructions and points to watch out for.