Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hubertroy/seen
A lightweight crawling/spider framework for everyone (supports JavaScript!). :sparkles:
easy-to-use javasciprt lightweight-framework python3 spider-framework support-javascript web-crawling
Last synced: 8 days ago
- Host: GitHub
- URL: https://github.com/hubertroy/seen
- Owner: HuberTRoy
- Created: 2017-11-20T03:46:35.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2018-07-19T05:34:30.000Z (over 6 years ago)
- Last Synced: 2024-10-11T22:12:08.687Z (26 days ago)
- Topics: easy-to-use, javasciprt, lightweight-framework, python3, spider-framework, support-javascript, web-crawling
- Language: Python
- Homepage:
- Size: 82 KB
- Stars: 13
- Watchers: 4
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Seen
Seen is a lightweight web crawling framework for everyone.
It is written with `asyncio` and `aiohttp`/`requests`, and is useful for writing a web crawler quickly with **full JavaScript support**.
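
To illustrate the kind of `asyncio` pattern a crawler like this can build on, here is a minimal sketch of bounded-concurrency fetching. It is not Seen's implementation: `fake_fetch`, `crawl`, and `run_demo` are hypothetical names, the example URLs are placeholders, and the fetch is simulated with a short sleep so that no network access is needed. It uses `asyncio.run`, which requires Python 3.7 or newer.

```python
import asyncio


async def fake_fetch(url):
    # Hypothetical stand-in for a real HTTP request (e.g. via aiohttp).
    await asyncio.sleep(0.01)
    return "<html><title>{}</title></html>".format(url)


async def crawl(urls, concurrency=2):
    # The semaphore caps how many fetches are in flight at the same time.
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def worker(url):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_fetch(url)
            finally:
                in_flight -= 1

    # gather() returns results in the same order as the input URLs.
    pages = await asyncio.gather(*(worker(u) for u in urls))
    return pages, peak


def run_demo():
    urls = ["https://example.com/page/{}".format(i) for i in range(5)]
    return asyncio.run(crawl(urls, concurrency=2))


if __name__ == "__main__":
    pages, peak = run_demo()
    print(len(pages), "pages fetched; peak concurrency:", peak)
```

The `concurrency` argument here plays a role similar in spirit to a spider-level concurrency setting: raising it allows more requests to overlap, at the cost of more load on the target site.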
**Working Process:**
![workingProcess](https://github.com/HuberTRoy/seen/blob/master/img/process.png)

## Requirements:
* Python 3.5+
* aiohttp or requests
* pyquery

## Installation:
```
pip install seen
```

To get JavaScript support, also install `pyppeteer`:
```
pip install pyppeteer
```

## Usage:
1. Write `spider.py`:
```python
from seen import Spider, Parser, Item, Css


class Post(Item):
    # Fields to extract from each page, declared as CSS selectors.
    title = Css('title')
    img = Css('img', 'src')

    def save(self):
        # Extracted values are available in self.result.
        print(self.result['title'])
        print(self.result['img'])


class MySpider(Spider):
    roots = 'https://www.v2ex.com'
    url_limit = ('www.v2ex.com')
    concurrency = 1
    # Set use_browser = True to load JavaScript; the default is False.
    use_browser = False

    parsers = [Parser(Post)]


if __name__ == '__main__':
    spider = MySpider()
    spider.start()
```
2. Run `python spider.py`.
3. Check the result.

## Contribution
* Pull request.
* Open an issue.
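
Returning to the Usage example: the `Post` item there selects the page title and image `src` attributes. For readers who want to see what that kind of extraction amounts to, the following is a rough standard-library sketch. It does not use Seen or pyquery and does not claim to match their behavior; `PostExtractor`, `extract_post`, and `SAMPLE` are hypothetical names introduced only for illustration.

```python
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collects the <title> text and the src of every <img> tag that has one."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "img":
            # attrs is a list of (name, value) pairs.
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_post(html):
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "img": parser.images}


# Hypothetical sample page used only for this demonstration.
SAMPLE = """
<html><head><title> Hello V2EX </title></head>
<body><img src="/a.png"><img alt="no source"><img src="/b.png"/></body></html>
"""

if __name__ == "__main__":
    print(extract_post(SAMPLE))
```

The returned dictionary uses the same keys (`title`, `img`) that the Usage example reads from `self.result`, but the exact shape of Seen's results is an assumption here.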