https://github.com/bububa/supermario
A Python module that supports another project, bububa.Lego, by providing several advanced web-scraping functions.
- Host: GitHub
- URL: https://github.com/bububa/supermario
- Owner: bububa
- Created: 2009-12-02T05:26:03.000Z (almost 15 years ago)
- Default Branch: master
- Last Pushed: 2010-03-03T11:57:10.000Z (over 14 years ago)
- Last Synced: 2023-04-18T23:29:48.248Z (over 1 year ago)
- Language: Python
- Homepage: http://syd.todayclose.com/
- Size: 193 KB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.txt
README
= About SuperMario =
SuperMario is an advanced web crawler library written in Python. It
provides a number of methods for mining data from various kinds of sites.

== License ==
BSD License
See 'LICENSE' for details.

== Requirements ==
Platform: *nix-like system (Unix, Linux, Mac OS X, etc.)
Python: 2.5+
Storage: MongoDB
Other Python modules:
- simplejson
- BeautifulSoup
- eventlet
- PIL
- pycurl
- chardet
- feedparser
- mongokit
- templatemaker
- flickrapi
- pyyaml
- MySQLdb
- dateutil

== Features ==
+ robots.txt protocol support;
+ caching of each URL's HTML;
+ URL normalization;
+ conversion of all content to Unicode;
+ extraction of the main text (MainText) from HTML using a specified *link-threshold*;
+ conversion of partial RSS feeds to full RSS feeds;
+ proxy list support;
+ cookie persistence support;
+ login support.
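
Two of the features above, URL normalization and robots.txt checking, can be sketched with the Python 3 standard library alone. This is only an illustration of the ideas, not SuperMario's own API: the function names `normalize_url` and `allowed_by_robots` are hypothetical, and the normalization rules shown (lowercase host, drop default port and fragment) are an assumption about what "normalize" covers.

```python
# Illustrative sketch only: standard-library Python 3, NOT SuperMario's API.
# normalize_url and allowed_by_robots are hypothetical helper names.
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Lowercase scheme and host, drop default port and fragment, ensure a path.

    Assumed rules for illustration; user info (user:pass@) is discarded.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = "%s:%d" % (host, parts.port)
    path = parts.path or "/"
    return urlunsplit((scheme, host, path, parts.query, ""))


def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    url = normalize_url("HTTP://Example.COM:80/private/page#top")
    print(url, allowed_by_robots(rules, "mybot", url))
```

Normalizing before the robots.txt check (and before any cache lookup) means that differently spelled forms of the same address are treated as one URL.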