https://github.com/bububa/supermario

A Python module that supports another project, bububa.Lego, by providing several advanced web-scraping functions.

= About SuperMario =
SuperMario is an advanced web crawler library written in Python. It
provides a number of methods for mining data from various kinds of sites.

== License ==
BSD License
See 'LICENSE' for details.

== Requirements ==
Platform: *nix-like system (Unix, Linux, Mac OS X, etc.)
Python: 2.5+
Storage: MongoDB
Other Python modules:
- simplejson
- BeautifulSoup
- eventlet
- PIL
- pycurl
- chardet
- feedparser
- mongokit
- templatemaker
- flickrapi
- pyyaml
- MySQLdb
- dateutil

== Features ==
+ robots.txt protocol support;
+ caching of each URL's HTML;
+ URL normalization;
+ conversion of all content to Unicode;
+ main-text extraction from HTML using a specified *link-threshold*;
+ conversion of partial RSS feeds to full RSS feeds;
+ proxy list support;
+ persistent cookie support;
+ login support;
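
Two of the features above, URL normalization and robots.txt checking, can be illustrated with a minimal sketch. It uses only the Python 3 standard library and is not SuperMario's own API: the function names normalize_url and is_allowed, the user agent "mybot" and the example.com URLs are all assumptions made for illustration, and the normalization rules shown are one common choice rather than the rules the library itself applies.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical helpers for illustration only; not part of SuperMario.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Lowercase scheme and host, drop default ports and fragments,
    and make an empty path "/".

    Simplification: userinfo (user:pass@) and IPv6 literals are not handled.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it differs from the scheme's default.
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        host = "%s:%d" % (host, port)
    path = parts.path or "/"
    # The empty last element discards the fragment.
    return urlunsplit((scheme, host, path, parts.query, ""))


def is_allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = ["User-agent: *", "Disallow: /private/"]
    url = normalize_url("HTTP://Example.COM:80/private/page#top")
    print(url, is_allowed(rules, "mybot", url))
```

A crawler would typically normalize a URL first, so that equivalent URLs share one cache entry, and then consult robots.txt before fetching it.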