https://github.com/sreejoy/crawlerfriend
A lightweight crawler that returns search results in HTML or dictionary form, given URLs and keywords.
- Host: GitHub
- URL: https://github.com/sreejoy/crawlerfriend
- Owner: Sreejoy
- License: MIT
- Created: 2018-07-28T04:48:18.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2018-08-14T15:12:29.000Z (almost 7 years ago)
- Last Synced: 2025-03-02T08:48:37.822Z (4 months ago)
- Topics: crawler, python-crawler, python-scraper, python27, scrapper
- Language: Python
- Size: 16.6 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
## CrawlerFriend
A lightweight **Web Crawler** for **Python 2.7** that returns search results in HTML or dictionary form, given URLs and keywords. If you regularly visit a few websites to look for a few keywords, this Python package automates the task and presents the results as an HTML file in your web browser.

### Installation
```
pip install CrawlerFriend
```

### How to use?
#### All Result in HTML
```
import CrawlerFriend

urls = ["http://www.goal.com/", "http://www.skysports.com/football", "https://www.bbc.com/sport/football"]
keywords = ["Ronaldo", "Liverpool", "Salah", "Real Madrid", "Arsenal", "Chelsea", "Man United", "Man City"]

crawler = CrawlerFriend.Crawler(urls, keywords)
crawler.crawl()
crawler.get_result_in_html()
```

The above code will open the following HTML document in the browser:

#### All Result in Dictionary
```
result_dict = crawler.get_result()
```

#### Changing Default Arguments
By default, CrawlerFriend searches four HTML tags ('title', 'h1', 'h2', 'h3') and follows at most max_link_limit = 50 links. Both can be changed by passing arguments to the constructor:
```
crawler = CrawlerFriend.Crawler(urls, keywords, max_link_limit=200, tags=['p','h4'])
crawler.crawl()
```
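To make the `tags` and keyword arguments concrete, here is a minimal sketch of how tag-based keyword matching can work using only Python's standard library. This is an illustration of the general idea, not CrawlerFriend's actual implementation; the `KeywordTagParser` class is hypothetical.

```
from html.parser import HTMLParser

class KeywordTagParser(HTMLParser):
    """Collect text from selected tags that mentions any keyword (case-insensitive)."""

    def __init__(self, keywords, tags=("title", "h1", "h2", "h3")):
        super().__init__()
        self.keywords = [k.lower() for k in keywords]
        self.tags = set(tags)
        self._current = None   # the tracked tag we are currently inside, if any
        self.matches = []      # (tag, text) pairs that mention a keyword

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current and any(k in data.lower() for k in self.keywords):
            self.matches.append((self._current, data.strip()))

html = ("<html><head><title>Salah shines again</title></head>"
        "<body><h1>Match report</h1><h2>Liverpool win 3-0</h2></body></html>")
parser = KeywordTagParser(["Liverpool", "Salah"])
parser.feed(html)
print(parser.matches)  # [('title', 'Salah shines again'), ('h2', 'Liverpool win 3-0')]
```

Widening the `tags` tuple (e.g. adding `'p'` or `'h4'`, as in the constructor example above) would make the parser scan more of each page, at the cost of noisier matches.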