https://github.com/sreejoy/crawlerfriend
A lightweight crawler that returns search results in HTML or dictionary form, given URLs and keywords.
- Host: GitHub
- URL: https://github.com/sreejoy/crawlerfriend
- Owner: Sreejoy
- License: MIT
- Created: 2018-07-28T04:48:18.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2018-08-14T15:12:29.000Z (almost 7 years ago)
- Last Synced: 2025-03-02T08:48:37.822Z (4 months ago)
- Topics: crawler, python-crawler, python-scraper, python27, scrapper
- Language: Python
- Size: 16.6 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
## CrawlerFriend
A lightweight **Web Crawler** for **Python 2.7** that returns search results in HTML or dictionary form, given URLs and keywords. If you regularly visit a few websites to look for a few keywords, this Python package automates the task and presents the results as an HTML file in your web browser.

### Installation
```
pip install CrawlerFriend
```

### How to use?
#### All Result in HTML
```
import CrawlerFriend

urls = ["http://www.goal.com/", "http://www.skysports.com/football", "https://www.bbc.com/sport/football"]
keywords = ["Ronaldo", "Liverpool", "Salah", "Real Madrid", "Arsenal", "Chelsea", "Man United", "Man City"]

crawler = CrawlerFriend.Crawler(urls, keywords)
crawler.crawl()
crawler.get_result_in_html()
```

The above code will open the following HTML document in the browser:

#### All Result in Dictionary
```
result_dict = crawler.get_result()
```

#### Changing Default Arguments
By default, CrawlerFriend searches four HTML tags ('title', 'h1', 'h2', 'h3') and follows at most max_link_limit = 50 links. Both can be changed by passing arguments to the constructor:
```
crawler = CrawlerFriend.Crawler(urls, keywords, max_link_limit=200, tags=['p','h4'])
crawler.crawl()
```
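To make the `tags` and keyword arguments concrete, here is a minimal sketch of how tag-based keyword matching can work using only Python's standard library. This is an illustration of the general idea, not CrawlerFriend's actual implementation; the `KeywordTagParser` class is hypothetical.

```
from html.parser import HTMLParser

class KeywordTagParser(HTMLParser):
    """Collect text from selected tags that mentions any keyword (case-insensitive)."""

    def __init__(self, keywords, tags=("title", "h1", "h2", "h3")):
        super().__init__()
        self.keywords = [k.lower() for k in keywords]
        self.tags = set(tags)
        self._current = None   # the tracked tag we are currently inside, if any
        self.matches = []      # (tag, text) pairs that mention a keyword

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current and any(k in data.lower() for k in self.keywords):
            self.matches.append((self._current, data.strip()))

html = ("<html><head><title>Salah shines again</title></head>"
        "<body><h1>Match report</h1><h2>Liverpool win 3-0</h2></body></html>")
parser = KeywordTagParser(["Liverpool", "Salah"])
parser.feed(html)
print(parser.matches)  # [('title', 'Salah shines again'), ('h2', 'Liverpool win 3-0')]
```

Widening the `tags` tuple (e.g. adding `'p'` or `'h4'`, as in the constructor example above) would make the parser scan more of each page, at the cost of noisier matches.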