https://github.com/dineshsprabu/concurrent-web-crawler

Flexible and concurrent web crawler implemented in 'go'

Host: GitHub
URL: https://github.com/dineshsprabu/concurrent-web-crawler
Owner: dineshsprabu
License: mit
Created: 2017-05-04T13:44:29.000Z (about 9 years ago)
Default Branch: master
Last Pushed: 2017-05-05T06:00:06.000Z (about 9 years ago)
Last Synced: 2024-06-20T03:34:48.633Z (about 2 years ago)
Topics: concurrent-web-crawler, crawler, go-crawler, spider, web-crawler
Language: Go
Size: 13.7 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Concurrent Web Crawler

A configurable web crawler that fetches a list of URLs concurrently, saves each page to disk, and logs the status of every step.

[![GoDoc](https://godoc.org/github.com/dineshsprabu/concurrent-web-crawler?status.svg)](https://godoc.org/github.com/dineshsprabu/concurrent-web-crawler) [![Build Status](https://travis-ci.org/dineshsprabu/concurrent-web-crawler.svg?branch=master)](https://travis-ci.org/dineshsprabu/concurrent-web-crawler)

## Installation

```
go get github.com/dineshsprabu/concurrent-web-crawler
```

## Usage

```go
package main

import (
	// The code below refers to the package as "web", so the alias is made
	// explicit here (the last path element is not a valid Go identifier).
	web "github.com/dineshsprabu/concurrent-web-crawler"
)

func main() {
	// Create a crawler with its configuration.
	myCrawler := web.Crawler{
		MaxConcurrencyLimit: 2,
		StoragePath:         "crawler/storage",
		CrawlDelay:          10,
	}

	// URLs to crawl, as a slice of strings.
	urls := []string{
		"https://httpbin.org/ip",
		"http://example.com",
		"https://archive.org/details/opensource_movies",
	}

	// Start the crawler with the list of URLs.
	myCrawler.Start(urls)
}
```

## Log

```
> go run crawler_sample.go
2017/05/04 20:29:59 ||  [Processing] Spawning subroutines :  2
2017/05/04 20:29:59 ||  [Processing] Fetching page content :  https://archive.org/details/opensource_movies
2017/05/04 20:29:59 ||  [Processing] Fetching page content :  https://httpbin.org/ip
2017/05/04 20:30:01 ||  [Processing] Writing to the file :  crawler/ip.html
2017/05/04 20:30:01 ||  [Success] Crawled page :  https://httpbin.org/ip
2017/05/04 20:30:03 ||  [Processing] Writing to the file :  crawler/details/opensource_movies.html
2017/05/04 20:30:03 ||  [Success] Crawled page :  https://archive.org/details/opensource_movies
2017/05/04 20:30:11 ||  [Processing] Fetching page content :  http://example.com
2017/05/04 20:30:12 ||  [Processing] Writing to the file :  crawler/example.com/index.html
2017/05/04 20:30:12 ||  [Success] Crawled page :  http://example.com
2017/05/04 20:30:22 ||  [Status] Failed urls :  []
```
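
The "Writing to the file" lines above suggest how a URL is turned into a file path: a URL with a path is saved as `<path>.html`, and a URL without a path is saved as `<host>/index.html`. The sketch below reproduces that mapping for the three sample URLs. It is inferred only from this log, so the library's actual rule may differ; `pagePath` and its `base` parameter are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// pagePath is a hypothetical helper (not part of the library). It maps a URL
// to a file path under base, following the pattern seen in the sample log.
func pagePath(base, rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	p := strings.Trim(u.Path, "/")
	if p == "" {
		// No path component: store the page as <host>/index.html.
		return path.Join(base, u.Host, "index.html"), nil
	}
	// Otherwise store it under its URL path with an .html suffix.
	return path.Join(base, p+".html"), nil
}

func main() {
	for _, raw := range []string{
		"https://httpbin.org/ip",
		"http://example.com",
		"https://archive.org/details/opensource_movies",
	} {
		file, err := pagePath("crawler", raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(raw, "->", file)
	}
}
```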
