Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/gocolly/colly
Elegant Scraper and Crawler Framework for Golang
crawler crawling framework go golang scraper scraping spider
Last synced: 6 days ago
Elegant Scraper and Crawler Framework for Golang
- Host: GitHub
- URL: https://github.com/gocolly/colly
- Owner: gocolly
- License: apache-2.0
- Created: 2017-09-29T14:08:49.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2024-04-19T12:20:14.000Z (9 months ago)
- Last Synced: 2024-04-22T22:16:11.187Z (9 months ago)
- Topics: crawler, crawling, framework, go, golang, scraper, scraping, spider
- Language: Go
- Homepage: https://go-colly.org/
- Size: 8.11 MB
- Stars: 22,149
- Watchers: 327
- Forks: 1,707
- Open Issues: 183
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
Awesome Lists containing this project
- awesome-go - gocolly/colly
- awesome-rainmana - gocolly/colly - Elegant Scraper and Crawler Framework for Golang (Go)
- awesome-repositories - gocolly/colly - Elegant Scraper and Crawler Framework for Golang (Go)
- awesome-starred - colly - Elegant Scraper and Crawler Framework for Golang (Go)
- awesome-go - Colly - Lightning Fast and Elegant Scraping Framework for Gophers. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving. (Applications)
- awesome-starts - gocolly/colly - Elegant Scraper and Crawler Framework for Golang (Go)
- go-awesome - Colly - web crawler framework (Open source library / Crawlers)
- awesome-list - colly
- awesome-golang-repositories - colly
- awesome-go-quant - colly - Elegant Scraper and Crawler Framework for Golang (Golang / Scraper)
- awesome-tools - colly - Elegant Scraper and Crawler Framework for Golang (Uncategorized / Uncategorized)
- my-awesome - gocolly / colly - Elegant Scraper and Crawler Framework for Golang (Library)
- awesome-go-extra - colly - 09-29T14:08:49Z|2022-08-18T21:24:23Z| (Bot Building / Scrapers)
- my-awesome - gocolly/colly - 07 star:23.5k fork:1.8k Elegant Scraper and Crawler Framework for Golang (Go)
- StarryDivineSky - gocolly/colly
- awesome - gocolly/colly - Elegant Scraper and Crawler Framework for Golang (Go)
README
# Colly
Lightning Fast and Elegant Scraping Framework for Gophers
Colly provides a clean interface to write any kind of crawler/scraper/spider.
With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.
[![GoDoc](https://godoc.org/github.com/gocolly/colly?status.svg)](https://pkg.go.dev/github.com/gocolly/colly/v2)
[![Backers on Open Collective](https://opencollective.com/colly/backers/badge.svg)](#backers) [![Sponsors on Open Collective](https://opencollective.com/colly/sponsors/badge.svg)](#sponsors) [![build status](https://github.com/gocolly/colly/actions/workflows/ci.yml/badge.svg)](https://github.com/gocolly/colly/actions/workflows/ci.yml)
[![report card](https://img.shields.io/badge/report%20card-a%2B-ff3333.svg?style=flat-square)](http://goreportcard.com/report/gocolly/colly)
[![view examples](https://img.shields.io/badge/learn%20by-examples-0077b3.svg?style=flat-square)](https://github.com/gocolly/colly/tree/master/_examples)
[![Code Coverage](https://img.shields.io/codecov/c/github/gocolly/colly/master.svg)](https://codecov.io/github/gocolly/colly?branch=master)
[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Fgocolly%2Fcolly.svg?type=shield)](https://app.fossa.io/projects/git%2Bgithub.com%2Fgocolly%2Fcolly?ref=badge_shield)
[![Twitter URL](https://img.shields.io/badge/twitter-follow-green.svg)](https://twitter.com/gocolly)

------
## Sponsors
[Scrapfly](https://scrapfly.io/?utm_source=Github&utm_medium=repo&utm_campaign=colly)
is an enterprise-grade solution providing Web Scraping API that aims to simplify the
scraping process by managing everything: real browser rendering, rotating proxies, and
fingerprints (TLS, HTTP, browser) to bypass all major anti-bots. Scrapfly also unlocks the
observability by providing an analytical dashboard and measuring the success rate/block
rate in detail.

------
## Features
- Clean API
- Fast (>1k request/sec on a single core)
- Manages request delays and maximum concurrency per domain
- Automatic cookie and session handling
- Sync/async/parallel scraping
- Caching
- Automatic encoding of non-unicode responses
- Robots.txt support
- Distributed scraping
- Configuration via environment variables
- Extensions

## Example
```go
package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	c.Visit("http://go-colly.org/")
}
```

See [examples folder](https://github.com/gocolly/colly/tree/master/_examples) for more detailed examples.
## Installation
Add colly to your `go.mod` file:
```
module github.com/x/y

go 1.14

require (
	github.com/gocolly/colly/v2 latest
)
```

## Bugs
Bugs or suggestions? Visit the [issue tracker](https://github.com/gocolly/colly/issues) or join `#colly` on freenode.
## Other Projects Using Colly
Below is a list of public, open source projects that use Colly:
- [greenpeace/check-my-pages](https://github.com/greenpeace/check-my-pages) Scraping script to test the Spanish Greenpeace web archive.
- [altsab/gowap](https://github.com/altsab/gowap) Wappalyzer implementation in Go.
- [jesuiscamille/goquotes](https://github.com/jesuiscamille/goquotes) A quotes scraper, making your day a little better!
- [jivesearch/jivesearch](https://github.com/jivesearch/jivesearch) A search engine that doesn't track you.
- [Leagify/colly-draft-prospects](https://github.com/Leagify/colly-draft-prospects) A scraper for future NFL Draft prospects.
- [lucasepe/go-ps4](https://github.com/lucasepe/go-ps4) Search the PlayStation Store for your favorite PS4 games using the command line.
- [yringler/inside-chassidus-scraper](https://github.com/yringler/inside-chassidus-scraper) Scrapes Rabbi Paltiel's web site for lesson metadata.
- [gamedb/gamedb](https://github.com/gamedb/gamedb) A database of Steam games.
- [lawzava/scrape](https://github.com/lawzava/scrape) CLI for email scraping from any website.
- [eureka101v/WeiboSpiderGo](https://github.com/eureka101v/WeiboSpiderGo) A Sina Weibo (Chinese Twitter) scraper.
- [Go-phie/gophie](https://github.com/Go-phie/gophie) Search, Download and Stream movies from your terminal
- [imthaghost/goclone](https://github.com/imthaghost/goclone) Clone websites to your computer within seconds.
- [superiss/spidy](https://github.com/superiss/spidy) Crawl the web and collect expired domains.
- [docker-slim/docker-slim](https://github.com/docker-slim/docker-slim) Optimize your Docker containers to make them smaller and better.
- [seversky/gachifinder](https://github.com/seversky/gachifinder) An agent for asynchronous scraping, parsing, and writing to storage backends (Elasticsearch for now).
- [eval-exec/goodreads](https://github.com/eval-exec/goodreads) Crawl all tags and all pages of quotes from Goodreads.

If you are using Colly in a project, please send a pull request to add it to the list.
## Contributors
This project exists thanks to all the people who contribute. [[Contribute]](CONTRIBUTING.md).
## Backers
Thank you to all our backers! 🙏 [[Become a backer](https://opencollective.com/colly#backer)]
## Sponsors
Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [[Become a sponsor](https://opencollective.com/colly#sponsor)]
## License
[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Fgocolly%2Fcolly.svg?type=large)](https://app.fossa.io/projects/git%2Bgithub.com%2Fgocolly%2Fcolly?ref=badge_large)