# IMDB Web Crawler

[![Build Status](https://img.shields.io/github/workflow/status/dominikrys/web-crawler/Continuous%20Integration?style=flat-square)](https://github.com/dominikrys/web-crawler/actions)

Web crawler that fetches information about people born on a specified day from [IMDB](https://www.imdb.com/) and writes it to a MongoDB database. Information is fetched from a configurable number of the most popular matching profiles. The crawler part is based on [Michael Okoko's blog post](https://blog.logrocket.com/web-scraping-with-go-and-colly/).

Note that rate limiting is in place, as the client may be blocked if too many requests are sent.
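
As a rough illustration (a minimal sketch, not this project's actual code), rate limiting with [Colly](https://github.com/gocolly/colly), the library used in the blog post, is typically configured through a `LimitRule`; the domain glob, delay, and search URL below are illustrative assumptions:

```go
package main

import (
	"fmt"
	"time"

	"github.com/gocolly/colly"
)

func main() {
	// Collector restricted to IMDB.
	c := colly.NewCollector(colly.AllowedDomains("www.imdb.com"))

	// Throttle requests so the client is less likely to be blocked.
	c.Limit(&colly.LimitRule{
		DomainGlob:  "*imdb.com*",
		Parallelism: 1,
		RandomDelay: 2 * time.Second,
	})

	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	// Illustrative example: IMDB's name search filtered by birth day.
	c.Visit("https://www.imdb.com/search/name/?birth_monthday=12-25")
}
```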

The aim of this project was to learn about Go and web scraping/crawling.

## Demo

[![asciicast](https://asciinema.org/a/422531.svg)](https://asciinema.org/a/422531)

## Build and Run Instructions

Make sure [Go](https://golang.org/) is installed.

To compile, run:

```bash
go build ./crawler.go
```
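
This produces a `crawler` binary in the current directory.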

Before running the program, run a [MongoDB](https://www.mongodb.com/) instance on port `27017`. This can be easily done using [Docker](https://www.docker.com/):

```bash
docker run --name mongo -p 27017:27017 -d mongo:4.4.6
```

Note that if MongoDB is not running, the crawler will still work, but writing to MongoDB will be disabled. The crawler will write to the `profiles` collection in the `crawler` database. These will be created by the crawler if they don't already exist.
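
To check what has been written, you can query that collection through the `mongo` shell bundled with the container started above, for example:

```bash
docker exec -it mongo mongo crawler --eval "db.profiles.find().pretty()"
```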

Then, run the crawler:

```bash
./crawler --day <day> --month <month> [--profileNo <number>] [--mongoUri <uri>]
```
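
For example, to crawl the 20 most popular profiles of people born on 25 December, using the local MongoDB instance started above (the values here are illustrative; the optional flags fall back to the program defaults):

```bash
./crawler --day 25 --month 12 --profileNo 20 --mongoUri mongodb://localhost:27017
```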

Alternatively, for development, `go run` can be used:

```bash
go run . --day <day> --month <month>
```

To get more help on how to run the program and to check the program defaults, run:

```bash
./crawler --help
```

## Running Tests

Make sure you have a MongoDB instance running as described above. Then, run:

```bash
go test
```
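
For more detailed output while debugging, the standard Go tooling flags apply:

```bash
go test -v
```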