{"id":19700379,"url":"https://github.com/dominikrys/web-scraper","last_synced_at":"2026-04-14T02:31:59.135Z","repository":{"id":163930771,"uuid":"379981267","full_name":"dominikrys/web-scraper","owner":"dominikrys","description":"🎬 IMDB Web Scraper in Go","archived":false,"fork":false,"pushed_at":"2021-08-14T12:22:27.000Z","size":35,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-14T23:18:39.463Z","etag":null,"topics":["crawler","go","mongodb"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dominikrys.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2021-06-24T16:04:27.000Z","updated_at":"2022-03-10T21:10:12.000Z","dependencies_parsed_at":"2024-04-04T03:11:38.683Z","dependency_job_id":"cf4f08c1-baa7-47e6-9bc7-0129e51300ab","html_url":"https://github.com/dominikrys/web-scraper","commit_stats":null,"previous_names":["dominikrys/web-crawler"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dominikrys/web-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dominikrys%2Fweb-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dominikrys%2Fweb-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dominikrys%2Fweb-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dominikrys%2Fweb-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dominikrys","
download_url":"https://codeload.github.com/dominikrys/web-scraper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dominikrys%2Fweb-scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31779943,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-14T02:24:21.117Z","status":"ssl_error","status_checked_at":"2026-04-14T02:24:20.627Z","response_time":153,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","go","mongodb"],"created_at":"2024-11-11T21:05:44.487Z","updated_at":"2026-04-14T02:31:59.117Z","avatar_url":"https://github.com/dominikrys.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# IMDB Web Crawler\n\n[![Build Status](https://img.shields.io/github/workflow/status/dominikrys/web-crawler/Continuous%20Integration?style=flat-square)](https://github.com/dominikrys/web-crawler/actions)\n\nWeb crawler that fetches information about people born on a specified day from [IMDB](https://www.imdb.com/) and writes it to a MongoDB database. Information from the x most popular profiles is fetched. 
The crawler part is based on [Michael Okoko's blog post](https://blog.logrocket.com/web-scraping-with-go-and-colly/).\n\nNote that rate limiting is in place, as the client may be blocked if too many requests are sent.\n\nThe aim of this project was to learn about Go and web scraping/crawling.\n\n## Demo\n\n[![asciicast](https://asciinema.org/a/422531.svg)](https://asciinema.org/a/422531)\n\n## Build and Run Instructions\n\nMake sure [Go](https://golang.org/) is installed.\n\nTo compile, run:\n\n```bash\ngo build ./crawler.go\n```\n\nBefore running the program, start a [MongoDB](https://www.mongodb.com/) instance on port `27017`. This can be done easily using [Docker](https://www.docker.com/):\n\n```bash\ndocker run --name mongo -p 27017:27017 -d mongo:4.4.6\n```\n\nNote that if MongoDB is not running, the crawler will still work, but writing to MongoDB will be disabled. The crawler writes to the `profiles` collection in the `crawler` database. Both are created by the crawler if they don't already exist.\n\nThen, run the crawler:\n\n```bash\n./crawler --day \u003cday of birthday\u003e --month \u003cmonth of birthday\u003e [--profileNo \u003cnumber of profiles to fetch\u003e] [--mongoUri \u003cMongoDB URI\u003e]\n```\n\nAlternatively, for development, `go run` can be used:\n\n```bash\ngo run . --day \u003cday of birthday\u003e --month \u003cmonth of birthday\u003e\n```\n\nFor more help on running the program and to check the program defaults, run:\n\n```bash\n./crawler --help\n```\n\n## Running tests\n\nMake sure you have a MongoDB instance running as described above. Then, run:\n\n```bash\ngo test\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdominikrys%2Fweb-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdominikrys%2Fweb-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdominikrys%2Fweb-scraper/lists"}