{"id":30664481,"url":"https://github.com/psatler/go-web-crawler","last_synced_at":"2026-04-28T23:36:17.714Z","repository":{"id":61628410,"uuid":"150356216","full_name":"psatler/go-web-crawler","owner":"psatler","description":"A simple web crawler written in Golang ","archived":false,"fork":false,"pushed_at":"2018-10-02T02:46:51.000Z","size":55,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-08-31T19:02:07.098Z","etag":null,"topics":["docker","docker-compose","go","golang","mysql","web-crawler"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/psatler.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-09-26T02:18:44.000Z","updated_at":"2025-02-27T13:00:48.000Z","dependencies_parsed_at":"2022-10-18T17:45:35.051Z","dependency_job_id":null,"html_url":"https://github.com/psatler/go-web-crawler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/psatler/go-web-crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/psatler%2Fgo-web-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/psatler%2Fgo-web-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/psatler%2Fgo-web-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/psatler%2Fgo-web-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/psatler","download_url":"https://codeload.github.com/psatler/go-web-craw
ler/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/psatler%2Fgo-web-crawler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32404340,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-28T19:38:08.556Z","status":"ssl_error","status_checked_at":"2026-04-28T19:37:55.688Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","docker-compose","go","golang","mysql","web-crawler"],"created_at":"2025-08-31T19:01:53.644Z","updated_at":"2026-04-28T23:36:17.687Z","avatar_url":"https://github.com/psatler.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# A simple Go Web Crawler\n\n\u003e A Go web crawler with MySQL persistence\n\nA simple Go web crawler that finds the 10 most valuable companies according to the [Fundamentus](https://www.fundamentus.com.br/detalhes.php) website. For each company, it collects the stock name, company name, average daily rate of the stock, and market value, and stores them in a _MySQL_ database.\n\n# How to Run\n\nYou can either run it locally via a shell script or in a container via Docker Compose. Either way, you need to provide an **_.env_** file to access the database. 
An example is shown in [.env.example]().\n\n## Using a shell script and running locally\n\nDo the following:\n\n```\ngo get github.com/psatler/go-web-crawler\ncd go-web-crawler\nsh init-db.sh\n```\n\nThe script might ask for your _sudo_ password to make sure the _MySQL_ service is up. It also opens the _MySQL_ database as the root user.\n\n## Using Docker Compose\n\nThe project also comes with a Docker Compose file that creates a container for each service (the Go application and _MySQL_). To run with it:\n\n```\ngit clone https://github.com/psatler/go-web-crawler.git\ncd go-web-crawler\nsudo docker-compose up\n```\n\n**NOTE:** If a local _MySQL_ service is already running, it might cause port conflicts, so you may have to run `sudo service mysql stop` before starting Docker Compose.\n\n# Main Dependencies\n\n- [GoQuery](https://godoc.org/github.com/PuerkitoBio/goquery): implements features similar to jQuery, including the chainable syntax, to manipulate and query an HTML document.\n- [Go-sql-driver](https://godoc.org/github.com/go-sql-driver/mysql): package mysql provides a MySQL driver for Go's database/sql package.\n- [StrConv](https://godoc.org/strconv): implements conversions to and from string representations of basic data types.\n- [Sort](https://godoc.org/sort#example-Slice): provides primitives for sorting slices and user-defined collections.\n\n# License\n\nThis project is licensed under the terms of the [MIT License](https://opensource.org/licenses/MIT) © Pablo Satler 2018\n\n# Acknowledgements\n\nThe app first collects a list of links to query. Then it pulls details from each link, such as stock price and market value. This second step (fetching the details of each link) is the one that takes the longest.\n\nThe first implementation, which used no concurrency, took about **9min30s** to **10min** to complete. 
Then, a second approach split the slice of links across _goroutines_, with each _goroutine_ handling one part of the slice and appending its results to a final slice of structs. With that approach, the running time dropped to roughly **4min**.\n\nA _WaitGroup_ was used to coordinate them. A WaitGroup waits for a collection of goroutines to finish. The main goroutine calls _Add_ to set the number of goroutines to wait for. Then each of the goroutines runs and calls _Done_ when finished. At the same time, _Wait_ can be used to block until all goroutines have finished.\n\n### Useful/Basic MySQL Database Commands\n\n```\nshow databases;\nuse \u003cDBName\u003e;\nshow tables;\ndescribe \u003cTableName\u003e;\ndrop database \u003cDBName\u003e;\ndrop table \u003cTableName\u003e;\nselect * from \u003cTableName\u003e;\n```\n\n### Opening MySQL db from inside the container\n\nThe command below was used (as the root user), where `db-mysql-container` is the container name defined in the Docker Compose file.\n\n```\nsudo docker exec -it db-mysql-container mysql -uroot -proot\n```\n\n### Load a SQL file using Docker Compose\n\nThis is done via volumes, as shown in the `docker-compose.yaml` file in this project and as described [here](https://stackoverflow.com/questions/44533534/docker-how-to-use-sql-file-in-directory) and [here](https://gist.github.com/onjin/2dd3cc52ef79069de1faa2dfd456c945).\n\n```\ndb:\n     volumes:\n        - /path-to-sql-files-on-your-host:/docker-entrypoint-initdb.d\n```\n\nThen run `docker-compose down -v` to destroy the containers and volumes, and `docker-compose up` to recreate them.\n\n### Linking one container to another\n\nTo reach a service on another container, take [this docker tutorial](https://docs.docker.com/compose/networking/) as reference.\n\nFor example, in this project the _DB_HOST_ env var is set to `db`, the name given to the mysql service. 
_DB_PORT_ is set to the same port number exposed under **ports** in the mysql service.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpsatler%2Fgo-web-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpsatler%2Fgo-web-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpsatler%2Fgo-web-crawler/lists"}