{"id":23959616,"url":"https://github.com/otobrglez/get37","last_synced_at":"2025-06-20T17:36:46.950Z","repository":{"id":61847693,"uuid":"538972692","full_name":"otobrglez/get37","owner":"otobrglez","description":"get37 🪠 is a Scala / ZIO based web scraper/spider","archived":false,"fork":false,"pushed_at":"2022-10-04T07:49:28.000Z","size":34493,"stargazers_count":14,"open_issues_count":1,"forks_count":1,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-13T08:14:37.787Z","etag":null,"topics":["scala","zio"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/otobrglez.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-09-20T12:17:05.000Z","updated_at":"2025-02-22T02:31:57.000Z","dependencies_parsed_at":"2022-10-22T09:00:26.088Z","dependency_job_id":null,"html_url":"https://github.com/otobrglez/get37","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/otobrglez/get37","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/otobrglez%2Fget37","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/otobrglez%2Fget37/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/otobrglez%2Fget37/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/otobrglez%2Fget37/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/otobrglez","download_url":"https://codeload.github.com/otobrglez/get37/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/otobrglez%2Fget37/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260988724,"owners_count":23093551,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["scala","zio"],"created_at":"2025-01-06T18:49:48.895Z","updated_at":"2025-06-20T17:36:41.909Z","avatar_url":"https://github.com/otobrglez.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# get37 🪠\n\n[get37] is a [Scala] / [ZIO] based web scraper/spider built as part\nof a technical assignment at [13|37][1337].\n\n\u003cdiv align=\"center\"\u003e\n\n![get37 in action](https://github.com/otobrglez/get37/blob/master/get37.gif)\n\n\u003c/div\u003e\n\n[![CircleCI](https://dl.circleci.com/status-badge/img/gh/otobrglez/get37/tree/master.svg?style=shield\u0026circle-token=05d2aaa7bab5bf7af48f31089663c8ec1c220883)](https://dl.circleci.com/status-badge/redirect/gh/otobrglez/get37/tree/master)\n\n## 🏃‍♂️ Usage\n\nAfter the project is [assembled (instructions)](#-development) into an \"über-JAR\", you can use it like this:\n\n```bash\n$ java -jar target/*/get37.jar https://tretton37.com\n$ java -jar target/*/get37.jar --maxFibers 10 --preFetchDelay 70 --maxDepth 4 https://zio.dev\n$ java -jar target/*/get37.jar --help # for more help\n```\n\n[get37] currently supports three configuration flags that can be passed when the tool is started.\n\n- `maxFibers`, set to `10` by default, tells the ZIO runtime how many [concurrent fibers](https://blog.rockthejvm.com/zio-fibers/) can be used when sub-requests are being made.\n- `preFetchDelay`, set to `10` 
milliseconds by default, adds a time delay before subsequent requests are made.\n- `maxDepth`, set to `3` by default, serves as a hard limit on how deep the spider goes into the site's structure.\n\n## 🏗 Development\n\nThis project uses [Nix Shell (shell.nix)](./shell.nix) for dependency management. JDK and SBT are the only dependencies.\n\n```bash\n$ sbt \"run https://tretton37.com\"\n```\n\nTo build the \"über-JAR\", this project uses the [sbt-assembly](https://github.com/sbt/sbt-assembly) and [sbt-native-packager](https://github.com/sbt/sbt-native-packager) plugins.\n\n```bash\n$ sbt assembly\n$ java -jar target/*/get37.jar\n```\n\n## Testing\n\nThis project also comes with [tests](src/test) that can be run with `sbt`, and a [CircleCI setup](https://app.circleci.com/pipelines/github/otobrglez/get37?branch=master).\n\n```bash\n$ sbt test\n```\n\n## Dependencies\n\n- [zio](https://zio.dev) - High-performance, type-safe, composable asynchronous and concurrent programming library and framework for Scala.\n- [zio-cli](https://github.com/zio/zio-cli) - Powerful command-line applications framework for ZIO.\n- [zio-http (ex-zhttp)](https://github.com/zio/zio-http) - A Scala library for building HTTP apps. It is powered by [ZIO](https://zio.dev) and [Netty](https://netty.io/) and aims at being the de facto solution for writing highly scalable and performant web applications using idiomatic Scala.\n- [jsoup](https://jsoup.org/) - a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. 
In this project it is used only for content/link extraction.\n- [os-lib](https://github.com/com-lihaoyi/os-lib) - a simple, flexible, high-performance Scala interface to common OS filesystem and subprocess APIs.\n\n\n## Resources\n\n- [Experimenting with recursion and ZIO](https://blog.knoldus.com/experimenting-with-recursion-and-zio/)\n- [5 lessons learned from my continuing awesome journey with ZIO](https://medium.com/wix-engineering/5-lessons-learned-from-my-continuing-awesome-journey-with-zio-66319d12ed7c)\n- [Guy Rutenberg - Make Offline Mirror of a Site using `wget`](https://www.guyrutenberg.com/2014/05/02/make-offline-mirror-of-a-site-using-wget/)\n- [EFF - Mirroring your site](https://www.eff.org/keeping-your-site-alive/mirroring-your-site)\n- [Aiswarya Prakasan - 10 minute command line apps with ZIO CLI](https://www.slideshare.net/AiswaryaPrakasan/10-minute-command-line-apps-with-zio-cli)\n\n## Author\n\n[Oto Brglez](https://github.com/otobrglez)\n\n![Twitter Follow](https://img.shields.io/twitter/follow/otobrglez?style=social)\n\n[scala]: https://www.scala-lang.org/\n\n[zio]: https://zio.dev/\n\n[get37]: https://github.com/otobrglez/get37\n\n[1337]: https://1337.tech/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fotobrglez%2Fget37","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fotobrglez%2Fget37","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fotobrglez%2Fget37/lists"}