{"id":13464680,"url":"https://github.com/dyweb/scrala","last_synced_at":"2025-05-07T13:40:40.469Z","repository":{"id":71929471,"uuid":"45529010","full_name":"dyweb/scrala","owner":"dyweb","description":"Unmaintained :whale: :coffee: :spider: Scala crawler(spider) framework, inspired by scrapy, created by @gaocegege","archived":false,"fork":false,"pushed_at":"2019-10-05T15:36:58.000Z","size":85,"stargazers_count":113,"open_issues_count":6,"forks_count":23,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-04-19T22:35:08.965Z","etag":null,"topics":["actor-model","docker","scala","scrapy","spider"],"latest_commit_sha":null,"homepage":"http://dongyueweb.com/scrala/","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dyweb.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2015-11-04T09:37:40.000Z","updated_at":"2024-09-21T12:53:40.000Z","dependencies_parsed_at":"2023-06-05T11:31:11.137Z","dependency_job_id":null,"html_url":"https://github.com/dyweb/scrala","commit_stats":null,"previous_names":["gaocegege/scrala"],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dyweb%2Fscrala","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dyweb%2Fscrala/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dyweb%2Fscrala/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dyweb%2Fscrala/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dyweb","download_url":"http
s://codeload.github.com/dyweb/scrala/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252888840,"owners_count":21820075,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["actor-model","docker","scala","scrapy","spider"],"created_at":"2024-07-31T14:00:48.546Z","updated_at":"2025-05-07T13:40:40.442Z","avatar_url":"https://github.com/dyweb.png","language":"Scala","funding_links":[],"categories":["All"],"sub_categories":[],"readme":"# scrala\n\n[![Codacy Badge](https://api.codacy.com/project/badge/grade/563bbcd12d874610bca7313abe6e6fdd)](https://www.codacy.com/app/gaocegege/scrala)\n[![Build Status](https://travis-ci.org/gaocegege/scrala.svg?branch=master)](https://travis-ci.org/gaocegege/scrala)\n![License](https://img.shields.io/pypi/l/Django.svg)\n[![scrala published](https://jitpack.io/v/gaocegege/scrala.svg)](https://jitpack.io/#gaocegege/scrala)\n[![Docker Pulls](https://img.shields.io/docker/pulls/gaocegege/scrala.svg)](https://hub.docker.com/r/gaocegege/scrala/)\n[![Join the chat at https://gitter.im/gaocegege/scrala](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/gaocegege/scrala?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n\nscrala is a web crawling framework for scala, which is inspired by [scrapy](https://github.com/scrapy/scrapy).\n\n## Installation\n\n### From Docker\n\n[![](https://images.microbadger.com/badges/image/gaocegege/scrala.svg)](https://microbadger.com/images/gaocegege/scrala \"Get your own image badge on 
microbadger.com\")\n\n[gaocegege/scrala in dockerhub](https://hub.docker.com/r/gaocegege/scrala/)\n\n#### Create a Dockerfile in your project.\n\n```\nFROM gaocegege/scrala:latest\n\n# COPY the build.sbt and the src to the container\n```\n\n#### Run a single command in docker\n\n```\ndocker run -v \u003cyour src\u003e:/app/src -v \u003cyour ivy2 directory\u003e:/root/.ivy2 gaocegege/scrala\n```\n\n### From SBT\n\n**Step 1.** Add the JitPack resolver at the end of the resolvers in your build.sbt:\n\n\tresolvers += \"jitpack\" at \"https://jitpack.io\"\n\n**Step 2.** Add the dependency:\n\n\tlibraryDependencies += \"com.github.gaocegege\" % \"scrala\" % \"0.1.5\"\n\n### From Source Code\n\n\tgit clone https://github.com/gaocegege/scrala.git\n\tcd ./scrala\n\tsbt assembly\n\nYou will get the jar in `./target/scala-\u003cversion\u003e/`.\n\n## Example\n\n\timport com.gaocegege.scrala.core.spider.impl.DefaultSpider\n\timport com.gaocegege.scrala.core.common.response.Response\n\timport java.io.BufferedReader\n\timport java.io.InputStreamReader\n\timport com.gaocegege.scrala.core.common.response.impl.HttpResponse\n\n\tclass TestSpider extends DefaultSpider {\n\t  def startUrl = List[String](\"http://www.gaocegege.com/resume\")\n\n\t  def parse(response: HttpResponse): Unit = {\n\t    val links = (response getContentParser) select (\"a\")\n\t    for (i \u003c- 0 to links.size() - 1) {\n\t      request(((links get (i)) attr (\"href\")), printIt)\n\t    }\n\t  }\n\n\t  def printIt(response: HttpResponse): Unit = {\n\t    println((response getContentParser) title)\n\t  }\n\t}\n\n\tobject Main {\n\t  def main(args: Array[String]): Unit = {\n\t    val test = new TestSpider\n\t    test begin\n\t  }\n\t}\n\n\nAs in scrapy, you define a `startUrl` to tell the spider where to start, and override `parse(...)` to parse the response from the start URL. 
The `request(...)` function works like `yield scrapy.Request(...)` in scrapy.\n\nYou can find the example project in `./example/`.\n\n## For Developers\n\nscrala is under active development; feel free to contribute documentation, test cases, pull requests, and issues. I'm a newcomer to Scala, so the code may be hard to read, and I would welcome code reviews from anyone familiar with Scala coding standards :)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdyweb%2Fscrala","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdyweb%2Fscrala","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdyweb%2Fscrala/lists"}