{"id":23505058,"url":"https://github.com/hellokaton/elves","last_synced_at":"2025-04-09T19:18:24.659Z","repository":{"id":57732607,"uuid":"117105494","full_name":"hellokaton/elves","owner":"hellokaton","description":"🎊 Design and implement of lightweight crawler framework.","archived":false,"fork":false,"pushed_at":"2018-01-24T09:22:37.000Z","size":557,"stargazers_count":316,"open_issues_count":2,"forks_count":86,"subscribers_count":22,"default_branch":"master","last_synced_at":"2025-04-09T19:18:19.695Z","etag":null,"topics":["163news","douban-movie","elves","scrapy","spider"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hellokaton.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-01-11T13:41:16.000Z","updated_at":"2025-01-09T12:02:33.000Z","dependencies_parsed_at":"2022-09-13T15:00:30.940Z","dependency_job_id":null,"html_url":"https://github.com/hellokaton/elves","commit_stats":null,"previous_names":["biezhi/elves"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hellokaton%2Felves","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hellokaton%2Felves/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hellokaton%2Felves/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hellokaton%2Felves/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hellokaton","download_url":"https://codeload.github.com/hellokaton/elves/tar.gz/refs/heads/master","host":{"name":"GitHub","url":
"https://github.com","kind":"github","repositories_count":248094989,"owners_count":21046770,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["163news","douban-movie","elves","scrapy","spider"],"created_at":"2024-12-25T09:11:27.819Z","updated_at":"2025-04-09T19:18:24.622Z","avatar_url":"https://github.com/hellokaton.png","language":"Java","readme":"# Elves\n\nThe design and implementation of a lightweight crawler framework; see the [blog post analysis](https://blog.biezhi.me/2018/01/design-and-implement-a-crawler-framework.html).\n\n[![](https://img.shields.io/travis/biezhi/elves.svg)](https://travis-ci.org/biezhi/elves)\n[![](https://img.shields.io/maven-central/v/io.github.biezhi/elves.svg)](https://mvnrepository.com/artifact/io.github.biezhi/elves)\n[![@biezhi on zhihu](https://img.shields.io/badge/zhihu-%40biezhi-red.svg)](https://www.zhihu.com/people/biezhi)\n[![](https://img.shields.io/badge/license-MIT-FF0080.svg)](https://github.com/biezhi/elves/blob/master/LICENSE)\n[![](https://img.shields.io/github/followers/biezhi.svg?style=social\u0026label=Follow%20Me)](https://github.com/biezhi)\n\n## Features\n\n- Event-driven\n- Easy to customize\n- Multi-threaded execution\n- `CSS` selector and `XPath` support\n\n**Maven** coordinates\n\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003eio.github.biezhi\u003c/groupId\u003e\n    \u003cartifactId\u003eelves\u003c/artifactId\u003e\n    \u003cversion\u003e0.0.2\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\nIf you want to run the project source code locally, make sure you are using `Java8` and have the [lombok](https://projectlombok.org/) plugin installed.\n\n## Architecture Diagram\n\n\u003cimg src=\"docs/static/elves.png\" width=\"60%\"/\u003e\n\n## Call Flow Diagram\n\n\u003cimg src=\"docs/static/dispatch.png\" width=\"90%\"/\u003e\n\n## 
Quick Start\n\nBuilding a crawler takes the following steps:\n\n1. Write a crawler class that extends `Spider`\n2. Set the list of URLs to crawl\n3. Implement the `parse` method of `Spider`\n4. Add a `Pipeline` to process the data filtered by `parse`\n\nFor example:\n\n```java\npublic class DoubanSpider extends Spider {\n\n    public DoubanSpider(String name) {\n        super(name);\n        this.startUrls(\n            \"https://movie.douban.com/tag/爱情\",\n            \"https://movie.douban.com/tag/喜剧\",\n            \"https://movie.douban.com/tag/动画\",\n            \"https://movie.douban.com/tag/动作\",\n            \"https://movie.douban.com/tag/史诗\",\n            \"https://movie.douban.com/tag/犯罪\");\n    }\n\n    @Override\n    public void onStart(Config config) {\n        this.addPipeline((Pipeline\u003cList\u003cString\u003e\u003e) (item, request) -\u003e log.info(\"Save to file: {}\", item));\n    }\n\n    public Result parse(Response response) {\n        Result\u003cList\u003cString\u003e\u003e result   = new Result\u003c\u003e();\n        Elements             elements = response.body().css(\"#content table .pl2 a\");\n\n        List\u003cString\u003e titles = elements.stream().map(Element::text).collect(Collectors.toList());\n        result.setItem(titles);\n\n        // Get the next page URL\n        Elements nextEl = response.body().css(\"#content \u003e div \u003e div.article \u003e div.paginator \u003e span.next \u003e a\");\n        if (null != nextEl \u0026\u0026 nextEl.size() \u003e 0) {\n            String  nextPageUrl = nextEl.get(0).attr(\"href\");\n            Request nextReq     = this.makeRequest(nextPageUrl, this::parse);\n            result.addRequest(nextReq);\n        }\n        return result;\n    }\n\n}\n\npublic static void main(String[] args) {\n    DoubanSpider doubanSpider = new DoubanSpider(\"豆瓣电影\");\n    Elves.me(doubanSpider, Config.me()).start();\n}\n```\n\n## Crawler Examples\n\n- [Douban Movies](https://github.com/biezhi/elves/blob/master/src/test/java/io/github/biezhi/elves/examples/DoubanExample.java)\n- 
[NetEase News](https://github.com/biezhi/elves/blob/master/src/test/java/io/github/biezhi/elves/examples/News163Example.java)\n- [Qiushibaike](https://github.com/biezhi/elves/blob/master/src/test/java/io/github/biezhi/elves/examples/QiubaiExample.java)\n- [Meizitu](https://github.com/biezhi/elves/blob/master/src/test/java/io/github/biezhi/elves/examples/MeiziExample.java)\n\n## License\n\n[MIT](https://github.com/biezhi/elves/blob/master/LICENSE)","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhellokaton%2Felves","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhellokaton%2Felves","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhellokaton%2Felves/lists"}