{"id":41176746,"url":"https://github.com/xiaoyvyv/androidcrawlerengine","last_synced_at":"2026-01-22T19:56:56.357Z","repository":{"id":207322137,"uuid":"718967931","full_name":"xiaoyvyv/AndroidCrawlerEngine","owner":"xiaoyvyv","description":"A dynamic crawler plug-in for the Android platform based on Dex dynamic loading, which can dynamically load and execute the dex plug-in package, and can realize real-time updates of crawler and other functions.","archived":false,"fork":false,"pushed_at":"2023-11-15T08:24:32.000Z","size":958,"stargazers_count":3,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-01-28T14:40:09.444Z","etag":null,"topics":["android","apk","class","crawler","dex","dynamic","execute","java","jsoup","jvm","kotlin","module","okhttp","pak","plugin","reflection","scrapy","spider","web","webmagic"],"latest_commit_sha":null,"homepage":"","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xiaoyvyv.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-11-15T06:53:41.000Z","updated_at":"2023-11-24T14:21:31.000Z","dependencies_parsed_at":null,"dependency_job_id":"b315e432-02f0-4074-9322-b9715e330074","html_url":"https://github.com/xiaoyvyv/AndroidCrawlerEngine","commit_stats":{"total_commits":10,"total_committers":2,"mean_commits":5.0,"dds":0.09999999999999998,"last_synced_commit":"f63593362355f649df1dbda587416468896a3888"},"previous_names":["xiaoyvyv/androidcrawlerengine"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/xiaoyvyv/AndroidCrawlerEngine",
"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xiaoyvyv%2FAndroidCrawlerEngine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xiaoyvyv%2FAndroidCrawlerEngine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xiaoyvyv%2FAndroidCrawlerEngine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xiaoyvyv%2FAndroidCrawlerEngine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xiaoyvyv","download_url":"https://codeload.github.com/xiaoyvyv/AndroidCrawlerEngine/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xiaoyvyv%2FAndroidCrawlerEngine/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28669981,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-22T19:36:09.361Z","status":"ssl_error","status_checked_at":"2026-01-22T19:36:05.567Z","response_time":144,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["android","apk","class","crawler","dex","dynamic","execute","java","jsoup","jvm","kotlin","module","okhttp","pak","plugin","reflection","scrapy","spider","web","webmagic"],"created_at":"2026-01-22T19:56:55.714Z","updated_at":"2026-01-22T19:56:56.352Z","avatar_url":"https://github.com/xiaoyvyv.png","language":"Kotlin","funding_links":[],"categories":[],"sub_categories":[],
"readme":"# AndroidCrawlerEngine\n\nA dynamic crawler plug-in engine for Android based on dynamic Dex loading. It loads and executes dex plug-in packages at runtime, so crawler logic and other features can be updated in real time.\n\n## What can it do?\n\nIt dynamically loads a crawler plug-in package, calls methods in that package, and returns their results. It is mainly intended for crawler apps whose scraping logic changes often: the logic ships as a plug-in package that can be updated without updating the whole app. Think of it as a small, focused plug-in library.\n\n\u003e Many apps that aggregate online novels, comics, videos, articles and similar content keep their crawler logic on the server. Because all the scraping then happens on the server, its IP address is easily blocked. Instead, all of the page-scraping logic can be written into one plug-in package and updated dynamically through plug-in version management, so a change in a target site's parsing logic no longer forces an app update. Because the crawler runs on the client, this also reduces server load and keeps the server's IP from being blocked.\n\nPopular hotfix frameworks and app plug-in frameworks can do this as well, but they are far too complex. I only needed to execute precompiled logic dynamically, produce the desired output, and push updates remotely, and the plug-in packages need no UI. So I wrote this small, focused crawler engine.\n\n## How do I add it?\n\nGiving your project the ability to load crawler plug-in packages dynamically takes a single dependency:\n\n```kotlin\nimplementation(project(\":crawler-core\"))\n```\n\n## How do I use it?\n1. Load a crawler package (`xxx.pak`). A crawler package contains one manifest file and several crawler classes, and each crawler class can expose several methods.\n   \n   ```kotlin\n   // Path to a readable crawler package file\n   val crawlerPakPath = \"/xx/xx/xx.pak\"\n   // Load the crawler package\n   val handle = CrawlerEngine.loadCrawlerPak(crawlerPakPath)\n   ```\n2. Get a crawler class from the package.\n   \n   ```kotlin\n   // Get the crawler class named `webCrawler` (the name declared in the manifest)\n   val webCrawler = handle.getCrawlerByName(\"webCrawler\")\n   ```\n3. Call a method of the crawler class.\n   \n   ```kotlin\n   // Signature of `call` on the object returned by getCrawlerByName (for reference)\n   /**\n    * Calls the specified [CrawlerMethod] method.\n    *\n    * @param functionName name of the method as configured in the manifest\n    * @param functionParamValues arguments for the method; their types must match the parameter types configured in the manifest, otherwise the method cannot be called\n    * @param functionAccept when several methods match the name, selects the one to call (the first match by default)\n    * @param callback receives the method's return value, or null if the method returns Void\n    */\n   fun call(\n       functionName: String,\n       functionParamValues: List\u003cAny?\u003e,\n       functionAccept: (List\u003cCrawlerFunction\u003e) -\u003e CrawlerFunction? = { it.firstOrNull() },\n       callback: (Any?) -\u003e Unit\n   )\n\n   // Example (requires import android.util.Log)\n   webCrawler.call(\"httpTest\", listOf(\"https://www.baidu.com\")) { result -\u003e\n       // Print the method's return value\n       Log.i(\"CrawlerResult\", result.toString())\n   }\n   ```\n   Simple, right?\n\n## What's in a crawler plug-in package?\n\nA crawler plug-in package has a simple structure:\n\n- A `manifest.json` manifest file\n\n  The manifest lists the crawler classes in the crawler logic file, along with each class's callable methods and their parameter types.\n  All of this is generated automatically when you build the plug-in package. At runtime, `handle.manifest` returns the manifest model directly; the crawler names passed to `getCrawlerByName`, the method names passed to `call`, and so on should all be read from it.\n  Its structure is as follows:\n  \n  ```json\n  {\n     \"packageId\": \"com.xiaoyv.crawler.test\",\n     \"name\": \"plugin.pak\",\n     \"description\": \"This is a collection of crawler packages\",\n     \"author\": \"why\",\n     \"versionCode\": 1,\n     \"versionName\": \"1.0\",\n     \"createTime\": 1700030526874,\n     \"updateTime\": 1700030526874,\n     \"crawlers\": [\n       {\n         \"name\": \"webCrawler\",\n         \"className\": \"com.xiaoyv.crawler.test.WebTestCrawler\",\n         \"description\": \"WebTestCrawler\",\n         \"version\": 1,\n         \"function\": [\n           {\n             \"name\": \"httpTest\",\n             \"description\": \"http test\",\n             \"paramType\": [\n               \"java.lang.String\"\n             ]\n           }\n         ]\n       },\n       {\n         \"name\": \"otherCrawler\",\n         \"className\": \"com.xiaoyv.crawler.test.OtherCrawler\",\n         \"description\": \"OtherCrawler\",\n         \"version\": 1,\n         \"function\": [\n           {\n             \"name\": \"otherTest\",\n             \"description\": \"xxx\",\n             \"paramType\": [\n               \"java.lang.String\"\n             ]\n           }\n         ]\n       }\n     ]\n  }\n  ```\n  \n- A `classes.dex` file containing the crawler logic\n\n  See the section on developing a crawler plug-in package.\n\n## How do I develop a crawler plug-in package?\nDeveloping a crawler plug-in package is just as simple: apply a Gradle plugin.\n\n1. Add the plugin dependency to the project-level `build.gradle.kts`.\n   \n   ```kotlin\n   // ...\n   \n   buildscript {\n     dependencies {\n       // ...\n   \n       // Replace $version with the crawler-plugin version you use\n       classpath(\"com.xiaoyv.gradle:crawler-plugin:$version\")\n     }\n   }\n   \n   // ...\n   ```\n2. Create an empty module of type `application` (note: `application`, not `library`). Then apply the crawler Gradle plugin and add its dependencies:\n   \n   ```kotlin\n   plugins {\n       alias(libs.plugins.androidApplication)\n       alias(libs.plugins.kotlinAndroid)\n   }\n\n   // Apply the crawler plugin\n   apply(plugin = \"com.xiaoyv.gradle.crawler\")\n\n   // Configure the developer information written to the manifest file\n   configure\u003ccom.xiaoyv.gradle.crawler.extension.CrawlerExtension\u003e {\n       crawlerName.set(\"plugin.pak\")\n       crawlerAuthor.set(\"why\")\n       crawlerDescription.set(\"This is a collection of crawler packages\")\n       crawlerCreateTime.set(System.currentTimeMillis())\n       crawlerUpdateTime.set(System.currentTimeMillis())\n   }\n\n   android {\n       // ...\n   }\n\n   dependencies {\n       // compileOnly: the host project that loads the plug-in package already contains these classes, so the plug-in package does not need to bundle them.\n       compileOnly(project(\":crawler-api\"))\n\n       // ...\n   }\n   ```\n\n3. In this module, create a crawler class, such as `WebTestCrawler`, that extends `ICrawler`.\n\n   Notes:\n\n   A crawler class must be annotated with `CrawlerObj`. `CrawlerObj` configures the crawler's name, description and so on, which are written to the manifest automatically.\n\n   Every crawler method the class exposes must be annotated with `CrawlerMethod`, which can carry a description.\n\n   A crawler class without `CrawlerObj` is not written to the manifest and cannot be loaded with `getCrawlerByName`.\n\n   A method that should be exposed but lacks `CrawlerMethod` is not written to the manifest and cannot be called through `call` or similar methods.\n\n   ```kotlin\n   package com.xiaoyv.crawler.test\n   \n   import com.xiaoyv.crawler.annotation.CrawlerMethod\n   import com.xiaoyv.crawler.annotation.CrawlerObj\n   import com.xiaoyv.crawler.api.ICrawler\n   import okhttp3.HttpUrl.Companion.toHttpUrl\n   import okhttp3.Request\n   \n   @CrawlerObj(name = \"webCrawler\", description = \"WebTestCrawler\", version = 1)\n   class WebTestCrawler : ICrawler() {\n   \n       override fun onCreate() {\n           // Initialization hook; nothing to do in this example\n       }\n   \n       @CrawlerMethod(description = \"http test\")\n       fun httpTest(url: String): String {\n           // Request(HttpUrl) and the non-null response.body are OkHttp 5.x APIs\n           val response = useHttpClient.newCall(Request(url.toHttpUrl())).execute()\n           return response.body.string()\n       }\n   }\n   ```\n4. Build the plug-in package\n\n   Run the module's `buildCrawlerRelease` Gradle task from the `build` task group. The generated plug-in package is written to the `build/outputs` directory.\n   \n   ![image](https://github.com/xiaoyvyv/AndroidCrawlerEngine/assets/29088158/eda7a0aa-525e-4edb-a335-f1d869ca4097)\n   \n   The generated plug-in package can then be loaded and run with `CrawlerEngine`.\n",
"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxiaoyvyv%2Fandroidcrawlerengine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxiaoyvyv%2Fandroidcrawlerengine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxiaoyvyv%2Fandroidcrawlerengine/lists"}