{"id":19904328,"url":"https://github.com/coding-dream/aspider","last_synced_at":"2025-06-24T00:32:02.091Z","repository":{"id":179904034,"uuid":"112301414","full_name":"coding-dream/ASpider","owner":"coding-dream","description":"A  spider run on Android Platform","archived":false,"fork":false,"pushed_at":"2018-03-16T03:15:13.000Z","size":10215,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-01T07:27:35.244Z","etag":null,"topics":["crawler","jsoup","spider"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/coding-dream.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-11-28T07:29:03.000Z","updated_at":"2018-03-16T03:15:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"9718d662-bcc3-4ed7-bddf-318718f7b364","html_url":"https://github.com/coding-dream/ASpider","commit_stats":null,"previous_names":["coding-dream/aspider"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/coding-dream/ASpider","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coding-dream%2FASpider","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coding-dream%2FASpider/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coding-dream%2FASpider/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coding-dream%2FASpider/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/coding-dream",
"download_url":"https://codeload.github.com/coding-dream/ASpider/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coding-dream%2FASpider/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261582652,"owners_count":23180634,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","jsoup","spider"],"created_at":"2024-11-12T20:27:54.792Z","updated_at":"2025-06-24T00:32:02.068Z","avatar_url":"https://github.com/coding-dream.png","language":"Java",
"readme":"# ASpider\r\n\r\n## Usage\r\n\r\nA crawler framework that runs on the Android platform. It is still a work in progress.\r\n\r\nFeatures:\r\n1. Lightweight annotation-based table creation and storage (similar to Hibernate), with a c3p0 connection pool\r\n2. Page parsing: Jsoup, XPath, regular expressions\r\n3. Proxied requests (basic proxy pool)\r\n4. Schedulers such as QueueScheduler, PriorityScheduler and BDBScheduler\r\n5. Support for GET and POST crawlers\r\n6. Request parameter spoofing; crawlers with more complex logic can pass parameters between requests\r\n7. Built-in downloaders: HTTPConnection and OkHttp (customizable)\r\n\r\nBelow is a basic example of a crawler that collects links from across the web:\r\n\r\n```\r\nimport java.util.regex.Pattern;\r\n// ASpider, Page, PageProcessor and RegexUtils come from this project\r\n\r\npublic class SimpleSpider {\r\n\r\n    public static final String TAG = SimpleSpider.class.getSimpleName();\r\n\r\n    public static final String REGEX_URL = \"(http|https)+://[^\\\\s|\\\\?|\u0026|'|\\\"]+(com|cn|org|net)+?\";\r\n\r\n    public static final Pattern pattern = Pattern.compile(REGEX_URL);\r\n\r\n    public static void start(String startUrl) {\r\n        ASpider.create()\r\n                .pageProcessor(new PageProcessor() {\r\n                    @Override\r\n                    public void process(Page page) {\r\n                        String html = page.getRawText();\r\n                        // Output field (ConsolePipeline prints it to the terminal by default)\r\n                        page.putField(\"html\", html);\r\n                        // Match a link in the page\r\n                        String newUrl = RegexUtils.get(REGEX_URL).selectSingle(html, 0);\r\n                        // Add the new link to the request queue, skipping pages with no match\r\n                        if (newUrl != null) {\r\n                            page.addTargetRequestsNoReferer(newUrl);\r\n                        }\r\n                    }\r\n                })\r\n                .thread(20)\r\n                .urls(startUrl)\r\n                .runAsync();\r\n    }\r\n\r\n    public static void main(String[] args) {\r\n        start(\"http://www.xx.com\");\r\n    }\r\n}\r\n```\r\n\r\nThe project includes a helper class that makes it easy to store data in MySQL; your spider only needs to extend DatabaseHelper:\r\n\r\nStep 1: Write the entity bean\r\n\r\n```\r\n@Table\r\npublic class JavaPdf {\r\n\r\n    @Column(value = \"id\", columnDefinition = \"int primary key auto_increment\")\r\n    private int id;\r\n\r\n    @Column\r\n    private String url;\r\n\r\n    @Column\r\n    private String title;\r\n\r\n    // unique: MySQL rejects a second row with the same panUrl\r\n    @Column(value = \"panUrl\", columnDefinition = \"varchar(255) not null unique\")\r\n    private String panUrl;\r\n\r\n    @Column(value = \"passwd\")\r\n    private String passwd;\r\n\r\n    public JavaPdf(String title, String panUrl, String passwd, String url) {\r\n        this.title = title;\r\n        this.panUrl = panUrl;\r\n        this.passwd = passwd;\r\n        this.url = url;\r\n    }\r\n}\r\n```\r\n\r\nStep 2: Create the database table\r\n\r\n```\r\n// JavaPdfSpider is the spider class that extends DatabaseHelper\r\nJavaPdfSpider javaPdfSpider = new JavaPdfSpider();\r\njavaPdfSpider.createTable(JavaPdf.class);\r\n```\r\n\r\nStep 3: Store the data\r\n\r\n```\r\n// For example inside process(Page page), after extracting title, panUrl and passwd\r\njavaPdfSpider.insert(new JavaPdf(title, panUrl, passwd, page.getUrl()));\r\n```\r\n\r\nThis is only a brief overview; see the source code for details.\r\n\r\n## License\r\n\r\nGPL (tentative)",
"funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoding-dream%2Faspider","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcoding-dream%2Faspider","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoding-dream%2Faspider/lists"}