{"id":17808204,"url":"https://github.com/holmofy/spring-spider","last_synced_at":"2025-03-17T15:30:39.930Z","repository":{"id":65576980,"uuid":"585862464","full_name":"holmofy/spring-spider","owner":"holmofy","description":"Spring Spider App Utility Library.","archived":false,"fork":false,"pushed_at":"2023-03-19T07:57:10.000Z","size":56,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-28T00:51:16.178Z","etag":null,"topics":["crawler","java","spider","spring","spring-spider"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/holmofy.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-06T09:29:06.000Z","updated_at":"2023-09-13T11:52:54.000Z","dependencies_parsed_at":"2024-10-27T15:14:56.348Z","dependency_job_id":"81611cf3-7ef8-47b2-b97c-1e7f39b2f990","html_url":"https://github.com/holmofy/spring-spider","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/holmofy%2Fspring-spider","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/holmofy%2Fspring-spider/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/holmofy%2Fspring-spider/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/holmofy%2Fspring-spider/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/holmofy","download_u
rl":"https://codeload.github.com/holmofy/spring-spider/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243865517,"owners_count":20360455,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","java","spider","spring","spring-spider"],"created_at":"2024-10-27T15:09:07.050Z","updated_at":"2025-03-17T15:30:39.554Z","avatar_url":"https://github.com/holmofy.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://github.com/holmofy/spring-spider/actions/workflows/package.yaml/badge.svg)](https://repo1.maven.org/maven2/io/github/holmofy/spring-spider)\n![coverage](https://github.com/holmofy/spring-spider/actions/workflows/coverage.yaml/badge.svg)\n\nA simple crawler utility library based on Spring Boot.\n\n# Features\n\n* [x] Supports JSONPath, jsoup and XPath\n* [x] Integrates [Playwright](https://github.com/microsoft/playwright-java) to support pages that rely on JavaScript, such as single-page applications\n* [x] Supports raw HTTP messages\n\n## How to use\n\n0. Requirements: **Spring Boot 3.0, Java 17**\n\n1. Add the dependency:\n\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003eio.github.holmofy\u003c/groupId\u003e\n    \u003cartifactId\u003espring-spider\u003c/artifactId\u003e\n    \u003cversion\u003e1.3.3\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n2. Extract data with JSONPath, jsoup or XPath:\n\n```java\nimport io.github.holmofy.spider.CrawlerRequest;\nimport io.github.holmofy.spider.Downloader;\nimport org.junit.Assert;\nimport org.junit.Test;\n\nimport java.util.List;\n\npublic class Example {\n    @Test\n    public void test_jsonpath() {\n        Downloader downloader = Downloader.builder().simple();\n        // Download a JSON document and read a single field with a JSONPath expression\n        String currentUserUrl = downloader.download(CrawlerRequest.get(\"https://api.github.com/\").build())\n                .jsonPath()\n                .read(\"$.current_user_url\");\n        Assert.assertEquals(\"https://api.github.com/user\", currentUserUrl);\n    }\n\n    @Test\n    public void test_jsoup() {\n        Downloader downloader = Downloader.builder().simple();\n        // Download an HTML page and collect the text of the elements matching a CSS selector\n        List\u003cString\u003e repos = downloader.download(CrawlerRequest.get(\"https://github.com/search?q=spider\").build())\n                .jsoup()\n                .select(\"div.application-main ul.repo-list \u003e li \u003e div.mt-n1.flex-auto \u003e div.d-flex \u003e div \u003e a\")\n                .eachText();\n        Assert.assertEquals(10, repos.size());\n        System.out.println(repos);\n    }\n\n    @Test\n    public void test_xpath() {\n        Downloader downloader = Downloader.builder().simple();\n        // Download an XML document and read the text of the first node matching an XPath expression\n        String location = downloader.download(CrawlerRequest.get(\"https://www.douban.com/sitemap_index.xml\").build())\n                .xPath()\n                .select(\"/sitemapindex/sitemap/loc\")\n                .item(0)\n                .getTextContent();\n        Assert.assertEquals(\"https://www.douban.com/sitemap.xml.gz\", location);\n    }\n}\n```\n\n## Playwright\n\nUse the Playwright-based downloader for pages whose content is rendered by JavaScript:\n\n```java\nimport io.github.holmofy.spider.Downloader;\n\npublic class Example {\n    public static void main(String[] args) {\n        Downloader playwright = Downloader.builder().playwright();\n        //...\n    }\n}\n```\n\n## Raw HTTP request\n\nParse a raw HTTP message into a `CrawlerRequest` and download it:\n\n```java\nimport io.github.holmofy.spider.CrawlerRequest;\nimport io.github.holmofy.spider.CrawlerResponse;\nimport io.github.holmofy.spider.Downloader;\n\npublic class Example {\n    public static void main(String[] args) {\n        // A Java text block holding the request line, the headers, a blank line and the body\n        CrawlerRequest request = CrawlerRequest.parseRaw(\"\"\"\n                POST https://login.example.com/api/users/login\n                Accept: application/json, text/plain, */*\n                Content-Type: application/x-www-form-urlencoded;charset=UTF-8\n                Cookie: 0bd17c6216775852668436416eaee18367962376820602ec6d9cbff1f07b4c\n                User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36\n\n                user=admin\u0026password=123456\n                \"\"\");\n        CrawlerResponse response = Downloader.builder().simple().download(request);\n        // ...\n    }\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fholmofy%2Fspring-spider","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fholmofy%2Fspring-spider","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fholmofy%2Fspring-spider/lists"}