{"id":18885730,"url":"https://github.com/code4craft/xsoup","last_synced_at":"2025-05-16T11:05:54.923Z","repository":{"id":10363762,"uuid":"12504675","full_name":"code4craft/xsoup","owner":"code4craft","description":"When jsoup meets XPath.","archived":false,"fork":false,"pushed_at":"2023-07-10T05:22:27.000Z","size":190,"stargazers_count":469,"open_issues_count":28,"forks_count":152,"subscribers_count":43,"default_branch":"master","last_synced_at":"2025-04-09T06:08:33.129Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/code4craft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2013-08-31T11:37:03.000Z","updated_at":"2025-03-23T17:59:16.000Z","dependencies_parsed_at":"2024-06-20T23:25:50.889Z","dependency_job_id":"86ce49b8-4244-4fba-a3c1-4dccca051d4f","html_url":"https://github.com/code4craft/xsoup","commit_stats":null,"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code4craft%2Fxsoup","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code4craft%2Fxsoup/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code4craft%2Fxsoup/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/code4craft%2Fxsoup/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/code4craft","download_url":"https://codeload.github.com/code4craft/xsoup/tar.gz/refs/heads/mast
er","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254518383,"owners_count":22084374,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T07:21:24.199Z","updated_at":"2025-05-16T11:05:49.916Z","avatar_url":"https://github.com/code4craft.png","language":"Java","readme":"Xsoup\n----\n[![Build Status](https://api.travis-ci.org/code4craft/xsoup.png?branch=master)](https://travis-ci.org/code4craft/xsoup)\n\nXPath selector based on Jsoup.\n\n## Get started:\n\n```java\n    @Test\n    public void testSelect() {\n\n        String html = \"\u003chtml\u003e\u003cdiv\u003e\u003ca href='https://github.com'\u003egithub.com\u003c/a\u003e\u003c/div\u003e\" +\n                \"\u003ctable\u003e\u003ctr\u003e\u003ctd\u003ea\u003c/td\u003e\u003ctd\u003eb\u003c/td\u003e\u003c/tr\u003e\u003c/table\u003e\u003c/html\u003e\";\n\n        Document document = Jsoup.parse(html);\n\n        String result = Xsoup.compile(\"//a/@href\").evaluate(document).get();\n        Assert.assertEquals(\"https://github.com\", result);\n\n        List\u003cString\u003e list = Xsoup.compile(\"//tr/td/text()\").evaluate(document).list();\n        Assert.assertEquals(\"a\", list.get(0));\n        Assert.assertEquals(\"b\", list.get(1));\n    }\n```\n\n## Performance:\n\nXsoup use Jsoup as HTML parser. 
\n\nCompared with another widely used XPath selector for HTML, [**`HtmlCleaner`**](http://htmlcleaner.sourceforge.net/), Xsoup is much faster:\n\n\tNormal HTML, size 44KB\n\tXPath: \"//a\"\t\n\tRun 2000 times\n\n\tEnvironment: MacBook Air MD231CH/A\n\tCPU: 1.8GHz Intel Core i5\n\n\u003ctable\u003e\n    \u003ctr\u003e\n        \u003ctd width=\"100\"\u003eOperation\u003c/td\u003e\n        \u003ctd width=\"100\"\u003eXsoup\u003c/td\u003e\n        \u003ctd\u003eHtmlCleaner\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eparse\u003c/td\u003e\n        \u003ctd\u003e3,207 ms\u003c/td\u003e\n        \u003ctd\u003e7,999 ms\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eselect\u003c/td\u003e\n        \u003ctd\u003e95 ms\u003c/td\u003e\n        \u003ctd\u003e380 ms\u003c/td\u003e\n    \u003c/tr\u003e\n\u003c/table\u003e\n\n## Syntax supported:\n\n### XPath 1.0:\n\n\u003ctable\u003e\n    \u003ctr\u003e\n        \u003ctd width=\"100\"\u003eName\u003c/td\u003e\n        \u003ctd width=\"100\"\u003eExpression\u003c/td\u003e\n        \u003ctd\u003eSupport\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003enodename\u003c/td\u003e\n        \u003ctd\u003enodename\u003c/td\u003e\n        \u003ctd\u003eyes\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003edirect child\u003c/td\u003e\n        \u003ctd\u003e/\u003c/td\u003e\n        \u003ctd\u003eyes\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003edescendant\u003c/td\u003e\n        \u003ctd\u003e//\u003c/td\u003e\n        \u003ctd\u003eyes\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eattribute value equals\u003c/td\u003e\n        \u003ctd\u003e[@key=value]\u003c/td\u003e\n        \u003ctd\u003eyes\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003enth child\u003c/td\u003e\n        \u003ctd\u003etag[n]\u003c/td\u003e\n        
\u003ctd\u003eyes\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eattribute\u003c/td\u003e\n        \u003ctd\u003e/@key\u003c/td\u003e\n        \u003ctd\u003eyes\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003ewildcard in tagname\u003c/td\u003e\n        \u003ctd\u003e/*\u003c/td\u003e\n        \u003ctd\u003eyes\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003ewildcard in attribute\u003c/td\u003e\n        \u003ctd\u003e/[@*]\u003c/td\u003e\n        \u003ctd\u003eyes\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003efunction\u003c/td\u003e\n        \u003ctd\u003efunction()\u003c/td\u003e\n        \u003ctd\u003epart\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eor\u003c/td\u003e\n        \u003ctd\u003ea | b\u003c/td\u003e\n        \u003ctd\u003eyes since 0.2.0\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eparent in path\u003c/td\u003e\n        \u003ctd\u003e. 
or ..\u003c/td\u003e\n        \u003ctd\u003eno\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003epredicates\u003c/td\u003e\n        \u003ctd\u003eprice\u003e35\u003c/td\u003e\n        \u003ctd\u003eno\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003epredicate logic\u003c/td\u003e\n        \u003ctd\u003e@class=a or @class=b\u003c/td\u003e\n        \u003ctd\u003eyes since 0.2.0\u003c/td\u003e\n    \u003c/tr\u003e\n\u003c/table\u003e\n\n### Functions supported:\n\nXsoup provides the following functions, some of which are not part of standard XPath 1.0:\n\n\u003ctable\u003e\n    \u003ctr\u003e\n        \u003ctd width=\"100\"\u003eExpression\u003c/td\u003e\n        \u003ctd width=\"100\"\u003eDescription\u003c/td\u003e\n        \u003ctd\u003eStandard XPath\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd width=\"100\"\u003etext(n)\u003c/td\u003e\n        \u003ctd width=\"100\"\u003enth text content of element (0 for all)\u003c/td\u003e\n        \u003ctd\u003etext() only\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd width=\"100\"\u003eallText()\u003c/td\u003e\n        \u003ctd width=\"100\"\u003etext including children\u003c/td\u003e\n        \u003ctd\u003enot supported\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd width=\"100\"\u003etidyText()\u003c/td\u003e\n        \u003ctd width=\"100\"\u003etext including children, well formatted\u003c/td\u003e\n        \u003ctd\u003enot supported\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd width=\"100\"\u003ehtml()\u003c/td\u003e\n        \u003ctd width=\"100\"\u003einner HTML of element\u003c/td\u003e\n        \u003ctd\u003enot supported\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd width=\"100\"\u003eouterHtml()\u003c/td\u003e\n        \u003ctd width=\"100\"\u003eouter HTML of element\u003c/td\u003e\n        \u003ctd\u003enot 
supported\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd width=\"100\"\u003eregex(@attr,expr,group)\u003c/td\u003e\n        \u003ctd width=\"100\"\u003euse regex to extract content\u003c/td\u003e\n        \u003ctd\u003enot supported\u003c/td\u003e\n    \u003c/tr\u003e\n\u003c/table\u003e\n\n### Extended syntax supported:\n\nThese syntax extensions exist only in Xsoup (added for convenience when extracting from HTML, modeled on the Jsoup CSS selector):\n\n\u003ctable\u003e\n    \u003ctr\u003e\n        \u003ctd width=\"100\"\u003eName\u003c/td\u003e\n        \u003ctd width=\"100\"\u003eExpression\u003c/td\u003e\n        \u003ctd\u003eSupport\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eattribute value does not equal\u003c/td\u003e\n        \u003ctd\u003e[@key!=value]\u003c/td\u003e\n        \u003ctd\u003eyes\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eattribute value starts with\u003c/td\u003e\n        \u003ctd\u003e[@key^=value]\u003c/td\u003e\n        \u003ctd\u003eyes\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eattribute value ends with\u003c/td\u003e\n        \u003ctd\u003e[@key$=value]\u003c/td\u003e\n        \u003ctd\u003eyes\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eattribute value contains\u003c/td\u003e\n        \u003ctd\u003e[@key*=value]\u003c/td\u003e\n        \u003ctd\u003eyes\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eattribute value matches regex\u003c/td\u003e\n        \u003ctd\u003e[@key~=value]\u003c/td\u003e\n        \u003ctd\u003eyes\u003c/td\u003e\n    \u003c/tr\u003e\n\u003c/table\u003e\n\n## License\n\nMIT License, see file `LICENSE`\n\n[![Bitdeli Badge](https://d2weczhvl823v0.cloudfront.net/code4craft/xsoup/trend.png)](https://bitdeli.com/free \"Bitdeli 
Badge\")\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode4craft%2Fxsoup","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcode4craft%2Fxsoup","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode4craft%2Fxsoup/lists"}