{"id":21388172,"url":"https://github.com/droidsonroids/jspoon","last_synced_at":"2025-04-04T22:05:24.741Z","repository":{"id":51681383,"uuid":"97011177","full_name":"DroidsOnRoids/jspoon","owner":"DroidsOnRoids","description":"Annotation based HTML to Java parser + Retrofit converter","archived":false,"fork":false,"pushed_at":"2024-04-19T06:49:08.000Z","size":353,"stargazers_count":323,"open_issues_count":10,"forks_count":23,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-03-28T21:03:09.428Z","etag":null,"topics":["html","java","parser"],"latest_commit_sha":null,"homepage":"https://www.thedroidsonroids.com/blog/scraping-web-pages-with-retrofit-jspoon-library","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DroidsOnRoids.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-07-12T13:27:58.000Z","updated_at":"2024-10-28T13:31:20.000Z","dependencies_parsed_at":"2025-01-05T01:45:58.600Z","dependency_job_id":null,"html_url":"https://github.com/DroidsOnRoids/jspoon","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DroidsOnRoids%2Fjspoon","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DroidsOnRoids%2Fjspoon/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DroidsOnRoids%2Fjspoon/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DroidsOnRoids%2Fjspoon/manifests","owner_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DroidsOnRoids","download_url":"https://codeload.github.com/DroidsOnRoids/jspoon/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247256110,"owners_count":20909240,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["html","java","parser"],"created_at":"2024-11-22T12:16:09.072Z","updated_at":"2025-04-04T22:05:24.717Z","avatar_url":"https://github.com/DroidsOnRoids.png","language":"Java","readme":"[![Maven Central](https://maven-badges.herokuapp.com/maven-central/pl.droidsonroids/jspoon/badge.svg?style=flat)](https://maven-badges.herokuapp.com/maven-central/pl.droidsonroids/jspoon)\n[![Javadocs](https://javadoc.io/badge/pl.droidsonroids/jspoon.svg?color=blue)](https://javadoc.io/doc/pl.droidsonroids/jspoon)\n\n# jspoon\njspoon is a Java library that parses HTML into Java objects based on CSS selectors. It uses [jsoup][jsoup] underneath as an HTML parser.\n\n## Installation\nInsert the following dependency into your project's `build.gradle` file:\n```gradle\ndependencies {\n    implementation 'pl.droidsonroids:jspoon:1.3.2'\n}\n```\n## Usage\njspoon works on any class with a default constructor. 
To make it work, annotate fields with the `@Selector` annotation and set a CSS selector as the annotation's value:\n```java\nclass Page {\n    @Selector(\"#title\") String title;\n    @Selector(\"li.a\") List\u003cInteger\u003e intList;\n    @Selector(value = \"#image1\", attr = \"src\") String imageSource;\n}\n```\nThen you can create a `HtmlAdapter` and use it to build objects:\n```java\nString htmlContent = \"\u003cdiv\u003e\" \n    + \"\u003cp id='title'\u003eTitle\u003c/p\u003e\" \n    + \"\u003cul\u003e\"\n    + \"\u003cli class='a'\u003e1\u003c/li\u003e\"\n    + \"\u003cli\u003e2\u003c/li\u003e\"\n    + \"\u003cli class='a'\u003e3\u003c/li\u003e\"\n    + \"\u003c/ul\u003e\"\n    + \"\u003cimg id='image1' src='image.bmp' /\u003e\"\n    + \"\u003c/div\u003e\";\n\nJspoon jspoon = Jspoon.create();\nHtmlAdapter\u003cPage\u003e htmlAdapter = jspoon.adapter(Page.class);\n\nPage page = htmlAdapter.fromHtml(htmlContent);\n//title = \"Title\"; intList = [1, 3]; imageSource = \"image.bmp\"\n```\njspoon looks for the first occurrence in the HTML and sets its value to the field.\n\n### Supported types\n`@Selector` can be applied to any field of the following types (or their primitive equivalents):\n* `String`\n* `Boolean`\n* `Integer`\n* `Long`\n* `Float`\n* `Double`\n* `Date`\n* `BigDecimal`\n* jsoup's `Element`\n* Any class with a default constructor\n* `List` (or its superclass/superinterface) of a supported type\n\nIt can also be applied to a class; then you don't need to annotate every field inside it.\n\n### Attributes\nBy default, the HTML's `textContent` value is used on Strings, Dates and numbers. It is possible to use an attribute by setting an `attr` parameter in the `@Selector` annotation. You can also use `\"html\"` (or `\"innerHtml\"`) and `\"outerHtml\"` as `attr`'s value.\n\n### Formatting and regex\nA regex can be set by passing the `regex` parameter to the `@Selector` annotation. 
Example:\n```java\nclass Page {\n    @Selector(value = \"#numbers\", regex = \"([a-z]+),\") String matchedNumber;\n}\n```\nA date format can be set by passing the `value` parameter to the `@Format` annotation. Example:\n```java\nclass Page {\n    @Format(value = \"HH:mm:ss dd.MM.yyyy\")\n    @Selector(value = \"#date\") Date date;\n}\n```\n```java\nString htmlContent = \"\u003cspan id='date'\u003e13:30:12 14.07.2017\u003c/span\u003e\"\n    + \"\u003cspan id='numbers'\u003eONE, TwO, three,\u003c/span\u003e\";\nJspoon jspoon = Jspoon.create();\nHtmlAdapter\u003cPage\u003e htmlAdapter = jspoon.adapter(Page.class);\nPage page = htmlAdapter.fromHtml(htmlContent);//date = Jul 14, 2017 13:30:12; matchedNumber = \"three\";\n```\n\nJava's `Locale` is used for parsing Floats, Doubles and Dates. You can override it by setting the `languageTag` parameter of `@Format`:\n```java\n@Format(languageTag = \"pl\")\n@Selector(value = \"div \u003e p \u003e span\") Double pi; //3,14 will be parsed \n```\nIf jspoon doesn't find an HTML element, it won't set the field's value unless you set the `defValue` parameter:\n```java\n@Selector(value = \"div \u003e p \u003e span\", defValue = \"NO_TEXT\") String text;\n```\n\n### Custom converters\nWhen a format or regex is not enough, a custom converter can be used to implement parsing from jsoup's `Element`. 
This can be done by implementing the `ElementConverter` interface:\n```java\npublic class JoinChildrenClassConverter implements ElementConverter\u003cString\u003e {\n    @Override\n    public String convert(Element node, Selector selector) {\n        return node.children().stream().map(Element::text).collect(Collectors.joining(\", \"));\n    }\n}\n```\nAnd it can be used the following way:\n```java\npublic class Model {\n    @Selector(value = \"#id\", converter = JoinChildrenClassConverter.class)\n    String childrenText;\n}\n```\n\n### Retrofit\nA Retrofit converter is available [here][retrofit-converter].\n\n### Changelog\nSee [GitHub releases][changelog]\n\n### Other libraries/inspirations\n* [jsoup][jsoup] - all HTML parsing in jspoon is done by this library\n* [webGrude][webGrude] - I found this library when I first had the idea. It was the biggest inspiration and I used some ideas from it\n* [Moshi][Moshi] - I wanted to make jspoon work with HTML the same way as Moshi works with JSON. I adapted the caching mechanism (fields and adapters) from it.\n* [jsoup-annotations][jsoup-annotations] - similar to jspoon\n\n[//]: #\n   [jsoup]: \u003chttps://jsoup.org/\u003e\n   [webGrude]: \u003chttps://github.com/beothorn/webGrude\u003e\n   [Moshi]: \u003chttps://github.com/square/moshi\u003e\n   [jsoup-annotations]: \u003chttps://github.com/fcannizzaro/jsoup-annotations\u003e\n   [retrofit-converter]: \u003chttps://github.com/DroidsOnRoids/jspoon/tree/master/retrofit-converter-jspoon\u003e\n   [changelog]: \u003chttps://github.com/DroidsOnRoids/jspoon/releases\u003e","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdroidsonroids%2Fjspoon","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdroidsonroids%2Fjspoon","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdroidsonroids%2Fjspoon/lists"}