https://github.com/burnoo/kspoon
Annotation-based HTML to Kotlin class parser with KMP support, kotlinx (de)serialization format, jspoon successor
html html-parser kotlin kotlin-multiplatform kotlinx-serialization ktor retrofit2
- Host: GitHub
- URL: https://github.com/burnoo/kspoon
- Owner: burnoo
- License: apache-2.0
- Created: 2024-08-30T18:08:28.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-12-16T09:35:40.000Z (about 2 months ago)
- Last Synced: 2025-12-19T22:44:03.920Z (about 2 months ago)
- Topics: html, html-parser, kotlin, kotlin-multiplatform, kotlinx-serialization, ktor, retrofit2
- Language: Kotlin
- Homepage:
- Size: 452 KB
- Stars: 70
- Watchers: 1
- Forks: 2
- Open Issues: 11
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE.md
# kspoon 🥄
[Maven Central](https://search.maven.org/search?q=dev.burnoo.kspoon) [Javadoc](https://javadoc.io/doc/dev.burnoo.kspoon/kspoon/latest/kspoon/dev.burnoo.kspoon/-kspoon/index.html)
kspoon is a Kotlin Multiplatform library for parsing HTML into Kotlin objects. It
uses [ksoup](https://github.com/fleeksoft/ksoup) as an HTML parser
and [kotlinx.serialization](https://github.com/Kotlin/kotlinx.serialization) to create objects. This library is a
successor to [jspoon](https://github.com/DroidsOnRoids/jspoon/).
A big shoutout to [@itboy87](https://github.com/itboy87) for porting Jsoup to KMP - this library wouldn't exist without
his amazing work. Check out the [Ksoup repository](https://github.com/fleeksoft/ksoup)!
## Installation
Apply the serialization plugin in your module's `build.gradle.kts`/`build.gradle`:
```kotlin
plugins {
    kotlin("plugin.serialization") version "" // set this to your Kotlin version
}
```
Add the following dependency to your module `build.gradle.kts`/`build.gradle` file:
```kotlin
dependencies {
    implementation("dev.burnoo.kspoon:kspoon:0.2.3")
}
```
## Usage
kspoon works with any serializable class. Adding `@Selector` annotations to its serializable fields enables HTML
parsing:
```kotlin
@Serializable
data class Page(
    @Selector("#header") val header: String,
    @Selector("li.class1") val intList: List<Int>,
    @Selector(value = "#image1", attr = "src") val imageSource: String,
)
```
You can then use a `Kspoon` instance to create objects:
```kotlin
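val htmlContent = """
    <div>
        <p id="header">Title</p>
        <ul>
            <li class="class1">1</li>
            <li>2</li>
            <li class="class1">3</li>
        </ul>
        <img id="image1" src="image.bmp" />
    </div>
""".trimIndent()
val page = Kspoon.parse<Page>(htmlContent)
println(page) // Page(header=Title, intList=[1, 3], imageSource=image.bmp)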
```
For each annotated field, the library finds the first element in the HTML that matches the CSS selector and assigns its
value to the field. For a `List` field, as `intList` shows, every matching element is collected.
### Configuration
kspoon can be configured using the `Kspoon {}` factory function, which returns an instance that can be used for parsing.
All available options with default values are listed below:
```kotlin
val kspoon = Kspoon {
    // Specifies the parsing function. Type: (String) -> Document
    parse = { html: String -> Ksoup.parse(html, baseUri = "") }
    // Default text mode used for parsing.
    defaultTextMode = HtmlTextMode.Text
    // Enables coercing values when the selected HTML element is not found.
    coerceInputValues = false
    // Module with contextual and polymorphic serializers to be used.
    serializersModule = EmptySerializersModule()
}
kspoon.parse<Page>(HTML_CONTENT)
```
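As a sketch of how these options combine, the instance below overrides only `parse` and `coerceInputValues` and leaves the remaining options at the defaults listed above. The base URI `https://example.com/` and the name `siteKspoon` are illustrative assumptions, and the import paths are assumed from the libraries' published package names.

```kotlin
import com.fleeksoft.ksoup.Ksoup
import dev.burnoo.kspoon.Kspoon

// Illustrative configuration; only the two overridden options differ from the defaults.
val siteKspoon = Kspoon {
    // Assumption: the parsed document should carry a non-empty base URI.
    parse = { html: String -> Ksoup.parse(html, baseUri = "https://example.com/") }
    // Coerce to a default value when the selected element is not found.
    coerceInputValues = true
}
```

Creating the instance once and reusing it keeps the configuration in a single place.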
### Selecting content
By default, the HTML's `textContent` value is used to extract data. This behavior can be changed either in the
configuration or by using the `textMode` parameter in the `@Selector` annotation. Options include `InnerHtml`,
`OuterHtml`, or `Data` (for scripts and styles):
```kotlin
@Serializable
data class Page(
    @Selector("p", textMode = SelectorHtmlTextMode.OuterHtml)
    val content: String
)
val htmlContent = "<p>Text</p>"
val page = Kspoon.parse<Page>(htmlContent)
println(page) // Page(content=<p>Text</p>)
```
It is also possible to get an attribute value by setting the `attr` parameter in the `@Selector` annotation
(see [Usage](#usage) for an example).
### Regex
A regular expression can be applied by passing the `regex` parameter to the `@Selector` annotation. After the text is
extracted (using the HTML text mode or an attribute), the regex is applied to the resulting string. The result is the
first captured group, or the entire match if the pattern has no group.
```kotlin
@Serializable
data class Page(
    @Selector(value = "#numbers", regex = "([0-9]+) ")
    val starNumber: Int // 31 stars (31 will be parsed)
)
```
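The group-selection rule can be illustrated with the Kotlin standard library alone. The helper `firstGroupOrMatch` below is a hypothetical name and is not kspoon's internal code; it only mirrors the behavior described above.

```kotlin
// Hypothetical helper, not part of kspoon: returns the first capture group,
// or the entire match when the pattern defines no group.
fun firstGroupOrMatch(text: String, pattern: String): String? {
    val match = Regex(pattern).find(text) ?: return null
    // groupValues[0] is the entire match; groupValues[1] is the first group, if any.
    return match.groupValues.getOrNull(1) ?: match.value
}

fun main() {
    println(firstGroupOrMatch("31 stars", "([0-9]+) ")) // 31
    println(firstGroupOrMatch("31 stars", "[0-9]+"))    // 31
    println(firstGroupOrMatch("no digits", "[0-9]+"))   // null
}
```

With the group, the trailing space required by the pattern is matched but excluded from the result, which is what lets the value convert cleanly to `Int`.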
### Default values
There are three ways to set default values:
- `@Selector("#tag", defValue = "default")` - if the HTML element is not found, the `defValue` will be used as a
parsed string
- Nullable field - if the HTML element is not found, the value will be set to `null`
- `coerceInputValues = true` in the `Kspoon {}` configuration - enables coercing to a default value
```kotlin
@Serializable
data class Model(
    @Selector("span")
    val text: String = "not found"
)
val body = "<p></p>"
val text = Kspoon { coerceInputValues = true }.parse<Model>(body).text
println(text) // prints "not found"
```
`defValue` offers the best performance due to the internal logic of kotlinx.serialization. Nullable fields perform HTML
selection twice. Coercing input values also performs HTML selection twice and additionally disables
[sequential decoding](https://kotlinlang.org/api/kotlinx.serialization/kotlinx-serialization-core/kotlinx.serialization.encoding/-composite-decoder/decode-sequentially.html).
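
The `defValue` and nullable-field options, which the example above does not cover, might look like the sketch below. The `Profile` class, its selectors, and the sample HTML are illustrative, and the import path for `Selector` is an assumption to verify against the API docs.

```kotlin
import dev.burnoo.kspoon.Kspoon
import dev.burnoo.kspoon.annotation.Selector // assumed package; check the API docs
import kotlinx.serialization.Serializable

@Serializable
data class Profile(
    // Option 1: defValue is used as the parsed string when nothing matches.
    @Selector("#name", defValue = "anonymous") val name: String,
    // Option 2: a nullable field is set to null when nothing matches.
    @Selector("#bio") val bio: String?,
)

fun main() {
    // Neither #name nor #bio exists in this document.
    val profile = Kspoon.parse<Profile>("<p>empty</p>")
    println(profile)
}
```

Because `defValue` is a string, it goes through the same conversion as selected text, so it must be parseable as the field's type.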
### Serializers
Any `KSerializer` can be applied to a field annotated with `@Selector` to customize serialization logic. For example,
date serializers from [`kotlinx-datetime`](https://github.com/Kotlin/kotlinx-datetime):
```kotlin
@Serializable
data class Model(
    @Serializable(LocalDateIso8601Serializer::class)
    @Selector("span")
    val date: LocalDate,
)
```
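A hand-written serializer can be attached in the same way. The sketch below uses only the public kotlinx.serialization `KSerializer` interface; `GroupedIntSerializer`, `Repo`, and the `#stars` selector are hypothetical names, and the import path for `Selector` is an assumption.

```kotlin
import dev.burnoo.kspoon.annotation.Selector // assumed package; check the API docs
import kotlinx.serialization.KSerializer
import kotlinx.serialization.Serializable
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder

// Hypothetical serializer: reads text such as "1,234" as the Int 1234.
object GroupedIntSerializer : KSerializer<Int> {
    override val descriptor: SerialDescriptor =
        PrimitiveSerialDescriptor("GroupedInt", PrimitiveKind.STRING)

    override fun deserialize(decoder: Decoder): Int =
        decoder.decodeString().trim().replace(",", "").toInt()

    override fun serialize(encoder: Encoder, value: Int) {
        encoder.encodeString(value.toString())
    }
}

@Serializable
data class Repo(
    @Serializable(GroupedIntSerializer::class)
    @Selector("#stars")
    val stars: Int,
)
```

Declaring the descriptor as a string primitive means the serializer receives the selected text unchanged and owns the conversion itself.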
Additionally, kspoon has built-in serializers for Ksoup classes: `ElementSerializer`, `ElementsSerializer`, and
`DocumentSerializer`. They can be used directly or via contextual serialization:
```kotlin
@Serializable
data class Model(
    @Serializable(ElementSerializer::class) // or @Contextual
    @Selector("div.class1")
    val element: Element,
)
```
It is also possible to write custom kspoon serializers that can access the selected `Element`. Read
more [here](/docs/custom-serializers.md).
### External libraries
The `Kspoon` class has a `toFormat(): StringFormat` function that can be used with third-party libraries. For detailed
integration instructions, see the following links:
- [Ktor](/docs/ktor.md)
- [Retrofit](docs/retrofit.md)
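
Outside those integrations, the format can also be called directly through the standard kotlinx.serialization `StringFormat` API. This is a minimal sketch: the `Heading` class and sample HTML are illustrative, and the import path for `Selector` is an assumption.

```kotlin
import dev.burnoo.kspoon.Kspoon
import dev.burnoo.kspoon.annotation.Selector // assumed package; check the API docs
import kotlinx.serialization.Serializable
import kotlinx.serialization.StringFormat
import kotlinx.serialization.decodeFromString

@Serializable
data class Heading(@Selector("h1") val title: String)

// Any API that accepts a StringFormat can be handed this value.
val htmlFormat: StringFormat = Kspoon { }.toFormat()

fun main() {
    val heading = htmlFormat.decodeFromString<Heading>("<h1>Hello</h1>")
    println(heading.title)
}
```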
### Supported targets
`jvm`, `js`, `wasmJs`, `linuxX64`, `linuxArm64`, `tvosArm64`, `tvosX64`, `tvosSimulatorArm64`, `macosX64`,
`macosArm64`, `iosArm64`, `iosSimulatorArm64`, `iosX64`, `mingwX64`
### [Custom serializers](docs/custom-serializers.md)
### [jspoon compatibility](docs/jspoon-compatibility.md)
### Changelog
See [GitHub releases](https://github.com/burnoo/kspoon/releases).