{"id":13659302,"url":"https://github.com/fleeksoft/ksoup","last_synced_at":"2025-05-16T04:04:42.727Z","repository":{"id":207649328,"uuid":"719100459","full_name":"fleeksoft/ksoup","owner":"fleeksoft","description":"Ksoup is a Kotlin Multiplatform library for working with HTML and XML. It's a port of the renowned Java library Jsoup.","archived":false,"fork":false,"pushed_at":"2025-04-29T18:33:54.000Z","size":4750,"stargazers_count":430,"open_issues_count":7,"forks_count":15,"subscribers_count":3,"default_branch":"release","last_synced_at":"2025-05-16T00:44:19.085Z","etag":null,"topics":["java-html-parser","jsoup","kmp","kotlin","kotlin-html-parser","kotlin-multiplatform","ksoup"],"latest_commit_sha":null,"homepage":"https://fleeksoft.github.io/ksoup/","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fleeksoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-11-15T13:05:46.000Z","updated_at":"2025-05-14T13:14:23.000Z","dependencies_parsed_at":"2023-11-23T17:41:20.677Z","dependency_job_id":"0b360bdc-f26e-46ae-b621-8aca39ba13fa","html_url":"https://github.com/fleeksoft/ksoup","commit_stats":null,"previous_names":["fleeksoft/ksoup"],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fleeksoft%2Fksoup","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fleeksoft%2Fksoup/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fleeksoft%2Fksoup/releases","ma
nifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fleeksoft%2Fksoup/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fleeksoft","download_url":"https://codeload.github.com/fleeksoft/ksoup/tar.gz/refs/heads/release","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254464894,"owners_count":22075570,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["java-html-parser","jsoup","kmp","kotlin","kotlin-html-parser","kotlin-multiplatform","ksoup"],"created_at":"2024-08-02T05:01:07.245Z","updated_at":"2025-05-16T04:04:42.720Z","avatar_url":"https://github.com/fleeksoft.png","language":"Kotlin","readme":"# Ksoup: Kotlin Multiplatform HTML \u0026 XML Parser\n\n**Ksoup** is a Kotlin Multiplatform library for working with real-world HTML and XML. 
It's a port of the renowned Java library, **jsoup**, and offers an easy-to-use API for URL fetching, data parsing, extraction, and manipulation using DOM and CSS selectors.\n\n[![Kotlin](https://img.shields.io/badge/Kotlin-2.1.20-blue.svg?style=flat\u0026logo=kotlin)](https://kotlinlang.org)\n[![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE.md)\n[![Maven Central](https://img.shields.io/maven-central/v/com.fleeksoft.ksoup/ksoup.svg)](https://central.sonatype.com/artifact/com.fleeksoft.ksoup/ksoup)\n\n![badge-android](http://img.shields.io/badge/platform-android-6EDB8D.svg?style=flat)\n![badge-ios](http://img.shields.io/badge/platform-ios-CDCDCD.svg?style=flat)\n![badge-mac](http://img.shields.io/badge/platform-macos-111111.svg?style=flat)\n![badge-tvos](http://img.shields.io/badge/platform-tvos-808080.svg?style=flat)\n![badge-jvm](http://img.shields.io/badge/platform-jvm-DB413D.svg?style=flat)\n![badge-linux](http://img.shields.io/badge/platform-linux-2D3F6C.svg?style=flat)\n![badge-windows](http://img.shields.io/badge/platform-windows-4D76CD.svg?style=flat)\n![badge-js](https://img.shields.io/badge/platform-js-F8DB5D.svg?style=flat)\n![badge-wasm](https://img.shields.io/badge/platform-wasm-F8DB5D.svg?style=flat)\n\n## 🚨 Deprecation Notice\n\n\u003e The `ksoup-korlibs` and `ksoup-network-korlibs` variants are **deprecated** and will be removed in a future release.\n\u003e\n\u003e **Recommendation:** Use the `ksoup-kotlinx` variant for I/O support and Ktor 3 for networking.\n\nKsoup implements the [WHATWG HTML5](https://html.spec.whatwg.org/multipage/) specification, parsing HTML to the same DOM as modern browsers do, but with support for Android, JVM, and native platforms.\n\n## Features\n- Scrape and parse HTML from a URL, file, or string\n- Find and extract data using DOM traversal or CSS selectors\n- Manipulate HTML elements, attributes, and text\n- Clean user-submitted content against a safe-list to prevent XSS attacks\n- Output tidy 
HTML\n\nKsoup is adept at handling all varieties of HTML found in the wild.\n\n## Getting started\n### Ksoup is published on Maven Central\nInclude the dependency in `commonMain`. Latest version: [![Maven Central](https://img.shields.io/maven-central/v/com.fleeksoft.ksoup/ksoup.svg)](https://central.sonatype.com/artifact/com.fleeksoft.ksoup/ksoup)\n\nKsoup is published in the following variants. Pick the one that suits your needs and start building!\n1. **Lightweight variant: Use this if you only need to parse HTML from a string.**\n   ```kotlin\n   implementation(\"com.fleeksoft.ksoup:ksoup:\u003cversion\u003e\")\n   ```\n2. **This variant uses [kotlinx-io](https://github.com/Kotlin/kotlinx-io) for I/O and [Ktor 3](https://github.com/ktorio/ktor) for networking.**\n   ```kotlin\n   // Ksoup.parseFile, Ksoup.parseSource\n   implementation(\"com.fleeksoft.ksoup:ksoup-kotlinx:\u003cversion\u003e\")\n\n   // Optional: Include only if you need to use network request functions such as\n   // Ksoup.parseGetRequest, Ksoup.parseSubmitRequest, and Ksoup.parsePostRequest\n   implementation(\"com.fleeksoft.ksoup:ksoup-network:\u003cversion\u003e\")\n   ```\n\n3. **This variant uses [kotlinx-io](https://github.com/Kotlin/kotlinx-io) for I/O and [Ktor 2](https://github.com/ktorio/ktor) for networking.**\n   ```kotlin\n   // Ksoup.parseFile, Ksoup.parseSource\n   implementation(\"com.fleeksoft.ksoup:ksoup-kotlinx:\u003cversion\u003e\")\n\n   // Optional: Include only if you need to use network request functions such as\n   // Ksoup.parseGetRequest, Ksoup.parseSubmitRequest, and Ksoup.parsePostRequest\n   implementation(\"com.fleeksoft.ksoup:ksoup-network-ktor2:\u003cversion\u003e\")\n   ```\n4. 
**This variant uses [okio](https://github.com/square/okio) for I/O and [Ktor 2](https://github.com/ktorio/ktor) for networking.**\n   ```kotlin\n   implementation(\"com.fleeksoft.ksoup:ksoup-okio:\u003cversion\u003e\")\n\n   // Optional: Include only if you need to use network request functions such as\n   // Ksoup.parseGetRequest, Ksoup.parseSubmitRequest, and Ksoup.parsePostRequest\n   implementation(\"com.fleeksoft.ksoup:ksoup-network-ktor2:\u003cversion\u003e\")\n   ```\n\n5. ~~**This variant uses [korlibs-io](https://github.com/korlibs/korlibs-io) for I/O and networking.**~~\n   ```kotlin\n   // Ksoup.parseFile, Ksoup.parseStream\n   implementation(\"com.fleeksoft.ksoup:ksoup-korlibs:\u003cversion\u003e\")\n\n   // Optional: Include only if you need to use network request functions such as\n   // Ksoup.parseGetRequest, Ksoup.parseSubmitRequest, and Ksoup.parsePostRequest\n   implementation(\"com.fleeksoft.ksoup:ksoup-network-korlibs:\u003cversion\u003e\")\n   ```\n\n#### Ksoup supports [Charsets](https://github.com/fleeksoft/fleeksoft-io/blob/main/CharsetsReadme.md)\n- Standard charsets are already supported by **Ksoup IO**. For extended charsets, please add `com.fleeksoft.charset:charset-ext`. For more details, see the [Charsets Documentation](https://github.com/fleeksoft/fleeksoft-io/blob/main/CharsetsReadme.md).\n\n### Parsing HTML from a String with Ksoup\nFor API documentation, see [Jsoup](https://jsoup.org/). 
Most of the APIs work without any changes.\n\n```kotlin\nval html = \"\u003chtml\u003e\u003chead\u003e\u003ctitle\u003eOne\u003c/title\u003e\u003c/head\u003e\u003cbody\u003eTwo\u003c/body\u003e\u003c/html\u003e\"\nval doc: Document = Ksoup.parse(html = html)\n\nprintln(\"title =\u003e ${doc.title()}\") // One\nprintln(\"bodyText =\u003e ${doc.body().text()}\") // Two\n```\nThis snippet demonstrates how to use `Ksoup.parse` to parse an HTML string and extract the title and body text.\n\n### Fetching and Parsing HTML from a URL using Ksoup\n```kotlin\n// Note: Ksoup.parseGetRequest requires the com.fleeksoft.ksoup:ksoup-network library.\nval doc: Document = Ksoup.parseGetRequest(url = \"https://en.wikipedia.org/\") // suspend function\n// or\nval doc: Document = Ksoup.parseGetRequestBlocking(url = \"https://en.wikipedia.org/\")\n\nprintln(\"title: ${doc.title()}\")\nval headlines: Elements = doc.select(\"#mp-itn b a\")\n\nheadlines.forEach { headline: Element -\u003e\n    val headlineTitle = headline.attr(\"title\")\n    val headlineLink = headline.absUrl(\"href\")\n\n    println(\"$headlineTitle =\u003e $headlineLink\")\n}\n```\n\n### Parsing XML\n```kotlin\n// xml is a String holding the XML document to parse\nval doc: Document = Ksoup.parse(xml, parser = Parser.xmlParser())\n```\n\n### Parsing Metadata from Website\n```kotlin\n// Note: Ksoup.parseGetRequest requires the com.fleeksoft.ksoup:ksoup-network library.\nval doc: Document = Ksoup.parseGetRequest(url = \"https://en.wikipedia.org/\") // suspend function\nval metadata: MetaData = Ksoup.parseMetaData(element = doc) // suspend function\n// or\nval metadata: MetaData = Ksoup.parseMetaData(html = HTML)\n\nprintln(\"title: ${metadata.title}\")\nprintln(\"description: ${metadata.description}\")\nprintln(\"ogTitle: ${metadata.ogTitle}\")\nprintln(\"ogDescription: ${metadata.ogDescription}\")\nprintln(\"twitterTitle: ${metadata.twitterTitle}\")\nprintln(\"twitterDescription: ${metadata.twitterDescription}\")\n// Check 
com.fleeksoft.ksoup.model.MetaData for more fields\n```\n\nIn the URL-fetching example above, `Ksoup.parseGetRequest` fetches and parses HTML content from Wikipedia, then extracts and prints news headlines and their corresponding links.\n\n### Ksoup Public functions\n  - **Ksoup.parse(html: String, baseUri: String = \"\"): Document**\n  - **Ksoup.parse(html: String, parser: Parser, baseUri: String = \"\"): Document**\n  - **Ksoup.parse(reader: Reader, parser: Parser, baseUri: String = \"\"): Document**\n  - **Ksoup.clean(bodyHtml: String, safelist: Safelist = Safelist.relaxed(), baseUri: String = \"\", outputSettings: Document.OutputSettings? = null): String**\n  - **Ksoup.isValid(bodyHtml: String, safelist: Safelist = Safelist.relaxed()): Boolean**\n\n### Ksoup I/O Public functions\n  - **Ksoup.parseInput(input: InputStream, baseUri: String, charsetName: String? = null, parser: Parser = Parser.htmlParser())** from (ksoup-io, ksoup-okio, ksoup-kotlinx, ksoup-korlibs)\n  - **Ksoup.parseFile** from (ksoup-okio, ksoup-kotlinx, ksoup-korlibs)\n  - **Ksoup.parseSource** from (ksoup-okio, ksoup-kotlinx)\n  - **Ksoup.parseStream** from (ksoup-korlibs)\n\n### Ksoup Network Public functions\n- Suspend functions\n  - **Ksoup.parseGetRequest**\n  - **Ksoup.parseSubmitRequest**\n  - **Ksoup.parsePostRequest**\n- Blocking functions\n  - **Ksoup.parseGetRequestBlocking**\n  - **Ksoup.parseSubmitRequestBlocking**\n  - **Ksoup.parsePostRequestBlocking**\n\n#### For further documentation, see [Jsoup](https://jsoup.org/)\n\n### Ksoup vs. Jsoup Benchmarks: Parsing \u0026 Selecting 448KB HTML File [test.txt](https://github.com/fleeksoft/ksoup/blob/develop/ksoup-test/testResources/test.txt)\n![Ksoup vs Jsoup](benchmark1.png)\n\n## Open source\nKsoup is an open source project, a Kotlin Multiplatform port of jsoup, distributed under the MIT License. 
The source code of Ksoup is available on [GitHub](https://github.com/fleeksoft/ksoup).\n\n\n## Development and Support\nFor questions about usage and general inquiries, please refer to [GitHub Discussions](https://github.com/fleeksoft/ksoup/discussions).\n\nIf you wish to contribute, please read the [Contributing Guidelines](CONTRIBUTING.md).\n\nTo report an issue, visit our [GitHub issues](https://github.com/fleeksoft/ksoup/issues). Please check for duplicates before submitting a new issue.\n\n\n## License\n\nKsoup is open source software licensed under the [MIT License](LICENSE.md).\n\nThis project is a Kotlin Multiplatform port of [Jsoup](https://jsoup.org), created by Jonathan Hedley.  \nPortions of this library are derived from jsoup and retain their original [MIT License](https://jsoup.org/license),  \n© 2009–2025 Jonathan Hedley.  \n","funding_links":[],"categories":["Libraries","Kotlin"],"sub_categories":["🗃 Serializer"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffleeksoft%2Fksoup","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffleeksoft%2Fksoup","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffleeksoft%2Fksoup/lists"}