{"id":13606513,"url":"https://github.com/MohamedRejeb/Ksoup","last_synced_at":"2025-04-12T08:31:24.194Z","repository":{"id":163855507,"uuid":"638220512","full_name":"MohamedRejeb/Ksoup","owner":"MohamedRejeb","description":"Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML, extracting HTML tags, attributes, and text, and encoding and decoding HTML entities.","archived":false,"fork":false,"pushed_at":"2024-01-05T21:06:52.000Z","size":453,"stargazers_count":305,"open_issues_count":6,"forks_count":7,"subscribers_count":8,"default_branch":"main","last_synced_at":"2024-03-19T12:34:52.717Z","etag":null,"topics":["android","html-parser","kotlin","kotlin-android","kotlin-js","kotlin-jvm","kotlin-library","kotlin-multiplatform","kotlin-native","parser","parser-library","parsing"],"latest_commit_sha":null,"homepage":"","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MohamedRejeb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null},"funding":{"custom":["https://www.buymeacoffee.com/mohamedrejeb"]}},"created_at":"2023-05-09T10:32:55.000Z","updated_at":"2024-03-18T07:58:15.000Z","dependencies_parsed_at":null,"dependency_job_id":"cb391f36-331a-4078-8a84-24b2b65d0831","html_url":"https://github.com/MohamedRejeb/Ksoup","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MohamedRejeb%2FKsoup","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MohamedRejeb%2FKsoup/tags","releases_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MohamedRejeb%2FKsoup/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MohamedRejeb%2FKsoup/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MohamedRejeb","download_url":"https://codeload.github.com/MohamedRejeb/Ksoup/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248539827,"owners_count":21121239,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["android","html-parser","kotlin","kotlin-android","kotlin-js","kotlin-jvm","kotlin-library","kotlin-multiplatform","kotlin-native","parser","parser-library","parsing"],"created_at":"2024-08-01T19:01:09.818Z","updated_at":"2025-04-12T08:31:19.187Z","avatar_url":"https://github.com/MohamedRejeb.png","language":"Kotlin","funding_links":["https://www.buymeacoffee.com/mohamedrejeb","https://www.buymeacoffee.com/MohamedRejeb","https://img.buymeacoffee.com/button-api/?text=Buy"],"categories":["Kotlin"],"sub_categories":[],"readme":"# Ksoup - Kotlin Multiplatform HTML Parser\n\nKsoup is a lightweight Kotlin Multiplatform library for parsing HTML, extracting HTML tags, attributes, and text, and encoding and decoding HTML 
entities.\n\n[![Kotlin](https://img.shields.io/badge/kotlin-1.9.22-blue.svg?logo=kotlin)](http://kotlinlang.org)\n[![MohamedRejeb](https://raw.githubusercontent.com/MohamedRejeb/MohamedRejeb/main/badges/mohamedrejeb.svg)](https://github.com/MohamedRejeb)\n[![Apache-2.0](https://img.shields.io/badge/License-Apache%202.0-green.svg)](https://opensource.org/licenses/Apache-2.0)\n[![BuildPassing](https://shields.io/badge/build-passing-brightgreen)](https://github.com/MohamedRejeb/ksoup/actions)\n[![Maven Central](https://img.shields.io/maven-central/v/com.mohamedrejeb.ksoup/ksoup-html)](https://search.maven.org/search?q=g:%22com.mohamedrejeb.ksoup%22%20AND%20a:%22ksoup-html%22)\n\n![Slide 16_9 - 1 (1)](https://github.com/MohamedRejeb/ksoup/assets/41842296/fc352215-c8fd-4274-8fc0-ee1c587bb930)\n\n## Features\n\n- Parse HTML from String\n- Extract HTML tags, attributes, and text\n- Encode and decode HTML entities\n- Lightweight and does not depend on any other library\n- Kotlin Multiplatform support\n- Fast and efficient\n- Unit tested\n\n## Installation\n\n[![Maven Central](https://img.shields.io/maven-central/v/com.mohamedrejeb.ksoup/ksoup-html)](https://search.maven.org/search?q=g:%22com.mohamedrejeb.ksoup%22%20AND%20a:%22ksoup-html%22)\n\nAdd the dependency below to your **module**'s `build.gradle.kts` or `build.gradle` file:\n\n\n| Kotlin version | Ksoup version |\n|----------------|---------------|\n| 2.0.x          | 0.4.x         |\n| 1.9.2x         | 0.3.x         |\n| 1.9.x          | 0.2.1         |\n| 1.8.x          | 0.1.4         |\n\n\n```kotlin\nval version = \"0.4.0\"\n\n// For parsing HTML\nimplementation(\"com.mohamedrejeb.ksoup:ksoup-html:$version\")\n\n// Only for encoding and decoding HTML entities \nimplementation(\"com.mohamedrejeb.ksoup:ksoup-entities:$version\")\n```\n\n## Usage\n\n### Parsing HTML\n\nTo parse HTML from a String, use the `KsoupHtmlParser` class, and provide an implementation of the `KsoupHtmlHandler` interface, and a 
`KsoupHtmlOptions` object.\nBoth are optional; the defaults are used if you omit them.\n\n\n#### KsoupHtmlParser\n\nCreate a parser with `KsoupHtmlParser()`. It exposes several methods, for example `write` to parse a String and `end` to close the parser when you are done:\n\n```kotlin\nval ksoupHtmlParser = KsoupHtmlParser()\n\n// String to parse\nval html = \"\u003ch1\u003eMy Heading\u003c/h1\u003e\"\n\n// Pass the HTML to the parser (It is going to parse the HTML and call the callbacks)\nksoupHtmlParser.write(html)\n\n// Close the parser when you are done\nksoupHtmlParser.end()\n```\n\n\n#### KsoupHtmlHandler\n\nYou can implement the `KsoupHtmlHandler` interface directly or use `KsoupHtmlHandler.Builder()`:\n\n```kotlin\n// Implement `KsoupHtmlHandler` interface\nval firstHandler = object : KsoupHtmlHandler {\n    override fun onOpenTag(name: String, attributes: Map\u003cString, String\u003e, isImplied: Boolean) {\n        println(\"Open tag: $name\")\n    }\n}\n\n// Use `KsoupHtmlHandler.Builder()`\nval secondHandler = KsoupHtmlHandler\n    .Builder()\n    .onOpenTag { name, attributes, isImplied -\u003e\n        println(\"Open tag: $name\")\n    }\n    .build()\n```\n\nThere are several methods that you can override. For example, if you only want to extract the text from the HTML, you can override the `onText` method:\n\n```kotlin\n// String to parse\nval html = \"\"\"\n    \u003chtml\u003e\n        \u003chead\u003e\n            \u003ctitle\u003eMy Title\u003c/title\u003e\n        \u003c/head\u003e\n        \u003cbody\u003e\n            \u003ch1\u003eMy Heading\u003c/h1\u003e\n            \u003cp\u003eMy paragraph.\u003c/p\u003e\n        \u003c/body\u003e\n    \u003c/html\u003e\n\"\"\".trimIndent()\n\n// String to store the extracted text\nvar string = \"\"\n\n// Create a handler\nval handler = KsoupHtmlHandler\n    .Builder()\n    .onText { text -\u003e\n        string += text\n    }\n    .build()\n\n// Create a parser\nval 
ksoupHtmlParser = KsoupHtmlParser(\n    handler = handler,\n)\n\n// Pass the HTML to the parser (It is going to parse the HTML and call the callbacks)\nksoupHtmlParser.write(html)\n\n// Close the parser when you are done\nksoupHtmlParser.end()\n```\n\nYou can also use `onOpenTag` and `onCloseTag` to know when a tag is opened or closed, which is useful for scraping data from a website or powering a rich text editor.\nYou can also use `onComment` to know when a comment is found in the HTML and `onAttribute` to know when attributes are found in a tag.\n\n\n#### KsoupHtmlOptions\n\nYou can also pass `KsoupHtmlOptions` to the parser to change its behavior. For example, you can disable the decoding of HTML entities, which is enabled by default:\n\n```kotlin\nval options = KsoupHtmlOptions(\n    decodeEntities = false,\n)\n```\n\n### Encoding and Decoding HTML Entities\n\nYou can use the `KsoupEntities` class to encode and decode HTML entities:\n\n```kotlin\n// Encode HTML entities\nval encoded = KsoupEntities.encodeHtml(\"Hello \u0026 World\") // return: Hello \u0026amp; World\n\n// Decode HTML entities\nval decoded = KsoupEntities.decodeHtml(\"Hello \u0026amp; World\") // return: Hello \u0026 World\n```\n\n`KsoupEntities` also provides methods to encode and decode only XML entities or only HTML4 entities.\nThe `KsoupEntities` class is available in the `ksoup-entities` module.\n\nBoth `encodeHtml` and `decodeHtml` methods support all HTML5 entities, XML entities, and HTML4 entities.\n\n## Coming Features\n\n- [ ] Add clear documentation\n- [ ] Add Markdown parser\n\n## Contribution\nIf you've found an error in this sample, please file an issue. \u003cbr\u003e\nFeel free to help out by sending a pull request :heart:.\n\n[Code of Conduct](https://github.com/MohamedRejeb/ksoup/blob/main/CODE_OF_CONDUCT.md)\n\n## Find this library useful? :heart:\nSupport it by joining __[stargazers](https://github.com/MohamedRejeb/Ksoup/stargazers)__ for this repository. 
:star: \u003cbr\u003e\nAlso, __[follow me](https://github.com/MohamedRejeb)__ on GitHub for more libraries! 🤩\n\nYou can always \u003ca href=\"https://www.buymeacoffee.com/MohamedRejeb\"\u003e\u003cimg src=\"https://img.buymeacoffee.com/button-api/?text=Buy me a coffee\u0026emoji=\u0026slug=MohamedRejeb\u0026button_colour=FFDD00\u0026font_colour=000000\u0026font_family=Cookie\u0026outline_colour=000000\u0026coffee_colour=ffffff\"\u003e\u003c/a\u003e\n\n## License\n```markdown\nCopyright 2023 Mohamed Rejeb\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n   http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMohamedRejeb%2FKsoup","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMohamedRejeb%2FKsoup","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMohamedRejeb%2FKsoup/lists"}