https://github.com/danny1113/html-parser-builder
A result builder that build HTML parser and transform HTML elements to strongly-typed result, inspired by RegexBuilder.
dsl html-parser swift
- Host: GitHub
- URL: https://github.com/danny1113/html-parser-builder
- Owner: danny1113
- License: mit
- Created: 2022-07-14T09:14:12.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2025-03-05T09:39:46.000Z (about 2 months ago)
- Last Synced: 2025-04-01T10:02:58.004Z (20 days ago)
- Topics: dsl, html-parser, swift
- Language: Swift
- Homepage:
- Size: 77.1 KB
- Stars: 9
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-result-builders - HTMLParserBuilder - Build your HTML parser with declarative syntax and strongly-typed result. (Parsing)
README
# HTMLParserBuilder
A result builder that builds an HTML parser and transforms HTML elements into a strongly-typed result, inspired by RegexBuilder.
- [HTMLParserBuilder](#htmlparserbuilder)
- [Installation](#installation)
- [Requirement](#requirement)
- [Introduction](#introduction)
- [Usage](#usage)
- [Bring your own parser](#bring-your-own-parser)
- [Parsing](#parsing)
- [HTML](#html)
- [One](#one)
- [ZeroOrOne](#zeroorone)
- [Many](#many)
- [Group](#group)
- [LateInit](#lateinit)
- [Wrap Up](#wrap-up)
- [Advanced use case](#advanced-use-case)

## Installation
### Requirement
- Swift 5.9
- macOS 10.15
- iOS 13.0
- tvOS 13.0
- watchOS 6.0
- visionOS 1.0

## Introduction

Parsing HTML can be complicated. For example, suppose you want to parse the simple HTML below:
```html
<h1 id="hello">hello, world</h1>

<div id="group">
    <h1>INSIDE GROUP h1</h1>
    <h2>INSIDE GROUP h2</h2>
</div>
```

Existing HTML parsing libraries have these downsides:

- You have to name every captured element
- The code gets more complex as the number of elements you want to capture grows
- Error handling can be hard

```swift
let htmlString = "..."
let doc = HTMLDocument(string: htmlString)
let first = doc.querySelector("#hello")?.textContent

let group = doc.querySelector("#group")
let second = group?.querySelector("h1")?.textContent
let third = group?.querySelector("h2")?.textContent

if let first = first,
   let second = second,
   let third = third {
    // ...
} else {
    // ...
}
```

HTMLParserBuilder comes with these advantages:

- Strongly-typed capture result
- Structured syntax
- Composable API
- Built-in error handling

You can construct a parser that reflects your original HTML structure:
```swift
let parser = HTML {
    ZeroOrOne("#hello") { (element: (any Element)?) -> String? in
        return element?.textContent
    } // => HTML<String?>
    Group("#group") {
        One("h1", transform: \.textContent) // => HTML<String>
        One("h2", transform: \.textContent) // => HTML<String>
    } // => HTML<(String, String)>
} // => HTML<(String?, (String, String))>

let htmlString = "..."
let doc = HTMLDocument(string: htmlString)

let output = try doc.parse(parser)
// => (String?, (String, String))
// output: (Optional("hello, world"), ("INSIDE GROUP h1", "INSIDE GROUP h2"))
```

## Usage
### Bring your own parser
HTMLParserBuilder doesn't rely on any particular HTML parser, so you can choose whichever one you want to use, as long as it conforms to the `Document` and `Element` protocols.
For example, you can use SwiftSoup as the HTML parser. An example of conformance to the `Document` and `Element` protocols is available in `Sources/HTMLParserBuilder/Example`.
```swift
dependencies: [
    // ...
    .package(url: "https://github.com/scinfu/SwiftSoup.git", .upToNextMajor(from: "2.8.0")),
    .package(url: "https://github.com/danny1113/html-parser-builder.git", .upToNextMajor(from: "4.0.0")),
],
targets: [
    .target(name: "YourTarget", dependencies: [
        "SwiftSoup",
        .product(name: "HTMLParserBuilder", package: "html-parser-builder"),
    ]),
]
```

### Parsing
HTMLParserBuilder provides a function for parsing:
```swift
public func parse<Output>(_ html: HTML<Output>) throws -> Output
```

### HTML

You can construct your parser inside `HTML`, and it can also transform its result into another data type.
```swift
struct Pair {
    let h1: String
    let h2: String
}

let parser = HTML {
    One("#group h1", transform: \.textContent) // => HTML<String>
    One("#group h2", transform: \.textContent) // => HTML<String>
}
.map { (output: (String, String)) -> Pair in
    return Pair(
        h1: output.0,
        h2: output.1
    )
} // => HTML<Pair>
```

---
### One
Using `One` is the same as calling `querySelector`: you pass in a CSS selector to find the HTML element, and you can transform it into any other type you want, for example:

- innerHTML
- textContent
- attributes
- ...

> **Note**: If `One` can't find an HTML element that matches the selector, it throws an error and the whole parse fails. For a failable capture, see [`ZeroOrOne`](#zeroorone).

You can use this API with whichever declaration style suits you best:
```swift
One("#hello", transform: \.textContent)
One("#hello") { $0.textContent }
One("#hello") { (e: any Element) -> String in
    return e.textContent
}
```

### ZeroOrOne

`ZeroOrOne` is a little different from `One`: it also calls `querySelector` to find the HTML element, but it returns an **optional** HTML element.

In the example below, the result type is `String?`, and the result is `nil` when the HTML element can't be found.
```swift
ZeroOrOne("#hello") { (e: (any Element)?) -> String? in
    return e?.innerHTML
}
```

### Many

Using `Many` is the same as calling `querySelectorAll`: you pass in a CSS selector to find all HTML elements that match the selector, and you can transform them into any other type you want.

You can use this API with whichever declaration style suits you best:
```swift
Many("h1") { $0.map(\.textContent) }
Many("h1") { (e: [any Element]) -> [String] in
    return e.map(\.textContent)
}
```

You can also capture other elements inside each match and transform them into another type:
```html
<div class="group">
    <h1>Group 1</h1>
</div>
<div class="group">
    <h1>Group 2</h1>
</div>
```

```swift
Many("div.group") { (elements: [any Element]) -> [String] in
    return elements.compactMap { e in
        return e.query(selector: "h1")?.textContent
    }
}
// => [String]
// output: ["Group 1", "Group 2"]
```

---
### Group
`Group` finds an HTML element that matches the selector, and all captures inside it search relative to that element. This is useful when you only want to capture elements inside that local group.

Just like `HTML`, `Group` can also transform the captured result into another data type by adding a transform with `map`:
```swift
struct Pair {
    let h1: String
    let h2: String
}

Group("#group") {
    One("h1", transform: \.textContent) // => HTML<String>
    One("h2", transform: \.textContent) // => HTML<String>
}
.map { (output: (String, String)) -> Pair in
    return Pair(
        h1: output.0,
        h2: output.1
    )
} // => Pair
```

> **Note**: If `Group` can't find an HTML element that matches the selector, it throws an error and the whole parse fails. You can use [`ZeroOrOne`](#zeroorone) as an alternative.
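
The scoping and failure behavior described above can be illustrated without the library. The following is a minimal sketch on a toy in-memory tree; `ToyNode`, `ElementNotFound`, `one(tag:in:)` and `zeroOrOne(tag:in:)` are hypothetical names invented for this illustration and are not part of HTMLParserBuilder. It only shows why a lookup scoped to `#group` finds a different `h1` than a lookup from the root, and how a required lookup differs from an optional one.

```swift
// Toy stand-in for an HTML element tree (hypothetical; not the library's `Element`).
final class ToyNode {
    let tag: String
    let text: String
    let children: [ToyNode]

    init(_ tag: String, text: String = "", children: [ToyNode] = []) {
        self.tag = tag
        self.text = text
        self.children = children
    }

    // Depth-first search over descendants, excluding the node itself.
    func firstDescendant(where matches: (ToyNode) -> Bool) -> ToyNode? {
        for child in children {
            if matches(child) { return child }
            if let hit = child.firstDescendant(where: matches) { return hit }
        }
        return nil
    }
}

struct ElementNotFound: Error {
    let selector: String
}

// Optional lookup: returns nil when nothing matches (the `ZeroOrOne` behavior).
func zeroOrOne(tag: String, in scope: ToyNode) -> ToyNode? {
    scope.firstDescendant { $0.tag == tag }
}

// Required lookup: throws when nothing matches (the `One` / `Group` behavior).
func one(tag: String, in scope: ToyNode) throws -> ToyNode {
    guard let node = zeroOrOne(tag: tag, in: scope) else {
        throw ElementNotFound(selector: tag)
    }
    return node
}

let group = ToyNode("div", children: [
    ToyNode("h1", text: "INSIDE GROUP h1"),
    ToyNode("h2", text: "INSIDE GROUP h2"),
])
let root = ToyNode("body", children: [
    ToyNode("h1", text: "hello, world"),
    group,
])

let fromRoot = try one(tag: "h1", in: root).text    // searched from the document root
let fromGroup = try one(tag: "h1", in: group).text  // searched inside the group only
let missing = zeroOrOne(tag: "h3", in: group)       // no match, so no error is thrown
```

Here the same tag resolves to different elements depending on the scope it is searched from, which is the effect `Group` has on the captures nested inside it.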
### LateInit
This library also comes with a handy property wrapper: `LateInit`, which can delay the initialization until the first time you access it.
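
To make "delay the initialization until first access" concrete, here is a minimal sketch of how a lazy property wrapper can be written in plain Swift. `LazyBox`, `Holder`, `makeGreeting` and `buildCount` are hypothetical names for this illustration; this is not the source of the library's `LateInit`, whose implementation may differ. The sketch assumes the wrapper stores a closure and evaluates it in a mutating getter, which would also explain why the owning value has to be declared with `var`.

```swift
// Hypothetical lazy wrapper for illustration; not the library's `LateInit` source.
@propertyWrapper
struct LazyBox<Value> {
    private var storage: Value?
    private let makeValue: () -> Value

    // `@autoclosure` defers evaluation of the default value expression.
    init(wrappedValue: @autoclosure @escaping () -> Value) {
        self.makeValue = wrappedValue
    }

    var wrappedValue: Value {
        // A mutating getter is why the owner must be declared with `var`.
        mutating get {
            if let storage { return storage }
            let value = makeValue()
            storage = value
            return value
        }
        set { storage = newValue }
    }
}

// Counts how many times the value is actually built.
var buildCount = 0

func makeGreeting() -> String {
    buildCount += 1
    return "hello"
}

struct Holder {
    @LazyBox var greeting: String = makeGreeting()
}

var holder = Holder()
let countBeforeAccess = buildCount  // nothing has been built yet
let greeting = holder.greeting      // first access runs makeGreeting()
_ = holder.greeting                 // second access reuses the stored value
```

With the library's own `LateInit`, usage looks like the following:
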
```swift
struct Container {
    @LateInit var parser = HTML {
        One("h1", transform: \.textContent)
    }
}

// it needs to be `var` to perform late initialization
var container = Container()
let output = try doc.parse(container.parser)
// ...
```

### Wrap Up
| API | Use Case |
| ---------- | ---------------------------------------------------- |
| One | Throws an error when the element can't be captured |
| ZeroOrOne | Returns `nil` when the element can't be captured |
| Many | Captures all elements that match the selector |
| Group | Captures elements in the local scope |
| LateInit | Delays initialization until the first time you access it |

## Advanced use case
- Pass one `HTMLComponent` into another
- Transform to a custom data structure before parsing

```swift
struct Pair {
    let h1: String
    let h2: String
}

//  |----------------------------------------------------------------|
let groupCapture = HTML {                                         // |
    Group("#group") {                                             // |
        One("h1", transform: \.textContent) // => HTML<String>    // |
        One("h2", transform: \.textContent) // => HTML<String>    // |
    } // => HTML<(String, String)>                                // |
}                                                                 // |
.map { output -> Pair in                                          // |
    return Pair(                                                  // |
        h1: output.0,                                             // |
        h2: output.1                                              // |
    )                                                             // |
} // => HTML<Pair>                                                // |
                                                                  // |
let parser = HTML {                                               // |
    ZeroOrOne("#hello") { (element: (any Element)?) -> String? in // |
        return element?.textContent                               // |
    } // => HTML<String?>                                         // |
                                                                  // |
    groupCapture // => HTML<Pair> -----------------------------------|
} // => HTML<(String?, Pair)>

let htmlString = "..."
let doc = HTMLDocument(string: htmlString)

let output: (String?, Pair) = try doc.parse(parser)
```
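
For readers curious how nesting components can yield a tuple type such as `(String?, Pair)`, the following is a minimal sketch of the general result-builder technique in plain Swift. Everything in it (`Capture`, `CaptureBuilder`, `build`, `text`, `optionalText`, and the dictionary used as input) is a hypothetical stand-in invented for this illustration. It is not HTMLParserBuilder's implementation, and it supports only one or two components per block.

```swift
// Hypothetical stand-in for a parser component: a function from input to typed output.
struct Capture<Output> {
    let run: ([String: String]) -> Output

    // Transform the output into another type, analogous to calling `.map` on `HTML`.
    func map<NewOutput>(_ transform: @escaping (Output) -> NewOutput) -> Capture<NewOutput> {
        Capture<NewOutput> { input in transform(self.run(input)) }
    }
}

@resultBuilder
enum CaptureBuilder {
    static func buildBlock<A>(_ a: Capture<A>) -> Capture<A> { a }

    // Two components combine into one component whose output is a tuple.
    static func buildBlock<A, B>(_ a: Capture<A>, _ b: Capture<B>) -> Capture<(A, B)> {
        Capture<(A, B)> { input in (a.run(input), b.run(input)) }
    }
}

func build<Output>(@CaptureBuilder _ content: () -> Capture<Output>) -> Capture<Output> {
    content()
}

// Required text: falls back to an empty string in this toy model.
func text(_ key: String) -> Capture<String> {
    Capture<String> { input in input[key, default: ""] }
}

// Optional text: nil when the key is absent.
func optionalText(_ key: String) -> Capture<String?> {
    Capture<String?> { input in input[key] }
}

struct Pair: Equatable {
    let h1: String
    let h2: String
}

let groupCapture = build {
    text("h1")
    text("h2")
}
.map { Pair(h1: $0.0, h2: $0.1) } // Capture<Pair>

let parser = build {
    optionalText("hello")
    groupCapture // a previously built component reused inside another
} // Capture<(String?, Pair)>

let output = parser.run(["h1": "INSIDE GROUP h1", "h2": "INSIDE GROUP h2"])
```

Because each `buildBlock` overload encodes the output types of its arguments in its return type, the compiler can infer the type of `output` without any annotation, which is the property the strongly-typed captures above rely on.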