# Swift-JustHTML

A dependency-free HTML5 parser for Swift, following the WHATWG HTML parsing specification.

Swift port of [justhtml](https://github.com/EmilStenstrom/justhtml) (Python) and [justjshtml](https://github.com/simonw/justjshtml) (JavaScript).

## Features

- **Full HTML5 Compliance** - Passes all 1,831 [html5lib-tests](https://github.com/html5lib/html5lib-tests) tree construction tests
- **Zero Dependencies** - Pure Swift using only the standard library and Foundation
- **Cross-Platform** - macOS, iOS, tvOS, watchOS, visionOS, and Linux
- **CSS Selectors** - Query documents using standard CSS selector syntax
- **Multiple Output Formats** - Serialize to HTML, plain text, or Markdown
- **Streaming API** - Memory-efficient event-based parsing
- **Fragment Parsing** - Parse HTML fragments in specific contexts

## Installation

### Swift Package Manager

Add swift-justhtml to your `Package.swift`:

```swift
dependencies: [
    .package(url: "https://github.com/kylehowells/swift-justhtml.git", from: "0.4.0")
]
```

Then add the `justhtml` product to your target's dependencies:

```swift
targets: [
    .target(
        name: "YourTarget",
        dependencies: [
            // The package identity comes from the URL ("swift-justhtml"),
            // so the product is referenced explicitly.
            .product(name: "justhtml", package: "swift-justhtml")
        ]
    )
]
```

### Xcode

1. File > Add Package Dependencies...
2. Enter: `https://github.com/kylehowells/swift-justhtml.git`
3. Select version: 0.4.0 or later

## Usage

### Basic Parsing

```swift
import justhtml

// Parse an HTML document
let doc = try JustHTML("<html><body><p>Hello, World!</p></body></html>")

// Access the document tree
print(doc.root.children)  // [<html>]
```

### CSS Selectors

```swift
// Query with CSS selectors
let paragraphs = try doc.query("p")
let byClass = try doc.query(".intro")
let byId = try doc.query("#header")
let complex = try doc.query("nav > ul > li > a[href]")

// Check if a node matches a selector (`node` is any element node from the parsed tree)
let matches = try node.matches(".highlight")
```

### Extracting Content

```swift
// Get plain text
let text = doc.toText()

// Serialize to HTML
let html = doc.toHTML()
let prettyHtml = doc.toHTML(pretty: true, indentSize: 4)

// Convert to Markdown
let markdown = doc.toMarkdown()
```

### Fragment Parsing

```swift
// Parse HTML as if inside a specific element
let ctx = FragmentContext("tbody")
let fragment = try JustHTML("<tr><td>Cell</td></tr>", fragmentContext: ctx)
```

### Streaming API

```swift
// Memory-efficient event-based parsing
for event in HTMLStream("<p>Hello</p>") {
    switch event {
    case .start(let tag, let attrs):
        print("Start: \(tag)")
    case .end(let tag):
        print("End: \(tag)")
    case .text(let content):
        print("Text: \(content)")
    case .comment(let text):
        print("Comment: \(text)")
    case .doctype(let name, let publicId, let systemId):
        print("Doctype: \(name)")
    }
}
```

### Error Handling

```swift
// Strict mode - throws on the first parse error
do {
    let doc = try JustHTML("<p>Unclosed", strict: true)
} catch let error as StrictModeError {
    print("Error: \(error.parseError.code)")
}

// Collect errors without throwing
let doc = try JustHTML("<p>Unclosed", collectErrors: true)
for error in doc.errors {
    print("\(error.line):\(error.column): \(error.code)")
}
```

### DoS Protection

swift-justhtml includes configurable limits to protect against denial-of-service attacks from malicious HTML input:

```swift
// `untrustedHTML`, `html`, and `trustedHTML` stand for your input strings.

// Default limits are applied automatically (recommended)
let doc = try JustHTML(untrustedHTML)

// Custom limits for servers with more resources
var limits = ParserLimits()
limits.maxNestingDepth = 2048
let serverDoc = try JustHTML(html, limits: limits)

// Stricter limits for resource-constrained devices
let strictDoc = try JustHTML(html, limits: .strict)

// Disable limits for trusted content only
let trustedDoc = try JustHTML(trustedHTML, limits: .unlimited)
```

Default limits:
- `maxEntityNameLength`: 255 characters (guards against memory exhaustion from overlong entity references such as `&aaaa...`)
- `maxNestingDepth`: 512 levels (prevents stack overflow from deeply nested elements)

See the [DoS Protection Guide](https://kylehowells.github.io/swift-justhtml/documentation/justhtml/dosprotection) for details.

## Spec Compliance

swift-justhtml implements the [WHATWG HTML parsing specification](https://html.spec.whatwg.org/multipage/parsing.html) exactly. Like [justhtml](https://github.com/EmilStenstrom/justhtml), it passes every test in the official [html5lib-tests](https://github.com/html5lib/html5lib-tests) suite, which browser vendors also use.

### Test Results

| Test Suite | Passed | Failed |
|------------|--------|--------|
| Tree Construction | 1,831 | 0 |
| Tokenizer | 6,810 | 0 |
| Serializer | 230 | 0 |
| Encoding | 82 | 0 |
| **Total** | **8,953** | **0** |

### Fuzz Testing

The parser has been fuzz-tested with millions of randomized and malformed HTML documents to ensure it never crashes or hangs on any input:

- Random data fuzzing with varying document sizes
- Fragment context fuzzing
- Deep nesting stress tests
- Malformed tag and entity sequences

Run the fuzzer: `swift test --filter fuzzTest`

## Performance

swift-justhtml is optimized for performance. In the benchmarks below, it parses on par with the JavaScript implementation and about 4x faster than the Python one. It also has the lowest peak memory of the three.

### Parse Time

| Implementation | Parse Time | Comparison |
|----------------|-----------|------------|
| **Swift** | 97ms | - |
| JavaScript | 99ms | 1.02x slower |
| Python | 398ms | 4.1x slower |

*Benchmark: Parsing 2.5 MB of HTML across 5 Wikipedia articles*

See [Benchmarks/BENCHMARK_RESULTS.md](Benchmarks/BENCHMARK_RESULTS.md) for a detailed performance comparison.

### Memory Usage

| Implementation | Peak RSS | Comparison |
|----------------|----------|------------|
| **Swift** | 103 MB | - |
| Python | 106 MB | 1.03x more |
| JavaScript | 226 MB | 2.2x more |

*Benchmark: Average peak memory across 6 test files, including a 20 MB synthetic HTML file*

See [Benchmarks/MEMORY_RESULTS.md](Benchmarks/MEMORY_RESULTS.md) for a detailed memory comparison.

### Swift Library Comparison

| Library | html5lib Pass Rate | Crashes/Hangs | Dependencies |
|---------|-------------------|---------------|--------------|
| **swift-justhtml** | 100% (1831/1831) | None | None |
| Kanna | 94.4% (1542/1633) | None | libxml2 |
| SwiftSoup | 87.9% (1436/1633) | Infinite loop on 197 tests | swift-atomics |
| LilHTML | 47.4% (775/1634) | Crashes on 855 tests | libxml2 |

See [notes/comparison.md](notes/comparison.md) for a detailed library comparison.

## Platform Support

| Platform | Minimum Version |
|----------|-----------------|
| macOS | 13.0+ |
| iOS | 16.0+ |
| tvOS | 16.0+ |
| watchOS | 9.0+ |
| visionOS | 1.0+ |
| Linux | Swift 6.0+ |

## Documentation

- [API Documentation](https://kylehowells.github.io/swift-justhtml/documentation/justhtml/)
- [Getting Started Guide](https://kylehowells.github.io/swift-justhtml/documentation/justhtml/gettingstarted)

## License

MIT License - see [LICENSE](LICENSE) for details.

## Credits

- Original Python implementation: [justhtml](https://github.com/EmilStenstrom/justhtml) by Emil Stenström
- JavaScript port: [justjshtml](https://github.com/simonw/justjshtml) by Simon Willison
- Test suite: [html5lib-tests](https://github.com/html5lib/html5lib-tests)