{"id":15421655,"url":"https://github.com/jstedfast/htmlkit","last_synced_at":"2025-05-16T10:06:31.327Z","repository":{"id":33638545,"uuid":"37290911","full_name":"jstedfast/HtmlKit","owner":"jstedfast","description":"A cross-platform .NET framework for parsing HTML","archived":false,"fork":false,"pushed_at":"2025-04-27T19:34:04.000Z","size":1192,"stargazers_count":84,"open_issues_count":0,"forks_count":55,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-05-16T10:06:30.915Z","etag":null,"topics":["c-sharp","html","html-parser","html5","parser"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jstedfast.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"License.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"jstedfast"}},"created_at":"2015-06-11T23:09:21.000Z","updated_at":"2025-04-27T19:34:07.000Z","dependencies_parsed_at":"2024-06-05T17:11:19.832Z","dependency_job_id":"829769ac-ad30-4439-aa0a-6830d3211c48","html_url":"https://github.com/jstedfast/HtmlKit","commit_stats":{"total_commits":269,"total_committers":4,"mean_commits":67.25,"dds":"0.42750929368029744","last_synced_commit":"50b6f6167e66a8bb598d275e6ea68e94a654f8d4"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jstedfast%2FHtmlKit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jstedfast%2FHtmlKit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jstedf
ast%2FHtmlKit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jstedfast%2FHtmlKit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jstedfast","download_url":"https://codeload.github.com/jstedfast/HtmlKit/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254509475,"owners_count":22082891,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c-sharp","html","html-parser","html5","parser"],"created_at":"2024-10-01T17:35:28.889Z","updated_at":"2025-05-16T10:06:31.307Z","avatar_url":"https://github.com/jstedfast.png","language":"HTML","funding_links":["https://github.com/sponsors/jstedfast"],"categories":[],"sub_categories":[],"readme":"# HtmlKit\n\n[![Build Status](https://github.com/jstedfast/HtmlKit/actions/workflows/main.yml/badge.svg?event=push)](https://github.com/jstedfast/HtmlKit/actions/workflows/main.yml)[![Coverity Scan Build Status](https://scan.coverity.com/projects/5621/badge.svg)](https://scan.coverity.com/projects/5621)[![Coverage Status](https://coveralls.io/repos/github/jstedfast/HtmlKit/badge.svg?branch=master)](https://coveralls.io/github/jstedfast/HtmlKit?branch=master)\n\n## What is HtmlKit?\n\nHtmlKit is a cross-platform .NET framework for parsing HTML.\n\nHtmlKit implements the HTML5 tokenizing state machine described in\n[W3C's HTML5 Tokenization Specification](https://dev.w3.org/html5/spec-LC/tokenization.html).\n\n## Goals\n\nI haven't fully figured that out yet.\n\nSo far the goal is tokenizing HTML with the intention of using 
it for\n[MimeKit](https://github.com/jstedfast/MimeKit)'s\n[HtmlToHtml](http://www.mimekit.net/docs/html/T_MimeKit_Text_HtmlToHtml.htm)\ntext converter, replacing the quick \u0026 dirty HTML tokenizer I originally wrote.\n\nMaybe someday I'll implement a DOM. Who knows.\n\n## License Information\n\nHtmlKit is Copyright (C) 2015-2024 Jeffrey Stedfast and is licensed under the MIT license:\n\n    Permission is hereby granted, free of charge, to any person obtaining a copy\n    of this software and associated documentation files (the \"Software\"), to deal\n    in the Software without restriction, including without limitation the rights\n    to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n    copies of the Software, and to permit persons to whom the Software is\n    furnished to do so, subject to the following conditions:\n\n    The above copyright notice and this permission notice shall be included in\n    all copies or substantial portions of the Software.\n\n    THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN\n    THE SOFTWARE.\n\n## Installing via NuGet\n\nThe easiest way to install HtmlKit is via [NuGet](https://www.nuget.org/packages/HtmlKit/).\n\nIn Visual Studio's [Package Manager Console](http://docs.nuget.org/docs/start-here/using-the-package-manager-console),\nsimply enter the following command:\n\n    Install-Package HtmlKit\n\n## Getting the Source Code\n\nFirst, you'll need to clone HtmlKit from my GitHub repository. 
To do this using the command-line version of Git,\nyou'll need to issue the following command in your terminal:\n\n    git clone https://github.com/jstedfast/HtmlKit.git\n\nIf you are using [TortoiseGit](https://tortoisegit.org) on Windows, you'll need to right-click in the directory\nwhere you'd like to clone HtmlKit and select **Git Clone...** in the menu. Once you do that, you'll get a dialog\nasking you to specify the repository you'd like to clone. In the textbox labeled **URL:**, enter\n`https://github.com/jstedfast/HtmlKit.git` and then click **OK**. This will clone HtmlKit onto your local machine.\n\n## Updating the Source Code\n\nOccasionally you might want to update your local copy of the source code if I have made changes to HtmlKit since you\ndownloaded the source code in the step above. To do this using the command-line version of Git, you'll need to issue\nthe following command in your terminal within the HtmlKit directory:\n\n    git pull\n\nIf you are using [TortoiseGit](https://tortoisegit.org) on Windows, you'll need to right-click on the HtmlKit\ndirectory and select **Git Sync...** in the menu. 
Once you do that, you'll need to click the **Pull** button.\n\n## Building\n\nOnce you've opened the **HtmlKit.sln** solution file in [Visual Studio](https://www.visualstudio.com/downloads/),\nyou can choose the **Debug** or **Release** build configuration and then build.\n\nBoth Visual Studio 2022 and Visual Studio 2019 should be able to build HtmlKit without any issues, but older versions such as\nVisual Studio 2015 and 2017 will likely require modifications to the projects in order to build correctly.\n\nNote: The **Release** build will generate the XML API documentation, but the **Debug** build will not.\n\n## Using HtmlKit\n\n### Parsing HTML\n\nThe primary purpose of HtmlKit is parsing HTML.\n\n```csharp\nusing (var reader = new StreamReader (stream)) {\n    var tokenizer = new HtmlTokenizer (reader);\n    HtmlToken token;\n\n    // ReadNextToken() returns `false` when the end of the stream is reached.\n    while (tokenizer.ReadNextToken (out token)) {\n        switch (token.Kind) {\n        case HtmlTokenKind.ScriptData:\n        case HtmlTokenKind.CData:\n        case HtmlTokenKind.Data:\n            // ScriptData, CData, and Data tokens contain text data.\n            var text = (HtmlDataToken) token;\n\n            Console.WriteLine (\"{0}: {1}\", token.Kind, text.Data);\n            break;\n        case HtmlTokenKind.Tag:\n            // Tag tokens represent tags and their attributes.\n            var tag = (HtmlTagToken) token;\n\n            Console.Write (\"\u003c{0}{1}\", tag.IsEndTag ? \"/\" : \"\", tag.Name);\n\n            foreach (var attribute in tag.Attributes) {\n                if (attribute.Value != null)\n                    Console.Write (\" {0}={1}\", attribute.Name, Quote (attribute.Value));\n                else\n                    Console.Write (\" {0}\", attribute.Name);\n            }\n\n            Console.WriteLine (tag.IsEmptyElement ? 
\"/\u003e\" : \"\u003e\");\n            break;\n        case HtmlTokenKind.Comment:\n            var comment = (HtmlCommentToken) token;\n\n            Console.WriteLine (\"Comment: {0}\", comment.Comment);\n            break;\n        case HtmlTokenKind.DocType:\n            var doctype = (HtmlDocTypeToken) token;\n\n            if (doctype.ForceQuirksMode)\n                Console.Write (\"\u003c!-- force quirks mode --\u003e\");\n\n            Console.Write (\"\u003c!DOCTYPE\");\n\n            if (doctype.Name != null)\n                Console.Write (\" {0}\", doctype.Name.ToUpperInvariant ());\n\n            if (doctype.PublicIdentifier != null) {\n                Console.Write (\" PUBLIC \\\"{0}\\\"\", doctype.PublicIdentifier);\n                if (doctype.SystemIdentifier != null)\n                    Console.Write (\" \\\"{0}\\\"\", doctype.SystemIdentifier);\n            } else if (doctype.SystemIdentifier != null) {\n                Console.Write (\" SYSTEM \\\"{0}\\\"\", doctype.SystemIdentifier);\n            }\n\n            Console.WriteLine (\"\u003e\");\n            break;\n        }\n    }\n}\n```\n\n## Contributing\n\nThe first thing you'll need to do is fork HtmlKit to your own GitHub repository. For instructions on how to\ndo that, see the section titled **Getting the Source Code**.\n\nIf you use [Visual Studio for Mac](https://visualstudio.microsoft.com/vs/mac/) or [MonoDevelop](http://monodevelop.com),\nall of the solution files are configured with the coding style used by HtmlKit. 
If you use Visual Studio on Windows or\nsome other editor, please try to maintain the existing coding style as best as you can.\n\nOnce you've got some changes that you'd like to submit upstream to the official HtmlKit repository,\nsend me a **Pull Request** and I will try to review your changes in a timely manner.\n\nIf you'd like to contribute but don't have any particular features in mind to work on, check out the issue\ntracker and look for something that might pique your interest!\n\n## Reporting Bugs\n\nHave a bug or a feature request? Please open a new\n[bug report](https://github.com/jstedfast/HtmlKit/issues/new?template=bug_report.md)\nor\n[feature request](https://github.com/jstedfast/HtmlKit/issues/new?template=feature_request.md).\n\nBefore opening a new issue, please search through any [existing issues](https://github.com/jstedfast/HtmlKit/issues)\nto avoid submitting duplicates.\n\nIf you are getting an exception from somewhere within HtmlKit, don't just provide the `Exception.Message`\nstring. Please include the `Exception.StackTrace` as well. The `Message`, by itself, is often useless.\n\n## Documentation\n\nAPI documentation can be found in the source code in the form of XML doc comments.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjstedfast%2Fhtmlkit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjstedfast%2Fhtmlkit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjstedfast%2Fhtmlkit/lists"}