https://github.com/brandon689/htmlconverter

HtmlToJsonParser: A versatile C# library for converting HTML to JSON with multiple parsing modes and customizable options.

# HtmlToJsonParser 🔄

HtmlToJsonParser is a versatile C# library that converts HTML to JSON using various parsing modes and customizable options. It leverages AngleSharp for HTML parsing and provides flexible output formatting.

## ✨ Features

- Multiple parsing modes:
  - Generic: Converts all HTML nodes to JSON
  - Table: Converts HTML tables to structured JSON
  - JSON-LD: Extracts JSON-LD data from HTML
- Customizable options:
  - New line conversion in values
  - Attribute prefix customization
  - Text property name customization
  - Output indentation control
  - JSON unescaping
  - Inside word trimming
  - Multiple table conversion

## 🚀 Installation

HtmlToJsonParser depends on the following NuGet packages. Install them from the Package Manager Console:
- `Install-Package AngleSharp`
- `Install-Package Newtonsoft.Json`

## 🔍 Parsing Modes

### Generic Mode

Converts all HTML nodes to JSON objects and properties.
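
To make the idea concrete, here is a minimal sketch of a generic DOM-to-JSON walk written directly against AngleSharp and Newtonsoft.Json. It is not HtmlToJsonParser's own code or API: the class name `GenericModeSketch`, the `tag` and `children` property names, and the `@` and `#text` defaults are assumptions chosen for illustration.

```csharp
using System.Linq;
using AngleSharp.Dom;
using AngleSharp.Html.Parser;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Illustrative sketch only; not the library's actual implementation.
public static class GenericModeSketch
{
    // Hypothetical stand-ins for the AttributePrefix and TextPropertyName options.
    private const string AttributePrefix = "@";
    private const string TextPropertyName = "#text";

    public static string ToJsonString(string html, bool indent = true)
    {
        // AngleSharp wraps fragments in <html>, <head> and <body> while parsing.
        var document = new HtmlParser().ParseDocument(html);
        var root = ToJson(document.DocumentElement);
        return root.ToString(indent ? Formatting.Indented : Formatting.None);
    }

    private static JObject ToJson(IElement element)
    {
        var obj = new JObject { ["tag"] = element.LocalName };

        // Each HTML attribute becomes a prefixed JSON property.
        foreach (var attribute in element.Attributes)
        {
            obj[AttributePrefix + attribute.Name] = attribute.Value;
        }

        // Only text nodes that are direct children of this element are collected.
        var text = string.Concat(element.ChildNodes.OfType<IText>().Select(t => t.Data)).Trim();
        if (text.Length > 0)
        {
            obj[TextPropertyName] = text;
        }

        // Child elements are converted recursively.
        var children = new JArray(element.Children.Select(child => ToJson(child)));
        if (children.Count > 0)
        {
            obj["children"] = children;
        }

        return obj;
    }
}
```

Calling `GenericModeSketch.ToJsonString("<p class=\"x\">Hi</p>")` returns a JSON tree rooted at the `html` element, with the paragraph nested under `body`. The JSON shape the library itself emits may differ.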

### Table Mode

Converts HTML tables into a structured JSON format. Each row becomes a JSON object with column headers as keys.
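
The row-to-object mapping described above can be sketched as follows, again using AngleSharp and Newtonsoft.Json directly rather than the library's own API. The class `TableModeSketch` and its `convertAllTables` parameter are hypothetical. The sketch assumes the first row of each table holds the column headers and does not handle nested tables or `colspan`.

```csharp
using System.Linq;
using AngleSharp.Dom;
using AngleSharp.Html.Parser;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Illustrative sketch only; not the library's actual implementation.
public static class TableModeSketch
{
    // convertAllTables mimics the idea behind the ConvertAllTables option.
    public static string ToJsonString(string html, bool convertAllTables = false)
    {
        var document = new HtmlParser().ParseDocument(html);
        var tables = document.QuerySelectorAll("table");
        if (tables.Length == 0)
        {
            return "[]";
        }

        JToken result = convertAllTables
            ? new JArray(tables.Select(table => ConvertTable(table)))
            : ConvertTable(tables[0]);
        return result.ToString(Formatting.Indented);
    }

    private static JArray ConvertTable(IElement table)
    {
        var result = new JArray();
        var rows = table.QuerySelectorAll("tr").ToList();
        if (rows.Count == 0)
        {
            return result;
        }

        // Assumption: the first row contains the column headers.
        var headers = rows[0].QuerySelectorAll("th, td")
            .Select(cell => cell.TextContent.Trim())
            .ToList();

        foreach (var row in rows.Skip(1))
        {
            var cells = row.QuerySelectorAll("td, th").ToList();
            var obj = new JObject();

            // Cells beyond the header count are ignored in this sketch.
            for (var i = 0; i < cells.Count && i < headers.Count; i++)
            {
                obj[headers[i]] = cells[i].TextContent.Trim();
            }

            result.Add(obj);
        }

        return result;
    }
}
```

With `convertAllTables` set to `false`, the result is a single array of row objects. With `true`, it is an array containing one such array per table.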

### JSON-LD Mode

Extracts and parses JSON-LD data from HTML documents.
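
JSON-LD is normally embedded in `<script type="application/ld+json">` elements, so extraction amounts to selecting those elements and parsing their contents. The sketch below shows one way to do this with AngleSharp and Newtonsoft.Json. `JsonLdSketch` is a hypothetical name, and silently skipping malformed blocks is an assumption, not documented library behavior.

```csharp
using AngleSharp.Html.Parser;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Illustrative sketch only; not the library's actual implementation.
public static class JsonLdSketch
{
    public static JArray Extract(string html)
    {
        var document = new HtmlParser().ParseDocument(html);
        var result = new JArray();

        foreach (var script in document.QuerySelectorAll("script[type='application/ld+json']"))
        {
            try
            {
                // Each script body is expected to be a JSON object or array.
                result.Add(JToken.Parse(script.TextContent));
            }
            catch (JsonReaderException)
            {
                // Assumption: malformed JSON-LD blocks are skipped rather than reported.
            }
        }

        return result;
    }
}
```

The returned array holds one entry per valid JSON-LD block, in document order.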

## ⚙️ Options

- `ValueNewLineConversion`: Specifies how to handle new lines in text values.
- `AttributePrefix`: Sets the prefix for HTML attributes in the JSON output.
- `TextPropertyName`: Defines the property name for text nodes.
- `Indent`: Controls whether the output JSON is indented.
- `UnescapeJson`: Attempts to unescape the input if it appears to be HTML wrapped in a JSON string.
- `TrimInsideWords`: Collapses runs of consecutive spaces inside a text value into a single space.
- `ConvertAllTables`: Controls whether all tables or just the first one should be converted in Table mode.
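
The text-related options are easiest to understand as small string transformations. The helpers below are a rough sketch of what `TrimInsideWords`, `ValueNewLineConversion`, and `UnescapeJson` might do. The class and method names are hypothetical, and the `replacement` parameter is an assumed simplification, since the README does not list the values `ValueNewLineConversion` accepts.

```csharp
using System.Text.RegularExpressions;
using Newtonsoft.Json;

// Illustrative sketch only; not the library's actual implementation.
public static class TextOptionsSketch
{
    // TrimInsideWords idea: collapse runs of two or more spaces into one.
    public static string CollapseSpaces(string value)
    {
        return Regex.Replace(value, @" {2,}", " ");
    }

    // ValueNewLineConversion idea: replace every line break with a chosen string.
    public static string ConvertNewLines(string value, string replacement)
    {
        return Regex.Replace(value, @"\r\n|\r|\n", replacement);
    }

    // UnescapeJson idea: if the input looks like a quoted JSON string, decode it.
    public static string UnescapeIfJsonString(string input)
    {
        var trimmed = input.Trim();
        var looksQuoted = trimmed.Length >= 2
            && trimmed[0] == '"'
            && trimmed[trimmed.Length - 1] == '"';
        if (!looksQuoted)
        {
            return input;
        }

        try
        {
            return JsonConvert.DeserializeObject<string>(trimmed) ?? input;
        }
        catch (JsonException)
        {
            // Not valid JSON after all, so the input is returned unchanged.
            return input;
        }
    }
}
```

For instance, `UnescapeIfJsonString` turns the JSON string `"<p>Hi<\/p>"` (quotes included) back into plain `<p>Hi</p>`, while input that is already plain HTML passes through untouched.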

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📄 License

This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details.