https://github.com/ajordan2984/htmltotext
A compact library written in C# to parse out all the text from news articles.
- Host: GitHub
- URL: https://github.com/ajordan2984/htmltotext
- Owner: ajordan2984
- Created: 2020-11-11T22:53:05.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2025-03-23T22:02:22.000Z (10 months ago)
- Last Synced: 2025-06-23T03:42:23.423Z (7 months ago)
- Topics: csharp-library, csharp-web, html-parser-library, htmlparser, htmltag, newsarticles, textparser
- Language: C#
- Homepage:
- Size: 8.79 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
[DEPRECATED]
 
# HtmlToText
## Version
1.0
## Summary
A compact library written in C# to parse out all the text from news articles.
## Support
Text can be extracted from the following tags:
* p
* div
* h1 - h6
* meta
  * og:site_name
  * og:url
  * og:title
  * og:description
  * og:image
  * og:image:alt
  * article:author
  * article:section
  * article:tag
  * article:published_time
  * article:modified_time
* script
  * application/ld+json
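
To illustrate what reading the `og:` and `article:` values involves, here is a minimal, standalone sketch that collects `<meta property="..." content="...">` pairs from an HTML string. It is not HtmlToText's implementation: the `MetaSketch` class and `ExtractMetaProperties` method are hypothetical names, and the regular expression assumes double-quoted attributes with `property` written before `content`.

```cs
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of HtmlToText.
public static class MetaSketch
{
    // Assumes double-quoted attributes, with property before content.
    private static readonly Regex MetaTag = new Regex(
        "<meta\\s+property=\"(?<name>[^\"]+)\"\\s+content=\"(?<value>[^\"]*)\"",
        RegexOptions.IgnoreCase);

    // Returns every content value found for each property name.
    // A property such as article:tag may appear more than once, so values are kept in a list.
    public static Dictionary<string, List<string>> ExtractMetaProperties(string html)
    {
        var result = new Dictionary<string, List<string>>();
        foreach (Match m in MetaTag.Matches(html))
        {
            string name = m.Groups["name"].Value;
            if (!result.TryGetValue(name, out var values))
            {
                values = new List<string>();
                result[name] = values;
            }
            values.Add(m.Groups["value"].Value);
        }
        return result;
    }

    public static void Main()
    {
        string html =
            "<head><meta property=\"og:title\" content=\"Sample headline\">" +
            "<meta property=\"article:tag\" content=\"news\">" +
            "<meta property=\"article:tag\" content=\"world\"></head>";

        var props = ExtractMetaProperties(html);
        Console.WriteLine(props["og:title"][0]);                    // Sample headline
        Console.WriteLine(string.Join(", ", props["article:tag"])); // news, world
    }
}
```

A regular expression keeps the sketch short, but it misses single-quoted attributes and tags that list `content` first, so real pages would need more tolerant matching.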
## Example
Create an `HtmlParser`, call `ParseUrl` with the article's URL, then read the extracted text from the parser's collections:
```cs
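HtmlParser hp = new HtmlParser();
hp.ParseUrl(@"SAMPLE URL HERE");

// Print any exceptions recorded while parsing.
foreach (var item in hp.AllExceptions)
    Console.WriteLine(item);

// Print the text collected from <p> tags.
foreach (var item in hp.Paragraph)
    Console.WriteLine(item);

// Print the text collected from <div> tags.
foreach (var item in hp.Div)
    Console.WriteLine(item);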
```