{"id":13825544,"url":"https://github.com/stanzhai/Html2Article","last_synced_at":"2025-07-08T22:31:29.131Z","repository":{"id":6164491,"uuid":"7394180","full_name":"stanzhai/Html2Article","owner":"stanzhai","description":"HTML web page article (main content) extraction","archived":false,"fork":false,"pushed_at":"2022-05-09T10:42:24.000Z","size":747,"stargazers_count":492,"open_issues_count":7,"forks_count":173,"subscribers_count":36,"default_branch":"master","last_synced_at":"2024-11-14T01:02:20.710Z","etag":null,"topics":["article","content","crawler","html","spider","topic"],"latest_commit_sha":null,"homepage":"http://www.cnblogs.com/jasondan/p/3497757.html","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stanzhai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"License.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-01-01T08:38:36.000Z","updated_at":"2024-11-01T06:29:10.000Z","dependencies_parsed_at":"2022-08-07T17:45:08.901Z","dependency_job_id":null,"html_url":"https://github.com/stanzhai/Html2Article","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stanzhai%2FHtml2Article","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stanzhai%2FHtml2Article/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stanzhai%2FHtml2Article/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stanzhai%2FHtml2Article/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stanzhai","download_url":"https://codeload.github.com/stanzhai/Html2Article/tar.gz/refs/heads/master","host"
:{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225362123,"owners_count":17462371,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["article","content","crawler","html","spider","topic"],"created_at":"2024-08-04T09:01:23.079Z","updated_at":"2024-11-20T04:30:32.375Z","avatar_url":"https://github.com/stanzhai.png","language":"C#","readme":"# Html2Article\n\nAn efficient tool for the .NET platform that extracts the article body from HTML.  \nExtraction uses a text-density-based algorithm and also works on compressed (minified) HTML documents. Average extraction time is about 30 ms per page, with accuracy above 95%.  \n![Html2Article](http://stanzhai.github.io/images/project/Html2Article.png)\n\n## Html2Article features\n\n* Tag-independent: extraction does not rely on HTML tags;\n* Supports extracting the article body from compressed HTML documents;\n* Can output the original article body with its tags preserved;\n* The core algorithm is simple and efficient, with an average extraction time of about 30 ms.\n\n## Adding HTML article extraction to your project\n\n- **`PM\u003e Install-Package Html2Article`**\n- **Import the namespace with `using StanSoft;`.**\n- **Add the following code:**\n\n```C#\n// html is the HTML text to extract the article from\nstring html = \"\u003chtml\u003e....\u003c/html\u003e\";\n// The Article object has four properties: Title, PublishDate, Content (body text), and ContentWithTags (body text with tags)\nArticle article = Html2Article.GetArticle(html);\n```\n\n## The Html2Article class\n\n- **Html2Article is the core class that performs the extraction**\n- **Html2Article configuration**  \n\t* AppendMode: whether to use append mode; defaults to false. Setting it to true adds more qualifying text to the article body.\n\t* Depth: the analysis depth; defaults to 5. Increase it for pages with large gaps between lines.  \n\t* LimitCount: the character threshold; once the amount of analyzed text reaches this threshold, the body of the article is considered to have started. Defaults to 180 characters.  \n\t* GetArticle(string html): gets an Article from HTML text.\n\n## License\n\n[Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0)\n","funding_links":[],"categories":["C\\#","C# 
#"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstanzhai%2FHtml2Article","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstanzhai%2FHtml2Article","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstanzhai%2FHtml2Article/lists"}