https://github.com/a-gubskiy/x.web.metaextractor
Library that allows extracting meta information from a page
- Host: GitHub
- URL: https://github.com/a-gubskiy/x.web.metaextractor
- Owner: a-gubskiy
- License: mit
- Created: 2017-07-23T16:18:32.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2024-10-28T08:30:51.000Z (2 months ago)
- Last Synced: 2024-10-28T10:21:33.834Z (2 months ago)
- Topics: dncuug, extract-meta-information, metadata, metadata-extraction, net-core, open-graph, web
- Language: C#
- Homepage: https://nuget.org/packages/X.Web.MetaExtractor
- Size: 304 KB
- Stars: 7
- Watchers: 4
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE.md
README
# X.Web.MetaExtractor
[![NuGet version](https://badge.fury.io/nu/X.Web.MetaExtractor.svg)](https://badge.fury.io/nu/X.Web.MetaExtractor)
[![Twitter URL](https://img.shields.io/twitter/url/https/twitter.com/andrew_gubskiy.svg?style=social&label=Follow%20me!)](https://twitter.com/intent/user?screen_name=andrew_gubskiy)

**X.Web.MetaExtractor** is a library that extracts meta information from a web page given its URL. It provides several content loaders, each of which performs the HTTP request with a different library.
## Breaking Changes
- **`Metadata` class changed**: The `Content` field has been removed from the `Metadata` class. Update your code if it used the `Content` field.
- **Description extraction logic**: The `Extractor` class now extracts the description only from meta tags and no longer attempts to parse the content of the page. Adjust your implementation if it relied on content parsing for the description.

## Features
- Extract meta information from any web page URL.
- Support for multiple HTTP libraries:
  - Flurl
  - FsHttp
  - HttpClient
  - RestSharp
- Detect the language of the page content.

## Installation
To install the library, use the following command:
```bash
dotnet add package X.Web.MetaExtractor
```

## Usage
Here is a basic example of how to use the `X.Web.MetaExtractor` library:
```csharp
using System;
using X.Web.MetaExtractor;
using X.Web.MetaExtractor.ContentLoaders;
using X.Web.MetaExtractor.LanguageDetectors;

// Create instances of the necessary components
IPageContentLoader contentLoader = new FlurlPageContentLoader();
ILanguageDetector languageDetector = new LanguageDetector();
string defaultImage = "https://example.com/example.jpg";

// Create an instance of the Extractor
IExtractor extractor = new Extractor(defaultImage, contentLoader, languageDetector);

// Extract meta information from a URL
var metaInfo = await extractor.ExtractAsync(new Uri("https://example.com"));

// Display the extracted meta information
Console.WriteLine($"Title: {metaInfo.Title}");
Console.WriteLine($"Description: {metaInfo.Description}");
Console.WriteLine($"Keywords: {metaInfo.Keywords}");
Console.WriteLine($"Language: {metaInfo.Language}");
```

## Interfaces and Classes
### IExtractor
`IExtractor` defines the interface for extracting meta information.
### ILanguageDetector
`ILanguageDetector` defines the interface for detecting the language of the page content.
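
To give a feel for what language detection can involve, here is a minimal sketch that reads the `lang` attribute declared on the `<html>` element. This is only one simple heuristic and is not necessarily how the library's `LanguageDetector` works; the `ReadDeclaredLanguage` helper is a hypothetical name introduced for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of X.Web.MetaExtractor.
// Returns the lower-cased value of <html lang="...">, or null if it is absent.
static string? ReadDeclaredLanguage(string html)
{
    var match = Regex.Match(
        html,
        @"<html\b[^>]*?\slang\s*=\s*[""']?([A-Za-z0-9-]+)",
        RegexOptions.IgnoreCase);

    return match.Success ? match.Groups[1].Value.ToLowerInvariant() : null;
}

var samplePage = "<!DOCTYPE html><html lang=\"en-US\"><head></head><body></body></html>";
Console.WriteLine(ReadDeclaredLanguage(samplePage) ?? "(unknown)"); // prints "en-us"
```

Pages that omit the `lang` attribute yield `null` here, so a real detector would need another signal, such as analysing the text itself.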
### IPageContentLoader
`IPageContentLoader` defines the interface for loading the content of a web page.
### Metadata
`Metadata` is a class that holds the meta information of a web page, including the title, description, keywords, and language.
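
As noted under Breaking Changes, the description is taken from meta tags only, with no fallback to the page body. The sketch below illustrates that idea using only the .NET base class library. It is not the library's implementation: the `FindMetaDescription` helper is hypothetical, and the choice of `name="description"` and `property="og:description"` as the tags to inspect is an assumption made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of X.Web.MetaExtractor.
// Returns the description declared in a <meta> tag, or null if there is none.
static string? FindMetaDescription(string html)
{
    // Visit every <meta ...> tag and read its attributes one by one,
    // so the order of name/property/content does not matter.
    foreach (Match tag in Regex.Matches(html, @"<meta\b[^>]*>", RegexOptions.IgnoreCase))
    {
        var attributes = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match attr in Regex.Matches(tag.Value, @"([\w:-]+)\s*=\s*(?:""([^""]*)""|'([^']*)')"))
        {
            attributes[attr.Groups[1].Value] =
                attr.Groups[2].Success ? attr.Groups[2].Value : attr.Groups[3].Value;
        }

        attributes.TryGetValue("name", out var name);
        attributes.TryGetValue("property", out var property);

        var isDescription =
            string.Equals(name, "description", StringComparison.OrdinalIgnoreCase) ||
            string.Equals(property, "og:description", StringComparison.OrdinalIgnoreCase);

        if (isDescription && attributes.TryGetValue("content", out var content))
        {
            return WebUtility.HtmlDecode(content).Trim();
        }
    }

    // No fallback to body text: only meta tags are consulted.
    return null;
}

var samplePage = "<html><head><meta property='og:description' content='Docs &amp; samples'></head><body><p>Body text</p></body></html>";
Console.WriteLine(FindMetaDescription(samplePage) ?? "(no description)"); // prints "Docs & samples"
```

The regular expressions keep the sketch dependency-free but will misread attribute values that contain `>`; a real HTML parser is the safer choice for production code.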
## Content Loaders
### Flurl
`X.Web.MetaExtractor.ContentLoaders.Flurl` provides a content loader that performs HTTP requests with the Flurl library.
### FsHttp
`X.Web.MetaExtractor.ContentLoaders.FsHttp` provides a content loader that performs HTTP requests with the FsHttp library.
### HttpClient
`X.Web.MetaExtractor.ContentLoaders.HttpClient` provides a content loader built on the `HttpClient` class.
### RestSharp
`X.Web.MetaExtractor.ContentLoaders.RestSharp` provides a content loader that performs HTTP requests with the RestSharp library.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License. See the [LICENSE](https://github.com/ernado-x/X.Web.MetaExtractor/blob/master/LICENSE) file for more details.