Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cryptoc1/earl
Earl is looking for URLs in your area.
- Host: GitHub
- URL: https://github.com/cryptoc1/earl
- Owner: Cryptoc1
- License: mit
- Created: 2021-08-04T07:47:40.000Z (over 3 years ago)
- Default Branch: develop
- Last Pushed: 2022-03-30T18:58:21.000Z (almost 3 years ago)
- Last Synced: 2024-03-15T13:11:14.710Z (10 months ago)
- Topics: crawler, middleware, nuget, webscraping
- Language: C#
- Homepage:
- Size: 796 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# earl
*Looking for URLs in your area.*
![Language](https://img.shields.io/github/languages/top/cryptoc1/earl)
[![Checks](https://img.shields.io/github/checks-status/cryptoc1/earl/develop)](https://github.com/Cryptoc1/earl/actions/workflows/default.yml)
[![Coverage](https://img.shields.io/codecov/c/github/cryptoc1/earl)](https://app.codecov.io/gh/Cryptoc1/earl/)
[![Version](https://img.shields.io/nuget/vpre/Earl.Crawler)](https://www.nuget.org/packages/Earl.Crawler)

Earl is a suite of APIs for developing URL crawlers and web scrapers, driven by a middleware pattern similar to (and strongly influenced by) the middleware pipeline in ASP.NET Core.
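
The middleware pattern mentioned above has the same `(context, next)` shape as ASP.NET Core middleware: each component can run code, call the next component, and run more code afterwards. The following is a minimal, self-contained sketch of how such a pipeline can be composed in plain C#. It does not use Earl's types; the string URL standing in for a context object, and every name in the block, are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical sketch, not Earl's implementation: each middleware receives the
// URL being crawled (standing in for a context object) and the next delegate.
var log = new List<string>();
var middleware = new List<Func<string, Func<string, Task>, Task>>();

// Middleware runs in registration order: the first one added runs first.
middleware.Add( async ( url, next ) =>
{
    log.Add( $"before {url}" );
    await next( url );
    log.Add( $"after {url}" );
} );
middleware.Add( ( url, next ) =>
{
    log.Add( $"fetch {url}" );
    return next( url );
} );

// Compose from the last middleware back to the first, so each one wraps
// the delegate already built for the middleware that follows it.
Func<string, Task> pipeline = _ => Task.CompletedTask; // terminal no-op
for( var i = middleware.Count - 1; i >= 0; i-- )
{
    var current = middleware[ i ];
    var downstream = pipeline;
    pipeline = url => current( url, downstream );
}

await pipeline( "https://example.com/" );

// prints "before https://example.com/ | fetch https://example.com/ | after https://example.com/"
Console.WriteLine( string.Join( " | ", log ) );
```

Because the pipeline is built in reverse, a middleware in this sketch that never calls `next` short-circuits everything registered after it; terminating middleware in ASP.NET Core behaves the same way.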
## Basic Usage
```csharp
using System;
using System.Threading;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection()
    .AddEarlCrawler()
    .AddEarlJsonPersistence()
    .BuildServiceProvider();

// resolve the crawler registered by AddEarlCrawler() (service type name assumed)
var crawler = services.GetRequiredService<IEarlCrawler>();

var options = CrawlerOptionsBuilder.CreateDefault()
    .BatchSize( 50 )
    .MaxRequestCount( 500 )
    .On(
        ( CrawlUrlResultEvent e, CancellationToken cancellation ) =>
        {
            // event handler: receives the result event for each crawled URL
            Console.WriteLine( $"Crawled {e.Result.Url}" );
            return default;
        }
    )
    .Timeout( TimeSpan.FromMinutes( 30 ) )
    .Use(
        ( CrawlUrlContext context, CrawlUrlDelegate next ) =>
        {
            // delegate middleware: runs while a URL is crawled, then hands off to the rest of the pipeline
            Console.WriteLine( $"Executing delegate middleware while crawling {context.Url}" );
            return next( context );
        }
    )
    .PersistTo( persist => persist.ToJson( json => json.Destination(...) ) )
    .Build();

await crawler.CrawlAsync( new Uri(...), options );
```

## Documentation
Documentation can be found in the READMEs of the sub-directories for each of Earl's conceptual components:
- [Events](https://github.com/Cryptoc1/earl/tree/develop/src/Crawler/Events/README.md)
- [Middleware](https://github.com/Cryptoc1/earl/tree/develop/src/Crawler/Middleware/README.md)
- [Persistence](https://github.com/Cryptoc1/earl/tree/develop/src/Crawler/Persistence/README.md)

All public APIs *should* have thorough XML documentation (triple-slash) comments.
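
The Events component handles registrations such as the `.On( ( CrawlUrlResultEvent e, CancellationToken cancellation ) => ... )` call in Basic Usage. The following is a minimal, hypothetical sketch of type-keyed event dispatch in plain C#, assuming handlers are stored per event type and invoked in registration order; it is not Earl's implementation, and `On`, `EmitAsync`, and the tuple standing in for an event type are all illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: handlers are stored per event type and invoked in
// registration order. This is an illustration, not Earl's implementation.
var handlers = new Dictionary<Type, List<Func<object, CancellationToken, ValueTask>>>();
var received = new List<string>();

// Register a handler for events whose runtime type is exactly TEvent.
void On<TEvent>( Func<TEvent, CancellationToken, ValueTask> handler )
{
    if( !handlers.TryGetValue( typeof( TEvent ), out var list ) )
    {
        list = new List<Func<object, CancellationToken, ValueTask>>();
        handlers[ typeof( TEvent ) ] = list;
    }

    // Wrap the typed handler so it can be stored alongside other event types.
    list.Add( ( e, cancellation ) => handler( ( TEvent )e, cancellation ) );
}

// Invoke each handler registered for the event's runtime type;
// events without handlers are ignored.
async ValueTask EmitAsync( object e, CancellationToken cancellation = default )
{
    if( handlers.TryGetValue( e.GetType(), out var list ) )
    {
        foreach( var handler in list )
        {
            await handler( e, cancellation );
        }
    }
}

// A named tuple stands in for a result event such as CrawlUrlResultEvent.
On<(string Url, int StatusCode)>( ( e, cancellation ) =>
{
    received.Add( $"Crawled {e.Url} ({e.StatusCode})" );
    return default;
} );

await EmitAsync( ( "https://example.com/", 200 ) );
await EmitAsync( "no handler is registered for string events" );

// prints "Crawled https://example.com/ (200)"
Console.WriteLine( string.Join( Environment.NewLine, received ) );
```

Keying handlers by the event's runtime type keeps registration strongly typed while letting a single dispatcher carry many unrelated event types.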
> *Something missing, still have questions? Please open an Issue or submit a PR!*