https://github.com/adamfisher/scrapyrt.client
A C# client to make calls to a scrapyrt (Scrapy real-time) HTTP endpoint.
- Host: GitHub
- URL: https://github.com/adamfisher/scrapyrt.client
- Owner: adamfisher
- License: mit
- Created: 2019-07-27T15:42:25.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T05:53:04.000Z (about 2 years ago)
- Last Synced: 2023-03-06T08:17:40.832Z (almost 2 years ago)
- Topics: crawler, scraper, scrapy, scrapy-crawler, scrapy-framework, scrapy-spider
- Language: C#
- Size: 85.9 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ScrapyRT.Client [![](https://raw.githubusercontent.com/pixel-cookers/built-with-badges/master/nuget/nuget-long.png)](https://www.nuget.org/packages/ScarpyRT.Client)
A strongly-typed C# client to make calls to a scrapyrt (Scrapy real-time) HTTP endpoint.

Please see the [scrapyrt documentation](https://scrapyrt.readthedocs.io/en/latest/index.html) for complete details on making requests.
## Getting Started
You can initialize a new scrapyrt client by passing the base address of your running scrapyrt server:
```csharp
var client = new ScrapyRTClient("http://localhost:9080");
```

... or by passing your own `HttpClient` if you want more control over outgoing requests:
```csharp
var client = new ScrapyRTClient(new HttpClient() {BaseAddress = new Uri("http://localhost:9080")});
```

Assume we have an item model that corresponds to the structure of the items scraped by Scrapy:
```csharp
public class CountryItem
{
    public string CountryName { get; set; }
}
```

### GET Requests
The simplest way to get items from the scrapyrt endpoint is a `GET` request. The following examples call **ExampleSpider** with the URL to be scraped:
Get a single item:
```csharp
CountryItem response = await client.GetSpiderSingleItemAsync<CountryItem>("ExampleSpider", "http://example.webscraping.com");
```

Get a list of items:
```csharp
List<CountryItem> response = await client.GetSpiderItemsAsync<CountryItem>("ExampleSpider", "http://example.webscraping.com");
```

Get the complete crawl response including crawl stats:
```csharp
CrawlResponse response = await client.GetSpiderCrawlAsync("ExampleSpider", "http://example.webscraping.com");
```

### POST Requests
Making a `POST` request lets you specify more advanced configuration for each call. The following examples call **ExampleSpider** with the URL to be scraped:
Get a single item:
```csharp
CountryItem response = await client.PostSpiderSingleItemAsync<CountryItem>(new CrawlRequest()
{
    SpiderName = "ExampleSpider",
    Request = new TwistedRequest()
    {
        Url = new Uri("http://example.webscraping.com")
    }
});
```

Get a list of items:
```csharp
List<CountryItem> response = await client.PostSpiderItemsAsync<CountryItem>(new CrawlRequest()
{
    SpiderName = "ExampleSpider",
    Request = new TwistedRequest()
    {
        Url = new Uri("http://example.webscraping.com")
    }
});
```

Get the complete crawl response including crawl stats:
```csharp
CrawlResponse response = await client.PostSpiderCrawlAsync(new CrawlRequest()
{
    SpiderName = "ExampleSpider",
    Request = new TwistedRequest()
    {
        Url = new Uri("http://example.webscraping.com")
    }
});
```

Many other options are available to customize how Scrapy's Twisted networking library makes the request on your behalf. Here we send an `X-Example-Header` when Scrapy downloads the web page and set `MaxRequests` to 3. Assuming `MaxRequests` maps to scrapyrt's `max_requests` parameter, this caps the number of requests the spider may make rather than the number of items returned:
```csharp
List<CountryItem> response = await client.PostSpiderItemsAsync<CountryItem>(new CrawlRequest()
{
    SpiderName = "ExampleSpider",
    MaxRequests = 3,
    Request = new TwistedRequest()
    {
        Url = new Uri("http://example.webscraping.com"),
        Headers = new Dictionary<string, string>()
        {
            {"X-Example-Header", "Scrapy"}
        }
    }
});
```
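
For debugging, it can help to know roughly what these calls look like on the wire. The sketch below builds the `GET` URL and the `POST` JSON body in the shape the scrapyrt documentation describes for its `crawl.json` endpoint (`spider_name`, `url`, `request`, `max_requests`). The class `ScrapyRtWireFormat` and its methods are hypothetical names used for illustration only. They are not part of ScrapyRT.Client, and the exact payload the client sends is an assumption. The sketch also assumes `System.Text.Json` is available (.NET Core 3.0 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, NOT part of ScrapyRT.Client. It only illustrates the
// request shapes that the scrapyrt docs describe for the crawl.json endpoint.
public static class ScrapyRtWireFormat
{
    // GET form: spider_name and url travel as query-string parameters.
    public static string BuildGetUrl(string baseAddress, string spiderName, string url)
    {
        return baseAddress.TrimEnd('/')
            + "/crawl.json?spider_name=" + Uri.EscapeDataString(spiderName)
            + "&url=" + Uri.EscapeDataString(url);
    }

    // POST form: a JSON object with spider_name, an optional max_requests,
    // and a nested "request" object holding the URL and optional headers.
    public static string BuildPostBody(
        string spiderName,
        string url,
        int? maxRequests = null,
        IDictionary<string, string> headers = null)
    {
        var request = new Dictionary<string, object> { ["url"] = url };
        if (headers != null && headers.Count > 0)
        {
            request["headers"] = headers;
        }

        var body = new Dictionary<string, object>
        {
            ["spider_name"] = spiderName,
            ["request"] = request
        };
        if (maxRequests.HasValue)
        {
            body["max_requests"] = maxRequests.Value;
        }

        return JsonSerializer.Serialize(body);
    }
}
```

Calling `ScrapyRtWireFormat.BuildPostBody("ExampleSpider", "http://example.webscraping.com", 3, headers)` with the header dictionary from the example above yields a JSON string you can send with `curl` to `http://localhost:9080/crawl.json`, which lets you check whether a problem lies in the spider or in the client call.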