Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/adamfisher/scrapyrt.client

A C# client to make calls to a scrapyrt (Scrapy real-time) HTTP endpoint.

crawler scraper scrapy scrapy-crawler scrapy-framework scrapy-spider

Host: GitHub
URL: https://github.com/adamfisher/scrapyrt.client
Owner: adamfisher
License: mit
Created: 2019-07-27T15:42:25.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2022-12-08T05:53:04.000Z (about 2 years ago)
Last Synced: 2023-03-06T08:17:40.832Z (almost 2 years ago)
Topics: crawler, scraper, scrapy, scrapy-crawler, scrapy-framework, scrapy-spider
Language: C#
Size: 85.9 KB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

README

        
# ScrapyRT.Client [![](https://raw.githubusercontent.com/pixel-cookers/built-with-badges/master/nuget/nuget-long.png)](https://www.nuget.org/packages/ScrapyRT.Client)

A strongly-typed C# client to make calls to a scrapyrt (Scrapy real-time) HTTP endpoint.

See the [scrapyrt documentation](https://scrapyrt.readthedocs.io/en/latest/index.html) for complete details on making requests.

## Getting Started

You can initialize a new scrapyrt client by passing the base address of the server where scrapyrt is running:

```csharp
var client = new ScrapyRTClient("http://localhost:9080");
```

... or by passing your own `HttpClient` if you want more control over outgoing requests:

```csharp
using System;
using System.Net.Http;

var client = new ScrapyRTClient(new HttpClient() { BaseAddress = new Uri("http://localhost:9080") });
```
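Passing your own `HttpClient` is also where you can adjust settings such as the request timeout. scrapyrt only responds after the crawl has finished, so a slow spider can exceed `HttpClient`'s default 100-second timeout. The following minimal sketch uses only `System.Net.Http`. The five-minute value is an illustrative assumption, not a recommendation.

```csharp
using System;
using System.Net.Http;

// Configure the HttpClient before handing it to ScrapyRTClient.
var http = new HttpClient
{
    BaseAddress = new Uri("http://localhost:9080"),
    // Crawls can take a while; raise the default 100-second timeout.
    // Five minutes is an illustrative value only.
    Timeout = TimeSpan.FromMinutes(5),
};

// Then pass it in exactly as in the example above:
// var client = new ScrapyRTClient(http);
Console.WriteLine(http.Timeout); // 00:05:00
```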

Assume we have an item model whose structure matches the items scraped by Scrapy:

```csharp
public class CountryItem
{
	public string CountryName { get; set; }
}
```
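scrapyrt returns scraped items inside a JSON envelope. The items sit in an `items` array alongside fields such as `status` and `stats`, so the model above describes one element of that array. The following minimal sketch reads a hypothetical, trimmed response with `System.Text.Json`. It assumes the spider emits a `CountryName` field that matches the property name, and the sample data is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical, trimmed scrapyrt response. A real one also includes
// fields such as "spider_name", "stats" and "items_dropped".
const string responseJson =
    "{\"status\":\"ok\",\"items\":[" +
    "{\"CountryName\":\"Andorra\"}," +
    "{\"CountryName\":\"Angola\"}]}";

using JsonDocument doc = JsonDocument.Parse(responseJson);
string status = doc.RootElement.GetProperty("status").GetString() ?? "";
var names = new List<string>();

// Each element of "items" is what one CountryItem instance represents.
foreach (JsonElement item in doc.RootElement.GetProperty("items").EnumerateArray())
{
    names.Add(item.GetProperty("CountryName").GetString() ?? "");
}

Console.WriteLine("status: " + status);     // status: ok
Console.WriteLine(string.Join(", ", names)); // Andorra, Angola
```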

### GET Requests

The simplest way to get items from the scrapyrt endpoint is with a `GET` request. The following examples call **ExampleSpider** with the URL to be scraped:

Get a single item:

```csharp
CountryItem response = await client.GetSpiderSingleItemAsync<CountryItem>("ExampleSpider", "http://example.webscraping.com");
```

Get a list of items:

```csharp
List<CountryItem> response = await client.GetSpiderItemsAsync<CountryItem>("ExampleSpider", "http://example.webscraping.com");
```

Get the complete crawl response including crawl stats:

```csharp
CrawlResponse<CountryItem> response = await client.GetSpiderCrawlAsync<CountryItem>("ExampleSpider", "http://example.webscraping.com");
```
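These GET calls correspond to scrapyrt's documented `GET /crawl.json` endpoint, which takes the spider name and start URL as query parameters. The following sketch only builds that request path, which can be handy for checking a spider directly in a browser or with curl. It reflects an assumption about the wire format, not a copy of how ScrapyRT.Client builds its requests internally.

```csharp
using System;

// Values from the examples above.
string spiderName = "ExampleSpider";
string pageUrl = "http://example.webscraping.com";

// scrapyrt's GET endpoint reads spider_name and url from the query string;
// both values are percent-encoded so characters like ':' and '/' survive.
string requestUri = "/crawl.json"
    + "?spider_name=" + Uri.EscapeDataString(spiderName)
    + "&url=" + Uri.EscapeDataString(pageUrl);

Console.WriteLine(requestUri);
// /crawl.json?spider_name=ExampleSpider&url=http%3A%2F%2Fexample.webscraping.com
```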

### POST Requests

A `POST` request lets you specify more advanced configuration for each call. The following examples call **ExampleSpider** with the URL to be scraped:

Get a single item:

```csharp
CountryItem response = await client.PostSpiderSingleItemAsync<CountryItem>(new CrawlRequest()
{
    SpiderName = "ExampleSpider",
    Request = new TwistedRequest()
    {
        Url = new Uri("http://example.webscraping.com")
    }
});
```

Get a list of items:

```csharp
List<CountryItem> response = await client.PostSpiderItemsAsync<CountryItem>(new CrawlRequest()
{
    SpiderName = "ExampleSpider",
    Request = new TwistedRequest()
    {
        Url = new Uri("http://example.webscraping.com")
    }
});
```

Get the complete crawl response including crawl stats:

```csharp
CrawlResponse<CountryItem> response = await client.PostSpiderCrawlAsync<CountryItem>(new CrawlRequest()
{
    SpiderName = "ExampleSpider",
    Request = new TwistedRequest()
    {
        Url = new Uri("http://example.webscraping.com")
    }
});
```
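For reference, scrapyrt's `POST /crawl.json` endpoint expects a JSON body with a `spider_name` and a `request` object holding the arguments for the Scrapy request. The following minimal sketch builds the body equivalent to the calls above. It assumes `SpiderName` and `Request.Url` map onto those fields, and the anonymous type stands in for the client's `CrawlRequest`.

```csharp
using System;
using System.Text.Json;

// Field names follow scrapyrt's POST API rather than the C# property names.
var body = new
{
    spider_name = "ExampleSpider",
    request = new { url = "http://example.webscraping.com" },
};

string json = JsonSerializer.Serialize(body);
Console.WriteLine(json);
// {"spider_name":"ExampleSpider","request":{"url":"http://example.webscraping.com"}}
```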

Many other options are available to customize the request that Scrapy makes on your behalf. Here we pass an `X-Example-Header` header that Scrapy sends when it downloads the web page, and set `MaxRequests` so the spider can schedule at most 3 requests. Note that this limits the number of requests, not the number of items returned:

```csharp
List<CountryItem> response = await client.PostSpiderItemsAsync<CountryItem>(new CrawlRequest()
{
    SpiderName = "ExampleSpider",
    MaxRequests = 3,
    Request = new TwistedRequest()
    {
        Url = new Uri("http://example.webscraping.com"),
        Headers = new Dictionary<string, string>()
        {
            {"X-Example-Header", "Scrapy"}
        }
    }
});
```
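In scrapyrt's POST body, `max_requests` caps how many requests the spider may schedule, and `headers` sits inside the `request` object alongside `url`. Assuming `MaxRequests` and `Headers` serialize to those fields, the example above would send a body like the one this sketch prints. The anonymous type is a stand-in for the client's own request classes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

var body = new
{
    spider_name = "ExampleSpider",
    // Limits the requests the spider may schedule, not the item count.
    max_requests = 3,
    request = new
    {
        url = "http://example.webscraping.com",
        // Headers Scrapy should send when it downloads the page.
        headers = new Dictionary<string, string> { ["X-Example-Header"] = "Scrapy" },
    },
};

string json = JsonSerializer.Serialize(body, new JsonSerializerOptions { WriteIndented = true });
Console.WriteLine(json);
```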