https://github.com/rflechner/scrapysharp

reborn of https://bitbucket.org/rflechner/scrapysharp
https://github.com/rflechner/scrapysharp

csharp dotnet fsharp html parsing scraper scraping scrapysharp

Last synced: 25 days ago
JSON representation

reborn of https://bitbucket.org/rflechner/scrapysharp

Host: GitHub
URL: https://github.com/rflechner/scrapysharp
Owner: rflechner
License: mit
Created: 2018-04-04T22:06:24.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2023-03-06T16:20:43.000Z (over 2 years ago)
Last Synced: 2025-05-16T02:07:11.495Z (25 days ago)
Topics: csharp, dotnet, fsharp, html, parsing, scraper, scraping, scrapysharp
Language: C#
Size: 751 KB
Stars: 352
Watchers: 21
Forks: 76
Open Issues: 18
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # Getting started

ScrapySharp has a Web Client able to simulate a real Web browser (handle referrer, cookies …)

Html parsing has to be as natural as possible. So I like to use CSS Selectors and Linq.

This framework wraps HtmlAgilityPack.

## Basic examples of CssSelect usages

```C#

using System.Linq;

using HtmlAgilityPack;

using ScrapySharp.Extensions;

class Example

{

    public void Main()

    {

        var divs = html.CssSelect("div");  //all div elements

        var nodes = html.CssSelect("div.content"); //all div elements with css class ‘content’

        var nodes = html.CssSelect("div.widget.monthlist"); //all div elements with the both css class

        var nodes = html.CssSelect("#postPaging"); //all HTML elements with the id postPaging

        var nodes = html.CssSelect("div#postPaging.testClass"); // all HTML elements with the id postPaging and css class testClass

        var nodes = html.CssSelect("div.content > p.para"); //p elements who are direct children of div elements with css class ‘content’

        var nodes = html.CssSelect("input[type=text].login"); // textbox with css class login

    }

}

```

## Scrapysharp can also simulate a web browser

```C#

ScrapingBrowser browser = new ScrapingBrowser();

//set UseDefaultCookiesParser as false if a website returns invalid cookies format

//browser.UseDefaultCookiesParser = false;

WebPage homePage = browser.NavigateToPage(new Uri("http://www.bing.com/"));

PageWebForm form = homePage.FindFormById("sb_form");

form["q"] = "scrapysharp";

form.Method = HttpVerb.Get;

WebPage resultsPage = form.Submit();

HtmlNode[] resultsLinks = resultsPage.Html.CssSelect("div.sb_tlst h3 a").ToArray();

WebPage blogPage = resultsPage.FindLinks(By.Text("romcyber blog | Just another WordPress site")).Single().Click();

```

## Install Scrapysharp in your project

It's easy to use Scrapysharp in your project.

A Nuget package exists on [nuget.org](https://www.nuget.org/packages/ScrapySharp) and on [myget](https://www.myget.org/feed/romcyber/package/nuget/ScrapySharp)

## News

Scrapysharp V3 is a reborn.

Old version under GPL license is still on [bitbucket](https://bitbucket.org/rflechner/scrapysharp/src)

Version 3 is a conversion to .net standard 2.0 and a relicensing.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/rflechner/scrapysharp

Awesome Lists containing this project

README