{"id":15060934,"url":"https://github.com/rflechner/scrapysharp","last_synced_at":"2025-05-16T02:07:22.227Z","repository":{"id":46831813,"uuid":"128127670","full_name":"rflechner/ScrapySharp","owner":"rflechner","description":"reborn of https://bitbucket.org/rflechner/scrapysharp","archived":false,"fork":false,"pushed_at":"2023-03-06T16:20:43.000Z","size":769,"stargazers_count":352,"open_issues_count":18,"forks_count":76,"subscribers_count":21,"default_branch":"master","last_synced_at":"2025-05-16T02:07:11.495Z","etag":null,"topics":["csharp","dotnet","fsharp","html","parsing","scraper","scraping","scrapysharp"],"latest_commit_sha":null,"homepage":null,"language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rflechner.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-04-04T22:06:24.000Z","updated_at":"2025-04-30T21:21:58.000Z","dependencies_parsed_at":"2024-06-21T07:11:25.597Z","dependency_job_id":"2fcf4dd2-5a97-4bb7-b26c-3d60acda5330","html_url":"https://github.com/rflechner/ScrapySharp","commit_stats":{"total_commits":25,"total_committers":5,"mean_commits":5.0,"dds":0.36,"last_synced_commit":"796a486333a9ddcc04e3970610831e63b7d41d55"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rflechner%2FScrapySharp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rflechner%2FScrapySharp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rflechner%2FScrapySharp/releases","ma
nifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rflechner%2FScrapySharp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rflechner","download_url":"https://codeload.github.com/rflechner/ScrapySharp/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254453652,"owners_count":22073617,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csharp","dotnet","fsharp","html","parsing","scraper","scraping","scrapysharp"],"created_at":"2024-09-24T23:06:53.197Z","updated_at":"2025-05-16T02:07:22.180Z","avatar_url":"https://github.com/rflechner.png","language":"C#","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Getting started\n\nScrapySharp has a Web Client able to simulate a real Web browser (handle referrer, cookies …)\n\nHtml parsing has to be as natural as possible. 
So I like to use CSS selectors and LINQ.\n\nThis framework wraps HtmlAgilityPack.\n\n## Basic examples of CssSelect usage\n\n```C#\n\nusing System.Linq;\nusing HtmlAgilityPack;\nusing ScrapySharp.Extensions;\n\nclass Example\n{\n    // html is the root node of an already parsed document (e.g. HtmlDocument.DocumentNode)\n    public void Main(HtmlNode html)\n    {\n        var divs = html.CssSelect(\"div\");  // all div elements\n        var contentDivs = html.CssSelect(\"div.content\"); // all div elements with css class ‘content’\n        var monthListWidgets = html.CssSelect(\"div.widget.monthlist\"); // all div elements with both css classes widget and monthlist\n        var postPaging = html.CssSelect(\"#postPaging\"); // all HTML elements with the id postPaging\n        var postPagingDivs = html.CssSelect(\"div#postPaging.testClass\"); // all div elements with the id postPaging and css class testClass\n\n        var paragraphs = html.CssSelect(\"div.content \u003e p.para\"); // p elements with css class para that are direct children of div elements with css class ‘content’\n\n        var loginInputs = html.CssSelect(\"input[type=text].login\"); // text inputs with css class login\n    }\n}\n```\n\n## Scrapysharp can also simulate a web browser\n\n```C#\n\nScrapingBrowser browser = new ScrapingBrowser();\n\n//set UseDefaultCookiesParser to false if a website returns cookies in an invalid format\n//browser.UseDefaultCookiesParser = false;\n\nWebPage homePage = browser.NavigateToPage(new Uri(\"http://www.bing.com/\"));\n\nPageWebForm form = homePage.FindFormById(\"sb_form\");\nform[\"q\"] = \"scrapysharp\";\nform.Method = HttpVerb.Get;\nWebPage resultsPage = form.Submit();\n\nHtmlNode[] resultsLinks = resultsPage.Html.CssSelect(\"div.sb_tlst h3 a\").ToArray();\n\nWebPage blogPage = resultsPage.FindLinks(By.Text(\"romcyber blog | Just another WordPress site\")).Single().Click();\n```\n\n## Install Scrapysharp in your project\n\nIt's easy to use Scrapysharp in your project.\n\nA NuGet package is available on [nuget.org](https://www.nuget.org/packages/ScrapySharp) and on [myget](https://www.myget.org/feed/romcyber/package/nuget/ScrapySharp).\n\n## News\n\nScrapysharp V3 
is a rebirth of the project.\n\nThe old version, under the GPL license, is still on [bitbucket](https://bitbucket.org/rflechner/scrapysharp/src).\n\nVersion 3 is a port to .NET Standard 2.0 and has been relicensed.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frflechner%2Fscrapysharp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frflechner%2Fscrapysharp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frflechner%2Fscrapysharp/lists"}