{"id":20710176,"url":"https://github.com/oxylabs/csharp-web-scraping","last_synced_at":"2026-06-10T09:31:27.244Z","repository":{"id":134336580,"uuid":"526100080","full_name":"oxylabs/csharp-web-scraping","owner":"oxylabs","description":"Web Scraping With C#","archived":false,"fork":false,"pushed_at":"2025-06-26T08:22:46.000Z","size":36,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-26T09:30:42.438Z","etag":null,"topics":["csharp","web-scraping","web-scraping-csharp"],"latest_commit_sha":null,"homepage":"https://oxylabs.io/blog/csharp-web-scraping","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oxylabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-08-18T07:17:24.000Z","updated_at":"2025-06-26T08:22:49.000Z","dependencies_parsed_at":"2023-12-06T12:42:16.185Z","dependency_job_id":"18903a07-98c5-43d7-bafa-f632a17d30aa","html_url":"https://github.com/oxylabs/csharp-web-scraping","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/oxylabs/csharp-web-scraping","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fcsharp-web-scraping","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fcsharp-web-scraping/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fcsharp-web-scraping/re
leases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fcsharp-web-scraping/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oxylabs","download_url":"https://codeload.github.com/oxylabs/csharp-web-scraping/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fcsharp-web-scraping/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34146871,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-10T02:00:07.152Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csharp","web-scraping","web-scraping-csharp"],"created_at":"2024-11-17T02:10:20.496Z","updated_at":"2026-06-10T09:31:27.170Z","avatar_url":"https://github.com/oxylabs.png","language":"C#","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Web Scraping With C#\n\n\n[![Oxylabs promo code](https://raw.githubusercontent.com/oxylabs/product-integrations/refs/heads/master/Affiliate-Universal-1090x275.png)](https://oxylabs.io/pages/gitoxy?utm_source=877\u0026utm_medium=affiliate\u0026groupid=877\u0026utm_content=csharp-web-scraping-github\u0026transaction_id=102f49063ab94276ae8f116d224b67)\n\n\n[![](https://dcbadge.limes.pink/api/server/Pds3gBmKMH?style=for-the-badge\u0026theme=discord)](https://discord.gg/Pds3gBmKMH) 
[![YouTube](https://img.shields.io/badge/YouTube-Oxylabs-red?style=for-the-badge\u0026logo=youtube\u0026logoColor=white)](https://www.youtube.com/@oxylabs)\n\nSee the full article on our [website](https://oxylabs.io/blog/csharp-web-scraping), where we also go over how to scrape dynamic pages in C#.\n\n## Set Up the Development Environment\n\n```bash\ndotnet --version\n```\n\n## Project Structure and Dependencies\n\n```bash\ndotnet new console\n```\n\n```bash\ndotnet add package HtmlAgilityPack\n```\n\n```bash\ndotnet add package CsvHelper\n```\n\n## Download and Parse Web Pages\n\nThe first step of any web scraping program is to download the HTML of a web page. This HTML will be a string that you’ll need to convert into an object that can be processed further. The latter part is called parsing. Html Agility Pack can read and parse HTML from local files, HTML strings, any URL, or even a browser. \n\nIn our case, all we need to do is get HTML from a URL. Instead of using native .NET functions, we can use a convenient class that Html Agility Pack provides: `HtmlWeb`. This class offers a `Load` function that takes a URL and returns an instance of the `HtmlDocument` class, which is also part of the package. With this information, we can write a function that takes a URL and returns an instance of `HtmlDocument`.\n\nThe first step is to import the required namespace. Open the `Program.cs` file and add the following line:\n\n```csharp\nusing HtmlAgilityPack;\n```\n\nThen, add this function to the `Program` class:\n\n```csharp\n// Downloads the page at the given URL and returns it as a parsed HtmlDocument\nstatic HtmlDocument GetDocument(string url)\n{\n    HtmlWeb web = new HtmlWeb();\n    HtmlDocument doc = web.Load(url);\n    return doc;\n}\n```\n\nWith this, the first step of the code is complete. The next step is to parse the document. 
\n\n## Parsing the HTML: Getting Book Links\n\nIn this part of the code, we’ll be extracting the required information from the web page. At this stage, the document is an object of type `HtmlDocument`. Its `DocumentNode` property exposes two functions for selecting elements. Both functions accept an XPath as input and return an `HtmlNode` or an `HtmlNodeCollection`. Here are the signatures of these two functions:\n\n```csharp\npublic HtmlNodeCollection SelectNodes(string xpath);\n```\n\n```csharp\npublic HtmlNode SelectSingleNode(string xpath);\n```\n\nLet’s discuss `SelectNodes` first.\n\nFor this example, a C# web scraper, we are going to scrape all the book details from this page. First, it needs to be parsed so that all the links to the books can be extracted. To do that, open this page in the browser, right-click any of the book links and click Inspect. This will open the Developer Tools. \n\nAfter spending some time with the markup, the XPath to select the book links should be something like this:\n\n```\n//h3/a\n```\n\nThis XPath can now be passed to the `SelectNodes` function.\n\n```csharp\nHtmlDocument doc = GetDocument(url);\nHtmlNodeCollection linkNodes = doc.DocumentNode.SelectNodes(\"//h3/a\");\n```\n\nNote that the `SelectNodes` function is called on the `DocumentNode` property of the `HtmlDocument`.\n\nThe variable `linkNodes` is a collection. We can write a `foreach` loop over it and get the `href` from each link one by one. There is one tiny problem that we need to take care of: the links on the page are relative. Hence, they need to be converted into absolute URLs before we can scrape the extracted links. \n\nTo convert the relative URLs, we can make use of the `Uri` class. We can use this constructor to get a `Uri` object with an absolute URL.\n\n```csharp\nUri(Uri baseUri, string? 
relativeUri);\n```\n\nOnce we have the `Uri` object, we can simply read its `AbsoluteUri` property to get the complete URL.\n\nWe can write all this in a function to keep the code organized.\n\n```csharp\nstatic List\u003cstring\u003e GetBookLinks(string url)\n{\n    var bookLinks = new List\u003cstring\u003e();\n    HtmlDocument doc = GetDocument(url);\n    HtmlNodeCollection linkNodes = doc.DocumentNode.SelectNodes(\"//h3/a\");\n    var baseUri = new Uri(url);\n    foreach (var link in linkNodes)\n    {\n        // Resolve each relative href against the URL of the page it came from\n        string href = link.Attributes[\"href\"].Value;\n        bookLinks.Add(new Uri(baseUri, href).AbsoluteUri);\n    }\n    return bookLinks;\n}\n```\n\nIn this function, we start with an empty `List\u003cstring\u003e` object. In the `foreach` loop, we add every link to this list, and then we return it.\n\nNow, it’s time to modify the `Main()` function so that we can test the C# code that we have written so far. Modify the function so that it looks like this:\n\n```csharp\nstatic void Main(string[] args)\n{\n    var bookLinks = GetBookLinks(\"http://books.toscrape.com/catalogue/category/books/mystery_3/index.html\");\n    Console.WriteLine(\"Found {0} links\", bookLinks.Count);\n}\n```\n\nTo run this code, open the terminal, navigate to the directory that contains this file, and type in the following:\n\n```bash\ndotnet run\n```\n\nThe output should be as follows:\n\n```\nFound 20 links\n```\n\nLet’s move to the next part where we will be processing all the links to get the book data.\n\n## Parsing the HTML: Getting Book Details\n\nAt this point, we have a list of strings that contain the URLs of the books. We can simply write a loop that first gets each document using the `GetDocument` function that we’ve already written. After that, we’ll use the `SelectSingleNode` function to extract the title and the price of the book.\n\nTo keep the data organized, let’s start with a class. 
This class will represent a book and will have two properties: `Title` and `Price`. It will look like this:\n\n```csharp\npublic class Book\n{\n    public string Title { get; set; }\n    public string Price { get; set; }\n}\n```\n\nNow, open a book page in the browser and create the XPath for the title: `//h1`. Creating an XPath for the price is a little trickier because the additional books at the bottom have the same class applied.\n\n![](https://images.prismic.io/oxylabs-sm/ZTkxNzAzYWUtMzJmZC00YmIwLTg1MTktODgwMTVlYTcyOTg5_pricexpath.png?auto=compress,format\u0026rect=0,0,1623,600\u0026w=1623\u0026h=600\u0026fm=webp\u0026q=75)\n\nThe XPath of the price will be this:\n\n```\n//div[contains(@class,\"product_main\")]/p[@class=\"price_color\"]\n```\n\nNote that this XPath contains double quotes. Inside a C# string literal, we will have to escape each of them by prefixing it with a backslash. \n\nNow we can use the `SelectSingleNode` function to get the node, and then read the `InnerText` property to get the text contained in the element. We can organize everything in a function as follows:\n\n```csharp\nstatic List\u003cBook\u003e GetBookDetails(List\u003cstring\u003e urls)\n{\n    var books = new List\u003cBook\u003e();\n    foreach (var url in urls)\n    {\n        HtmlDocument document = GetDocument(url);\n        var titleXPath = \"//h1\";\n        var priceXPath = \"//div[contains(@class,\\\"product_main\\\")]/p[@class=\\\"price_color\\\"]\";\n        var book = new Book();\n        book.Title = document.DocumentNode.SelectSingleNode(titleXPath).InnerText;\n        book.Price = document.DocumentNode.SelectSingleNode(priceXPath).InnerText;\n        books.Add(book);\n    }\n    return books;\n}\n```\n\nThis function will return a list of `Book` objects. 
It’s time to update the `Main()` function as well:\n\n```csharp\nstatic void Main(string[] args)\n{\n    var bookLinks = GetBookLinks(\"http://books.toscrape.com/catalogue/category/books/mystery_3/index.html\");\n    Console.WriteLine(\"Found {0} links\", bookLinks.Count);\n    var books = GetBookDetails(bookLinks);\n}\n```\n\n## Exporting Data\n\nIf you haven’t yet installed `CsvHelper`, you can do so by running the command `dotnet add package CsvHelper` in the terminal.\n\nAfter installation, import the required namespaces in your `Program.cs` file like this:\n\n```csharp\nusing System.Globalization;\nusing CsvHelper;\n```\n\nThe export function is pretty straightforward. First, we need to create a `StreamWriter` and pass the CSV file name to its constructor. Next, we will use this object to create a `CsvWriter`. Finally, we can use the `WriteRecords` function to write all the books in just one line of code. \n\nTo ensure that all the resources are closed properly, we can use the `using` block. 
We can also wrap everything in a function as follows:\n\n```csharp\nstatic void exportToCSV(List\u003cBook\u003e books)\n{\n    using (var writer = new StreamWriter(\"./books.csv\"))\n    using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))\n    {\n        csv.WriteRecords(books);\n    }\n}\n```\n\nFinally, we can call this function from the `Main()` function:\n\n```csharp\nstatic void Main(string[] args)\n{\n    var bookLinks = GetBookLinks(\"http://books.toscrape.com/catalogue/category/books/mystery_3/index.html\");\n    var books = GetBookDetails(bookLinks);\n    exportToCSV(books);\n}\n```\n\nLet’s bring together all the snippets and have a look at the complete code:\n```csharp\nusing System.Globalization;\nusing CsvHelper;\nusing HtmlAgilityPack;\n\nnamespace webscraping\n{\n\n\n\n    class Program\n    {\n        static HtmlDocument GetDocument(string url)\n        {\n            HtmlWeb web = new HtmlWeb();\n            HtmlDocument doc = web.Load(url);\n            return doc;\n        }\n        static List\u003cstring\u003e GetBookLinks(string url)\n        {\n            var bookLinks = new List\u003cstring\u003e();\n            HtmlDocument doc = GetDocument(url);\n            HtmlNodeCollection linkNodes = doc.DocumentNode.SelectNodes(\"//h3/a\");\n            var baseUri = new Uri(url);\n            foreach (var link in linkNodes)\n            {\n                string href = link.Attributes[\"href\"].Value;\n                bookLinks.Add(new Uri(baseUri, href).AbsoluteUri);\n            }\n            return bookLinks;\n        }\n        static List\u003cBook\u003e GetBookDetails(List\u003cstring\u003e urls)\n        {\n            var books = new List\u003cBook\u003e();\n            foreach (var url in urls)\n            {\n                HtmlDocument document = GetDocument(url);\n                var titleXPath = \"//h1\";\n                var priceXPath = 
\"//div[contains(@class,\\\"product_main\\\")]/p[@class=\\\"price_color\\\"]\";\n                var book = new Book();\n                book.Title = document.DocumentNode.SelectSingleNode(titleXPath).InnerText;\n                book.Price = document.DocumentNode.SelectSingleNode(priceXPath).InnerText;\n                books.Add(book);\n            }\n            return books;\n        }\n        static void exportToCSV(List\u003cBook\u003e books)\n        {\n            using (var writer = new StreamWriter(\"books.csv\"))\n            using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))\n            {\n                csv.WriteRecords(books);\n            }\n        }\n\n        static void Main(string[] args)\n        {\n            var bookLinks = GetBookLinks(\"http://books.toscrape.com/catalogue/category/books/mystery_3/index.html\");\n            Console.WriteLine(\"Found {0} links\", bookLinks.Count);\n            var books = GetBookDetails(bookLinks);\n            exportToCSV(books);\n        }\n\n\n    }\n}\n```\n\nThat’s it! To run this code, open the terminal and run the following command:\n\n```bash\ndotnet run\n```\n\nWithin seconds, you will have a `books.csv` file created.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foxylabs%2Fcsharp-web-scraping","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foxylabs%2Fcsharp-web-scraping","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foxylabs%2Fcsharp-web-scraping/lists"}