{"id":20709980,"url":"https://github.com/oxylabs/web-scraping-r","last_synced_at":"2025-11-01T01:30:25.074Z","repository":{"id":134336693,"uuid":"464486977","full_name":"oxylabs/web-scraping-r","owner":"oxylabs","description":"A tutorial for web scraping with R","archived":false,"fork":false,"pushed_at":"2025-02-11T12:47:27.000Z","size":12,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-11T13:42:44.543Z","etag":null,"topics":["proxy-scraper","r-language","web-scraping","wikipedia-scraper"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oxylabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-28T13:10:10.000Z","updated_at":"2025-02-11T12:47:31.000Z","dependencies_parsed_at":null,"dependency_job_id":"6cf959dd-076a-4c70-8df8-d85eff39ad7c","html_url":"https://github.com/oxylabs/web-scraping-r","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fweb-scraping-r","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fweb-scraping-r/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fweb-scraping-r/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fweb-scraping-r/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oxylabs","download_url":"h
ttps://codeload.github.com/oxylabs/web-scraping-r/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239243215,"owners_count":19606178,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["proxy-scraper","r-language","web-scraping","wikipedia-scraper"],"created_at":"2024-11-17T02:09:27.267Z","updated_at":"2025-11-01T01:30:25.029Z","avatar_url":"https://github.com/oxylabs.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Web Scraping With R\n\n[![Oxylabs promo code](https://raw.githubusercontent.com/oxylabs/product-integrations/refs/heads/master/Affiliate-Universal-1090x275.png)](https://oxylabs.go2cloud.org/aff_c?offer_id=7\u0026aff_id=877\u0026url_id=112)\n\n\n[![](https://dcbadge.vercel.app/api/server/eWsVUJrnG5)](https://discord.gg/GbxmdGhZjq)\n\n[\u003cimg src=\"https://img.shields.io/static/v1?label=\u0026message=R\u0026color=brightgreen\" /\u003e](https://github.com/topics/r) [\u003cimg src=\"https://img.shields.io/static/v1?label=\u0026message=Web%20Scraping\u0026color=important\" /\u003e](https://github.com/topics/web-scraping)\n\n- [Installing requirements](#installing-requirements)\n- [Web scraping with rvest](#web-scraping-with-rvest)\n- [Web scraping with RSelenium](#web-scraping-with-rselenium)\n\n\nThis tutorial covers the basics of web scraping with R. 
We’ll begin with static pages and then move on to techniques for scraping dynamic websites that render their content with JavaScript.\n\nFor a detailed explanation, see [this blog post](https://oxy.yt/1r8m). \n\n## Installing requirements\n\nFor macOS, run the following:\n\n```shell\nbrew install r\nbrew install --cask rstudio\n\n```\n\nFor Windows, run the following:\n\n```batch\nchoco install r.project\nchoco install r.studio\n```\n\n### Installing required libraries\n\n```R\ninstall.packages(\"rvest\")\ninstall.packages(\"dplyr\")\n```\n\n## Web scraping with rvest\n\n```R\nlibrary(rvest)\nlink = \"https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes\"\npage = read_html(link)\n\n```\n\n### Parsing HTML Content\n\n```R\npage %\u003e% html_elements(css=\"\")\npage %\u003e% html_elements(xpath=\"\")\n```\n\n\n\n![](https://oxylabs.io/blog/images/2021/12/wiki_markup.png)\n\nFor the page above, use the following:\n\n```R\nhtmlElement \u003c- page %\u003e% html_element(\"table.sortable\")\n```\n\n### Saving data to a data frame\n\n```R\ndf \u003c- html_table(htmlElement, header = FALSE)\nnames(df) \u003c- df[2,]\ndf = df[-1:-2,]\n```\n\n### Exporting data frame to a CSV file\n\n```R\nwrite.csv(df, \"iso_codes.csv\")\n```\n\n### Downloading Images\n\n```R\npage \u003c- read_html(url)\nimage_element \u003c- page %\u003e% html_element(\".thumbborder\")\nimage_url \u003c- image_element %\u003e% html_attr(\"src\")\n# mode = \"wb\" keeps the binary image intact on Windows\ndownload.file(image_url, destfile = \"paris.jpg\", mode = \"wb\")\n```\n\n### Scrape Dynamic Pages with Rvest\n\nFind the API endpoint and use it as follows:\n```R\nlibrary(httr)  # provides GET() and timeout()\npage \u003c- read_html(GET(api_url, timeout(10)))\njsontext \u003c- page %\u003e% html_element(\"p\") %\u003e% html_text()\n```\nFor a complete example, see [dynamic_rvest.R](src/dynamic_rvest.R).\n\n## Web scraping with RSelenium\n\n```R\ninstall.packages(\"RSelenium\")\nlibrary(RSelenium)\n\n```\n\n### Starting Selenium\n\n#### Method 1\n\n```R\n# 
Method 1\nrD \u003c- rsDriver(browser=\"chrome\", port=9515L, verbose=FALSE)\nremDr \u003c- rD[[\"client\"]]\n\n```\n\n#### Method 2\n\n```shell\ndocker run -d -p 4445:4444 selenium/standalone-firefox\n```\n\n```R\nremDr \u003c- remoteDriver(\n  remoteServerAddr = \"localhost\",\n  port = 4445L,\n  browserName = \"firefox\"\n)\nremDr$open()\n```\n\n### Working with elements in Selenium\n\n```R\nremDr$navigate(\"https://books.toscrape.com/catalogue/category/books/science-fiction_16\")\n```\n\n![](https://oxylabs.io/blog/images/2021/12/book_title.png)\n\n```R\ntitleElements \u003c- remDr$findElements(using = \"xpath\", \"//article//img\")\ntitles \u003c- sapply(titleElements, function(x){x$getElementAttribute(\"alt\")[[1]]})\n\npricesElements \u003c- remDr$findElements(using = \"xpath\", \"//*[@class='price_color']\")\nprices \u003c-  sapply(pricesElements, function(x){x$getElementText()[[1]]})\n\nstockElements \u003c- remDr$findElements(using = \"xpath\", \"//*[@class='instock availability']\")\nstocks \u003c-  sapply(stockElements, function(x){x$getElementText()[[1]]})\n\n```\n\n### Creating a data frame\n\n```R\ndf \u003c- data.frame(titles, prices, stocks)\n```\n\n#### Save CSV\n\n```R\nwrite.csv(df, \"books.csv\")\n```\n\nIf you wish to find out more about web scraping with R, see our [blog post](https://oxy.yt/1r8m).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foxylabs%2Fweb-scraping-r","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foxylabs%2Fweb-scraping-r","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foxylabs%2Fweb-scraping-r/lists"}