{"id":14066265,"url":"https://github.com/hadley/web-scraping","last_synced_at":"2025-10-13T04:35:34.413Z","repository":{"id":225847226,"uuid":"767022264","full_name":"hadley/web-scraping","owner":"hadley","description":null,"archived":false,"fork":false,"pushed_at":"2024-07-08T08:25:36.000Z","size":20665,"stargazers_count":77,"open_issues_count":0,"forks_count":14,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-12T14:13:14.000Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hadley.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-04T15:08:07.000Z","updated_at":"2025-10-06T09:43:33.000Z","dependencies_parsed_at":"2024-07-08T09:51:00.514Z","dependency_job_id":null,"html_url":"https://github.com/hadley/web-scraping","commit_stats":null,"previous_names":["hadley/web-scraping"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/hadley/web-scraping","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hadley%2Fweb-scraping","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hadley%2Fweb-scraping/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hadley%2Fweb-scraping/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hadley%2Fweb-scraping/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hadley","download_url":"https://codeload.github.com/hadley/web-scraping/tar.gz/refs/head
s/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hadley%2Fweb-scraping/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279013683,"owners_count":26085390,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-13T02:00:06.723Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-13T07:05:00.986Z","updated_at":"2025-10-13T04:35:34.386Z","avatar_url":"https://github.com/hadley.png","language":"R","readme":"# Web scraping with rvest (UseR 2024)\n\nIn this tutorial, you'll learn the basics of web scraping with R, using the rvest package. We'll discuss the basic structure of an HTML page, and how to find the elements you're interested in with selectorgadget or the browser's developer tools. 
You'll then learn how to programmatically extract data with rvest, turning web pages into tidy data frames.\n\nBonus content includes scraping multiple pages (with rvest and httr2), scraping dynamic sites where content is generated with JavaScript, extracting data from unofficial APIs, and some hints on using LLMs.\n\n[Slides](rvest.pdf)\n\n## Requirements\n\nTo run the code at home, install the following packages:\n\n```R\n# install.packages(\"pak\")\npak::pak(c(\"tidyverse\", \"chromote\"))\n```\n\nTo run the live web-scraping code you'll also need a copy of [Chrome](https://www.google.com/chrome/) installed on your computer.\n","funding_links":[],"categories":["R"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhadley%2Fweb-scraping","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhadley%2Fweb-scraping","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhadley%2Fweb-scraping/lists"}