https://github.com/MUSA-620-Spring-2018/MUSA-620-Week-6
Web scraping 1
https://github.com/MUSA-620-Spring-2018/MUSA-620-Week-6
css html selectors web-inspector web-scraping
Last synced: 2 months ago
JSON representation
Web scraping 1
- Host: GitHub
- URL: https://github.com/MUSA-620-Spring-2018/MUSA-620-Week-6
- Owner: MUSA-620-Spring-2018
- Created: 2018-02-18T22:52:05.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2018-02-22T02:09:02.000Z (about 7 years ago)
- Last Synced: 2024-07-21T06:44:25.915Z (10 months ago)
- Topics: css, html, selectors, web-inspector, web-scraping
- Language: R
- Size: 2.57 MB
- Stars: 2
- Watchers: 2
- Forks: 3
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# MUSA-620-Week-6: Web Scraping Part 1
### Links:
* [HTML vs DOM, Example 1](https://blueshift.io/selectors2.html)
* [HTML vs DOM, Example 2](https://blueshift.io/selectors2.html)
* [rvest package](https://cran.r-project.org/web/packages/rvest/rvest.pdf) for parsing HTML## Getting to Know Your Browser's Web Inspector
### Elements tab
Use the elements tab to explore a webpage's DOM.
##### Examples:
* [Simple example](https://blueshift.io/selectors2.html)
* Grab text from sites that have copying disabled: [NY Times](https://www.nytimes.com/2016/08/23/upshot/50-years-of-electoral-college-maps-how-the-us-turned-red-and-blue.html)
* Find and download a hidden video file: [Imgur](https://i.imgur.com/9M4Rsf0.gifv)### Network tab
Use the network tab to find the data used in these webpages.
##### [FiveThirtyEight: Congressional Districts](https://projects.fivethirtyeight.com/redistricting-maps/)
* [Congressional district map](https://projects.fivethirtyeight.com/redistricting-maps/US-current.topo.json)##### [WSJ: California Drought](http://graphics.wsj.com/californias-long-challenge-with-drought/)
* [CA outline](http://graphics.wsj.com/californias-long-challenge-with-drought/data/shared/california.topo.json)
* [Drought isocrones](http://graphics.wsj.com/californias-long-challenge-with-drought/data/drought/drought.ca.topo.json)
* [Major cities](http://graphics.wsj.com/californias-long-challenge-with-drought/data/shared/major_cities.topo.json)
* [Population affected](http://graphics.wsj.com/californias-long-challenge-with-drought/data/drought/population-affected.csv)
*Data: [Wall Street Journal](http://graphics.wsj.com/californias-long-challenge-with-drought/) / code: [web-inspector-data.r](https://github.com/MUSA-620-Spring-2018/MUSA-620-Week-6/blob/master/web-inspector-data.R)*
##### [Washington Post: US Sunlight Map](https://www.washingtonpost.com/news/wonk/wp/2015/07/13/map-where-americas-sunniest-and-least-sunny-places-are/)
* See if you can you can find the data to recreate their sunlight map yourself. *The steps for doing this have now been added to [web-inspector-data.r](https://github.com/MUSA-620-Spring-2018/MUSA-620-Week-6/blob/master/web-inspector-data.R)*
*Data: [Washington Post](https://www.washingtonpost.com/news/wonk/wp/2015/07/13/map-where-americas-sunniest-and-least-sunny-places-are/) / code: [web-inspector-data.r](https://github.com/MUSA-620-Spring-2018/MUSA-620-Week-6/blob/master/web-inspector-data.R)*
##### [The Economist: Property Values by US County](https://www.economist.com/blogs/graphicdetail/2015/04/daily-chart-2)
* Sometimes data will be hidden in .js files: [county-level real estate values](https://infographics.economist.com/2015/ASBTest/Land/js/countyData.js?__sbCache=0.26521743179319657)
* Used as the data source for the map below[Total residential property value of each US county](http://metrocosm.com/the-housing-value-of-every-county-in-the-u-s/)
