# Scraping Kickstarter

## Objectives

1. Use Nokogiri to scrape an HTML document.
2. Use scraped data to build a nested data structure.

## Overview

In this lab, you'll be scraping a Kickstarter web page that lists projects requesting funding. The page you'll be scraping displays 20 previews of projects in the NYC area. Each project has a title, an image, a short description, a location, and some funding details.
Our goal is to collect this information and build a hash for each project:

```ruby
projects = {
  :"My Great Project" => {
    :image_link => "Image Link",
    :description => "Description",
    :location => "Location",
    :percent_funded => "Percent Funded"
  },
  :"Another Great Project" => {
    :image_link => "Image Link",
    :description => "Description",
    :location => "Location",
    :percent_funded => "Percent Funded"
  }
}
```

These individual project hashes will be collected into a larger hash called `projects`, keyed by each project's title.
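The plain-Ruby sketch below builds the same nested shape from a few hard-coded values and then reads one field back. It does no scraping. The titles, URLs, locations, and numbers are made up for illustration and are not taken from the fixture.

```ruby
# Hypothetical values standing in for what the scraper will extract.
scraped = [
  ["My Great Project", "http://www.example.com/great.jpg", "A great project.", "Brooklyn, NY", 120],
  ["Another Great Project", "http://www.example.com/another.jpg", "Another one.", "Queens, NY", 45]
]

projects = {}

scraped.each do |title, image_link, description, location, percent_funded|
  # One inner hash per project, stored under the project's title as a symbol.
  projects[title.to_sym] = {
    :image_link => image_link,
    :description => description,
    :location => location,
    :percent_funded => percent_funded
  }
end

# Reading a value takes two lookups: first the title, then the field.
puts projects[:"My Great Project"][:location] # prints Brooklyn, NY
```

The outer hash answers "which project?" and the inner hash answers "which detail?". That is why the scraper later needs one loop over projects and one selector per detail.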
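## Fixtures

In the directory of this project, you'll notice a folder called `fixtures`. Inside that folder, you'll see a file, `kickstarter.html`. If you are using the Learn IDE, right-click the `kickstarter.html` file and select `Show in Finder`. Once Finder opens, double-click `kickstarter.html` to view the file in your default web browser. If you are not using the Learn IDE, try opening `kickstarter.html` in your text editor, right-clicking anywhere in the file, and selecting `open in browser` from the menu that appears.

Ta-da! We're looking at a web page. For the purposes of this lab, we won't be scraping a live web page; we'll be scraping this HTML file instead. We're doing this for two reasons. First, web pages change, and if we assign you a lab based on material that will change, things could get really confusing. Second, it is common to keep the data your test suite uses to test your program in a `fixtures` directory.

So, for this lab, we *don't need Open-URI*. We're not opening a live web page.

## Instructions

### Setting Up Our Project

Since we'll be using the `kickstarter.html` file instead of an Open-URI request, we only need to require `nokogiri` at the top of the `kickstarter_scraper.rb` file.

Next, let's set up some variables inside the method called `create_project_hash`:

```ruby
# Read the fixture file into a string
html = File.read('fixtures/kickstarter.html')

# Parse that string into a Nokogiri document we can query with CSS selectors
kickstarter = Nokogiri::HTML(html)
```

Notice that this is very similar to how we opened HTML documents in the previous exercise, where we did use Open-URI.

### Selecting the Projects

The first thing we'll want to do is figure out which selector will grab each project as a whole. Open up `fixtures/kickstarter.html` by typing:

```bash
open fixtures/kickstarter.html
```

in the terminal (`open` is a macOS command), or by right-clicking the file and selecting "open in browser".

This should open the file in your web browser. Right-click somewhere on the "Moby Dick" project and choose "Inspect Element". As you move your mouse up and down the HTML in the inspector, the browser highlights the part of the page that each element represents. Moving around this way, it quickly becomes clear that each project is contained in:

```html
<li class="project grid_4">...</li>
```

Since this Nokogiri object is just a bunch of nested nodes, and we know how to iterate through a nested data structure, we can use the Ruby we already know to iterate through each of these projects and do stuff with them.

Just to check our assumptions, let's add `require 'pry'` at the top of our file and add `binding.pry` after the last line of the `create_project_hash` method. Call the `create_project_hash` method at the bottom of the file. Then type `ruby kickstarter_scraper.rb` into your terminal. This should drop us into Pry, so that we can play around.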
In Pry, type:

```
kickstarter.css("li.project.grid_4").first
```

This selects the first `li` that has both the `project` and `grid_4` classes, so that we can make sure we've chosen our selector correctly.

And we have! (If you see `nil` instead, make sure you've typed everything exactly as it appears here.)

Awesome! Let's add a comment to `kickstarter_scraper.rb` that reminds us of that selector:

```ruby
# projects: kickstarter.css("li.project.grid_4")
```

### Selecting the Title

Let's hop back into Pry and see if we can figure out how to get the title of that project.

In Pry, type:

```
project = _
```

This assigns that project to a variable, `project`, so that we can play around with it.

**Reminder:** If you're looking at a big chunk of output in Pry that gets cut off at the bottom of your terminal window, you can scroll down with the down arrow key. You can exit the scrolling view and go back to entering code in Pry by pressing "q".

**Top-Tip:** In Pry, `_` holds the return value of the last expression you ran, so `variable_name = _` assigns that value to `variable_name`. For example:

```
$ pry > 1 + 1
  => 2
$ pry > two = _
  => 2
$ pry > two
  => 2
```

Go back to your browser and use the element inspector to click around a bit and identify the selector for a project's title. A bit of inspection should reveal that each project's title lives in an `a` tag, inside a `strong`, inside an `h2` with a class of `bbcard_name`. Let's check that in Pry:

```
project.css("h2.bbcard_name strong a").text
```

Since Nokogiri gives us a bunch of nested nodes that all respond to the same methods, we can chain a `css` call right onto this `project`, and the search only looks inside that one project. Neat, huh?
Now that we have our `title` selector, let's add it to a comment in `kickstarter_scraper.rb`:

```ruby
# projects: kickstarter.css("li.project.grid_4")
# title: project.css("h2.bbcard_name strong a").text
```

### Selecting the Image Link

Back in Chrome, we can see in the inspector that there is a `div` with a class of `project-thumbnail`. That seems like a good place to look. Let's give it a try in Pry.

In Pry, type:

```
project.css("div.project-thumbnail a img").attribute("src").value
```

It worked! Now, let's continue to keep track of our working code in our project file:

```ruby
# projects: kickstarter.css("li.project.grid_4")
# title: project.css("h2.bbcard_name strong a").text
# image link: project.css("div.project-thumbnail a img").attribute("src").value
```

#### A Note on `.attribute`

An HTML `img` tag has a `src` (source) attribute that holds the URL of the image. In the following example

`<img src="http://www.example.com/pic.jpg">`

the value of the `src` attribute is `"http://www.example.com/pic.jpg"`. Calling `.attribute("src")` on a Nokogiri node returns that attribute as a `Nokogiri::XML::Attr` object, and calling `.value` on it gives you the string. When you call `.attribute` on the result of `css` (a `NodeSet`), Nokogiri reads the attribute from the first matching node.

### Selecting the Description

Are you starting to see a pattern here? We click around a bit in the Chrome web inspector, take a stab at a CSS selector in Pry, and then keep track of that selector in our project file. Let's grab the description now. In Pry:

```
project.css("p.bbcard_blurb").text
```

This should return the description of an individual project.

Let's add that to `kickstarter_scraper.rb`:

```ruby
# projects: kickstarter.css("li.project.grid_4")
# title: project.css("h2.bbcard_name strong a").text
# image link: project.css("div.project-thumbnail a img").attribute("src").value
# description: project.css("p.bbcard_blurb").text
```
### Selecting the Location

Do you think you can figure this one out on your own? Examine the web page and then play around in Pry. Try to find the right selector for an individual project's location.

### Selecting the Percent Funded

And last, but not least, let's grab the percent funded as well! Looking in Chrome, this one seems a bit trickier, but only because it's more deeply nested than the others. In Pry, type:

```
project.css("ul.project-stats li.first.funded strong").text
```

That does it! So that the value is useful later on (if, say, we wanted to do some math with it), let's also tack on `.gsub("%", "").to_i` to remove the percent sign and convert the result into an integer.

Our final list of comments in `kickstarter_scraper.rb` (including the location selector you should have figured out on your own) is:

```ruby
# projects: kickstarter.css("li.project.grid_4")
# title: project.css("h2.bbcard_name strong a").text
# image link: project.css("div.project-thumbnail a img").attribute("src").value
# description: project.css("p.bbcard_blurb").text
# location: project.css("ul.project-meta span.location-name").text
# percent_funded: project.css("ul.project-stats li.first.funded strong").text.gsub("%","").to_i
```
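Here is what the `.gsub("%", "").to_i` chain does to a few sample strings, in plain Ruby with no scraping involved. The strings are made up, and `percent_to_i` is a hypothetical helper name used only for this example.

One caveat: `String#to_i` reads leading digits and stops at the first character that isn't one. A value written with a thousands separator would therefore be cut short. This lab doesn't say whether the fixture contains such values; the sketch only illustrates the behavior.

```ruby
# Hypothetical helper: convert a scraped "percent funded" string into an Integer.
def percent_to_i(text)
  text.gsub("%", "").to_i
end

puts percent_to_i("120%")   # prints 120
puts percent_to_i("7%")     # prints 7

# to_i reads leading digits only, so a comma cuts the number short.
puts percent_to_i("1,234%") # prints 1

# Deleting commas as well avoids that (a hypothetical variant, not part of the lab).
puts "1,234%".delete(",%").to_i # prints 1234
```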
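### Let's Scrape!

Now it's just a matter of combining the data we can grab with Nokogiri with what we know about iterating over data in Ruby.

First, let's set up a loop to iterate through the projects (and also an empty `projects` hash, which we will fill with scraped data):

```ruby
# file: kickstarter_scraper.rb

require 'nokogiri'
require 'pry'

# projects: kickstarter.css("li.project.grid_4")
# title: project.css("h2.bbcard_name strong a").text
# image link: project.css("div.project-thumbnail a img").attribute("src").value
# description: project.css("p.bbcard_blurb").text
# location: project.css("ul.project-meta span.location-name").text
# percent_funded: project.css("ul.project-stats li.first.funded strong").text.gsub("%","").to_i

def create_project_hash
  html = File.read('fixtures/kickstarter.html')
  kickstarter = Nokogiri::HTML(html)

  projects = {}

  # Iterate through the projects
  kickstarter.css("li.project.grid_4").each do |project|
    projects[project] = {}
  end

  # return the projects hash
  projects
end
```

OK, so that won't actually work. Each key would be the project's entire Nokogiri node, a huge object that is no use for looking a project up. So let's change our data structure slightly: each project's title will be a key, and its value will be another hash with each of our other data points as keys. Sound good?

```ruby
# file: kickstarter_scraper.rb

...

def create_project_hash
  html = File.read('fixtures/kickstarter.html')
  kickstarter = Nokogiri::HTML(html)

  projects = {}

  kickstarter.css("li.project.grid_4").each do |project|
    title = project.css("h2.bbcard_name strong a").text
    projects[title.to_sym] = {}
  end

  # return the projects hash
  projects
end
```

That's better. Notice that we're converting the title into a symbol using the `to_sym` method. Symbols are commonly used as hash keys in Ruby; just remember that you'll need a symbol to look a project up later, too.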
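A quick plain-Ruby check shows two things: what `to_sym` does to a title containing spaces, and why the lookup has to use a symbol as well. The title and location here are hypothetical.

```ruby
title = "My Great Project"
key = title.to_sym

# A symbol containing spaces is written with quotes after the colon.
p key # prints :"My Great Project"

projects = { key => { :location => "Brooklyn, NY" } }

# Symbol and String keys are different, so a String lookup finds nothing.
p projects[:"My Great Project"][:location] # prints "Brooklyn, NY"
p projects["My Great Project"]             # prints nil
```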
Finally, it's just a matter of grabbing each of the data points using the selectors we've already figured out and adding them to each project's hash. Our complete code will look something like this:

```ruby
# file: kickstarter_scraper.rb

require 'nokogiri'
require 'pry'

# projects: kickstarter.css("li.project.grid_4")
# title: project.css("h2.bbcard_name strong a").text
# image link: project.css("div.project-thumbnail a img").attribute("src").value
# description: project.css("p.bbcard_blurb").text
# location: project.css("ul.project-meta span.location-name").text
# percent_funded: project.css("ul.project-stats li.first.funded strong").text.gsub("%","").to_i

def create_project_hash
  html = File.read('fixtures/kickstarter.html')
  kickstarter = Nokogiri::HTML(html)

  projects = {}

  kickstarter.css("li.project.grid_4").each do |project|
    title = project.css("h2.bbcard_name strong a").text
    projects[title.to_sym] = {
      :image_link => project.css("div.project-thumbnail a img").attribute("src").value,
      :description => project.css("p.bbcard_blurb").text,
      :location => project.css("ul.project-meta span.location-name").text,
      :percent_funded => project.css("ul.project-stats li.first.funded strong").text.gsub("%","").to_i
    }
  end

  # return the projects hash
  projects
end
```

We did it!
Run the test suite and you should see that all of the tests are passing.