{"id":20709956,"url":"https://github.com/oxylabs/webscraping-with-ruby","last_synced_at":"2025-09-07T17:35:00.641Z","repository":{"id":134336698,"uuid":"464487512","full_name":"oxylabs/webscraping-with-ruby","owner":"oxylabs","description":"A tutorial for web scraping with Ruby","archived":false,"fork":false,"pushed_at":"2025-06-26T08:24:47.000Z","size":24,"stargazers_count":5,"open_issues_count":0,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-10T14:46:27.748Z","etag":null,"topics":["ruby","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oxylabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-02-28T13:11:55.000Z","updated_at":"2025-06-26T08:24:51.000Z","dependencies_parsed_at":"2025-08-10T14:47:17.732Z","dependency_job_id":null,"html_url":"https://github.com/oxylabs/webscraping-with-ruby","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/oxylabs/webscraping-with-ruby","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fwebscraping-with-ruby","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fwebscraping-with-ruby/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fwebscraping-with-ruby/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fwebscraping-with-ruby/manifests","owner_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/owners/oxylabs","download_url":"https://codeload.github.com/oxylabs/webscraping-with-ruby/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fwebscraping-with-ruby/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274069701,"owners_count":25217175,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-07T02:00:09.463Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ruby","web-scraping"],"created_at":"2024-11-17T02:09:15.172Z","updated_at":"2025-09-07T17:35:00.591Z","avatar_url":"https://github.com/oxylabs.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Web Scraping With Ruby\n\n[![Oxylabs promo code](https://raw.githubusercontent.com/oxylabs/product-integrations/refs/heads/master/Affiliate-Universal-1090x275.png)](https://oxylabs.io/pages/gitoxy?utm_source=877\u0026utm_medium=affiliate\u0026groupid=877\u0026utm_content=webscraping-with-ruby-github\u0026transaction_id=102f49063ab94276ae8f116d224b67)\n\n[![](https://dcbadge.vercel.app/api/server/eWsVUJrnG5)](https://discord.gg/GbxmdGhZjq)\n\n[\u003cimg src=\"https://img.shields.io/static/v1?label=\u0026message=Ruby\u0026color=brightgreen\" /\u003e](https://github.com/topics/ruby) [\u003cimg 
src=\"https://img.shields.io/static/v1?label=\u0026message=Web%20Scraping\u0026color=important\" /\u003e](https://github.com/topics/web-scraping)\n\n- [Installing Ruby](#installing-ruby)\n- [Scraping static pages](#scraping-static-pages)\n- [Scraping dynamic pages](#scraping-dynamic-pages)\n\nRuby is a time-tested, open-source programming language. Its first version was released in 1996, and the latest major version, Ruby 3, was released in 2020. This article covers tools and techniques for web scraping that work with Ruby 3.\n\nWe’ll begin with a step-by-step overview of scraping static public web pages and then move on to scraping dynamic pages. While the first approach works with most websites, it does not work with dynamic pages that use JavaScript to render their content. To handle these sites, we’ll use a headless browser.\n\nFor a detailed explanation, see our [blog post](https://oxy.yt/Dr5a).\n\n## Installing Ruby\n\nTo install Ruby on **Windows**, use the [Chocolatey](https://chocolatey.org/) package manager and run the following:\n\n```batch\nchoco install ruby\n```\n\nTo install Ruby on **macOS**, use a package manager such as [Homebrew](https://brew.sh/). Enter the following in the terminal:\n\n```shell\nbrew install ruby\n```\n\nFor **Linux**, use the package manager for your distro. For example, run the following for Ubuntu:\n\n```shell\nsudo apt install ruby-full\n```\n\n## Scraping static pages\n\nIn this section, we’ll write a web scraper that collects data from [https://sandbox.oxylabs.io/products](https://sandbox.oxylabs.io/products). 
It is a dummy video game store for practicing web scraping with static websites.\n\n### Installing required gems\n\n```shell\ngem install httparty\ngem install nokogiri\ngem install csv\n```\n\n### Making an HTTP request\n\n```ruby\nrequire 'httparty'\n\n# Fetch the first page and stop if the request did not succeed\nresponse = HTTParty.get('https://sandbox.oxylabs.io/products')\nif response.code == 200\n  puts response.body\nelse\n  puts \"Error: #{response.code}\"\n  exit\nend\n```\n\n### Parsing HTML with Nokogiri\n\n```ruby\nrequire 'nokogiri'\ndocument = Nokogiri::HTML4(response.body)\n```\n\n![](https://oxylabs.io/blog/images/2021/12/book_container.png)\n\n```ruby\ngames = []\n50.times do |i|\n  # Interpolate the page number (1 to 50) into the URL\n  url = \"https://sandbox.oxylabs.io/products?page=#{i + 1}\"\n  response = HTTParty.get(url)\n  document = Nokogiri::HTML4(response.body)\n  all_game_containers = document.css('.product-card')\n\n  all_game_containers.each do |container|\n    title = container.css('h4').text.strip\n    # Keep only digits and the decimal point\n    price = container.css('.price-wrapper').text.delete('^0-9.')\n    category_elements = container.css('.category span')\n    categories = category_elements.map { |elem| elem.text.strip }.join(', ')\n    # Collect each row so it is not discarded\n    games \u003c\u003c [title, price, categories]\n  end\nend\n```\n\n### Writing scraped data to a CSV file\n\n```ruby\nrequire 'csv'\nCSV.open(\n  'games.csv',\n  'w+',\n  write_headers: true,\n  headers: %w[Title Price Categories]\n) do |csv|\n  50.times do |i|\n    response = HTTParty.get(\"https://sandbox.oxylabs.io/products?page=#{i + 1}\")\n    document = Nokogiri::HTML4(response.body)\n    all_game_containers = document.css('.product-card')\n    all_game_containers.each do |container|\n      title = container.css('h4').text.strip\n      price = container.css('.price-wrapper').text.delete('^0-9.')\n      category_elements = container.css('.category span')\n      categories = category_elements.map { |elem| elem.text.strip }.join(', ')\n      game = [title, price, categories]\n      # Write one CSV row per game\n      csv \u003c\u003c game\n    end\n  end\nend\n```\n\n## Scraping dynamic 
pages\n\n### Required installation\n\n```shell\ngem install selenium-webdriver\ngem install nokogiri\ngem install csv\n```\n\n### Loading a dynamic website\n\n```ruby\nrequire 'selenium-webdriver'\n\ndriver = Selenium::WebDriver.for(:chrome)\n```\n\n### Locating HTML elements via CSS selectors\n\n```ruby\nrequire 'nokogiri'\ndocument = Nokogiri::HTML(driver.page_source)\n```\n\n![](https://oxylabs.io/blog/images/2021/12/quotes_to_scrape.png)\n\n```ruby\nquotes = []\nquote_elements = driver.find_elements(css: '.quote')\nquote_elements.each do |quote_el|\n  quote_text = quote_el.find_element(css: '.text').attribute('textContent')\n  author = quote_el.find_element(css: '.author').attribute('textContent')\n  quotes \u003c\u003c [quote_text, author]\nend\n```\n\n### Handling pagination\n\n```ruby\nquotes = []\nloop do\n  quote_elements = driver.find_elements(css: '.quote')\n  quote_elements.each do |quote_el|\n    quote_text = quote_el.find_element(css: '.text').attribute('textContent')\n    author = quote_el.find_element(css: '.author').attribute('textContent')\n    quotes \u003c\u003c [quote_text, author]\n  end\n  begin\n    # Go to the next page if a Next link exists\n    driver.find_element(css: '.next \u003e a').click\n  rescue Selenium::WebDriver::Error::NoSuchElementError\n    break # Next button not found, so this was the last page\n  end\nend\n```\n\n### Creating a CSV file\n\n```ruby\nrequire 'csv'\n\nCSV.open('quotes.csv', 'w+', write_headers: true,\n         headers: %w[Quote Author]) do |csv|\n  quotes.each do |quote|\n    csv \u003c\u003c quote\n  end\nend\n```\n\nIf you wish to find out more about web scraping with Ruby, see our [blog post](https://oxy.yt/Dr5a).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foxylabs%2Fwebscraping-with-ruby","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foxylabs%2Fwebscraping-with-ruby","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foxylabs%2Fwebscraping-with-ruby/lists"}