{"id":20709931,"url":"https://github.com/oxylabs/web-scraping-google-sheets","last_synced_at":"2025-03-11T05:48:45.109Z","repository":{"id":134336682,"uuid":"565702003","full_name":"oxylabs/web-scraping-google-sheets","owner":"oxylabs","description":"Guide to Using Google Sheets for Basic Web Scraping","archived":false,"fork":false,"pushed_at":"2025-02-11T13:06:15.000Z","size":23,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-11T14:23:47.066Z","etag":null,"topics":["email-scraper","google-news-api","google-news-scraper","google-search-scraper","google-sheets-web-scraping","google-trends-api","python-web-scraper","web-crawler-python","web-scraping","web-scraping-google-sheets","web-scraping-python"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oxylabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-14T06:19:59.000Z","updated_at":"2025-02-11T13:06:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"b9174991-51ea-4828-89d6-1239f4dcffff","html_url":"https://github.com/oxylabs/web-scraping-google-sheets","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fweb-scraping-google-sheets","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fweb-scraping-google-sheets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/oxylabs%2Fweb-scraping-google-sheets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fweb-scraping-google-sheets/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oxylabs","download_url":"https://codeload.github.com/oxylabs/web-scraping-google-sheets/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242980783,"owners_count":20216285,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["email-scraper","google-news-api","google-news-scraper","google-search-scraper","google-sheets-web-scraping","google-trends-api","python-web-scraper","web-crawler-python","web-scraping","web-scraping-google-sheets","web-scraping-python"],"created_at":"2024-11-17T02:09:05.879Z","updated_at":"2025-03-11T05:48:45.100Z","avatar_url":"https://github.com/oxylabs.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Guide to Using Google Sheets for Basic Web Scraping\n\n[![Oxylabs promo code](https://raw.githubusercontent.com/oxylabs/product-integrations/refs/heads/master/Affiliate-Universal-1090x275.png)](https://oxylabs.go2cloud.org/aff_c?offer_id=7\u0026aff_id=877\u0026url_id=112)\n\n\n[![](https://dcbadge.vercel.app/api/server/eWsVUJrnG5)](https://discord.gg/GbxmdGhZjq)\n\n* [What is IMPORTXML](#WhatisIMPORTXML)\n* [How to extract data from a website to Google Sheets](#HowtoextractdatafromawebsitetoGoogleSheets)  \n* [Other related functions](#Otherrelatedfunctions)\n* [Import table from website to Google 
Sheets](#ImporttablefromwebsitetoGoogleSheets)\n* [Import data from XML Feeds to Google Sheets](#ImportdatafromXMLFeedstoGoogleSheets)\n* [Customizing data imported by IMPORTFEED](#CustomizingdataimportedbyIMPORTFEED)\n* [Importing Data from CSV to Google Sheets](#ImportingDatafromCSVtoGoogleSheets)\n* [Does the data stay fresh?](#Doesthedatastayfresh)\n* [Common Errors](#common-errors)\n* [Errors related to volatile functions](#Errorsrelatedtovolatilefunctions)\n\n\nGoogle Sheets can be a very effective tool for web scraping. While most web scraping methods require you to write code, web scraping with Google Sheets needs no coding and no add-ons. All you need to do is use a built-in function of Google Sheets.\n\nThis guide gives an overview of how to scrape website data with Google Sheets. \n\nIf you want to learn more, see our blog post.\n\n##  \u003ca name='WhatisIMPORTXML'\u003e\u003c/a\u003eWhat is IMPORTXML\n\nIMPORTXML is a function that imports data from various structured data types, including XML, HTML, CSV, TSV, and RSS and Atom XML feeds. \n\nIf you want to extract the title element from the Quotes to Scrape web page, the formula would be as follows:\n\n```\n=IMPORTXML(\"https://quotes.toscrape.com/\",\"//title\")\n```\n\nAs shown here, the first parameter is the web page URL, and the second parameter is the XPath query.\n\nIf you want to extract the first quote from the webpage, the formula will be as follows:\n\n```\n=IMPORTXML(\"https://quotes.toscrape.com/\",\"(//*[@class='text']/text())[1]\")\n```\n\n\n\nIf you are not comfortable with this XPath query, we recommend reading the XPath section on [our blog](https://oxylabs.io/blog/xpath-vs-css) to learn more about writing XPath queries. \n\nAlternatively, you can enter the URL and the XPath query in two cells, for example, A1 and A2, and reference those cells in the formula:\n\n```\n=IMPORTXML(A1,A2)\n```\n\n##  2. \u003ca name='HowtoextractdatafromawebsitetoGoogleSheets'\u003e\u003c/a\u003eHow to extract data from a website to Google Sheets\n\n###  2.1. 
\u003ca name='Step1:FindXPathforselectingElements'\u003e\u003c/a\u003eStep 1: Find XPath for selecting Elements\n\nIn this example, we will work with https://books.toscrape.com/, and we want to get all the book titles. These requirements mean that we need to write a custom XPath. This XPath is as follows:\n\n//h3/a/@title\n\n###  \u003ca name='Step2:CreateanewGoogleSheet'\u003e\u003c/a\u003eStep 2: Create a new Google Sheet\n\nNavigate to [Google Sheets](https://docs.google.com/spreadsheets/u/0/) and create a new sheet. This step requires you to log in to your Google account if you haven't done so already.\n\n###  \u003ca name='Step3:EntertheURLandXPathintwocells'\u003e\u003c/a\u003eStep 3: Enter the URL and XPath in two cells\n\nEnter the URL of the webpage and the XPath in two cells, for example, B1 and B2.\n\n###  \u003ca name='Step4:ExtractWebsiteDataWithGoogleSheets'\u003e\u003c/a\u003eStep 4: Extract Website Data With Google Sheets\n\nIn a new cell, for example, A2, enter the following formula:\n\n```\n=IMPORTXML(B1,B2)\n```\n\nThis formula effectively calls the following function:\n\n```\n=IMPORTXML(\"https://books.toscrape.com/\",\"//h3/a/@title\")\n```\n\n\n\nIf you want to extract the book prices, the first step is to create the XPath for prices. This XPath would be as follows:\n\n```\n//*[@class=\"price_color\"]/text()\n```\n\n\n\nEnter this XPath in a cell, let's say, B3. 
After that, enter the following formula in the cell B4:\n\n```\n=IMPORTXML(B1, B3)\n```\n\n\n\n##  \u003ca name='Otherrelatedfunctions'\u003e\u003c/a\u003eOther related functions\n\nApart from IMPORTXML, a few other functions can be used for web scraping directly from Google Sheets:\n\n- IMPORTHTML\n- IMPORTFEED\n- IMPORTDATA\n\n\n\n### \u003ca name='ImporttablefromwebsitetoGoogleSheets'\u003e\u003c/a\u003eImport table from website to Google Sheets\n\nThe IMPORTHTML function expects three parameters:\n\n- URL\n- Either \"table\" or \"list\"\n- The index of the table or the list you want to scrape.\n\nFor example, see [List of highest-grossing films - Wikipedia](https://en.wikipedia.org/wiki/List_of_highest-grossing_films). This page contains the list in a table. With the URL of this page in cell B1, the following formula imports the first table:\n\n```\n=IMPORTHTML(B1,\"table\",1)\n```\n\nIf we wanted only the movie titles, which are in column number 3, our formula would be as follows:\n\n```\n=INDEX(IMPORTHTML(\"https://en.wikipedia.org/wiki/List_of_highest-grossing_films\",\"table\",1),,3)\n```\n\n##  \u003ca name='ImportdatafromXMLFeedstoGoogleSheets'\u003e\u003c/a\u003eImport data from XML Feeds to Google Sheets\n\nLet's take the example of the [New York Times Technology feeds](https://rss.nytimes.com/services/xml/rss/nyt/Technology.xml) to see the IMPORTFEED function in action. \n\nCreate a new sheet and enter the URL of the feed in cell B1:\n\nhttps://rss.nytimes.com/services/xml/rss/nyt/Technology.xml\n\nNow in the cell A2, enter the following formula:\n\n```\n=IMPORTFEED(B1)\n```\n\n\n\n##  \u003ca name='CustomizingdataimportedbyIMPORTFEED'\u003e\u003c/a\u003eCustomizing data imported by IMPORTFEED\n\nThe IMPORTFEED function has the following optional parameters:\n\n- Query - You can use this to specify which information you want to import. More on this in a bit.\n- Headers - By default, the imported data has no column headers. 
If you want to see column headers, then set this parameter to TRUE.\n- num_items - You can also control how many items are fetched. If you want only five items to be imported, set this parameter to 5.\n\nUpdate the function call to the following:\n\n```\n=IMPORTFEED(B1,,TRUE,5)\n```\n\nIf you want only the information about the feed, enter the following formula:\n\n```\n=IMPORTFEED(B1,\"feed\")\n```\n\n\n\nIf you want to get only the titles, enter the following formula:\n\n```\n=IMPORTFEED(B1,\"items title\")\n```\n\n\n\n##  \u003ca name='ImportingDatafromCSVtoGoogleSheets'\u003e\u003c/a\u003eImporting Data from CSV to Google Sheets\n\nIf you have a URL that points to a CSV file, you can use the IMPORTDATA function to get the data.\n\nFor example, create a new sheet and enter the following URL in the cell B1:\n\nhttps://www2.census.gov/programs-surveys/decennial/2020/data/apportionment/apportionment.csv\n\nIn the cell A2, enter the following formula:\n\n```\n=IMPORTDATA(B1)\n```\n\n\n\n## \u003ca name='Doesthedatastayfresh'\u003e\u003c/a\u003eDoes the data stay fresh?\n\nIf you keep your Google Sheet open, these functions check for updated data every hour.\n\nData will also be refreshed if you delete the cell and add it again.\n\nNote that data will not be refreshed if you refresh your sheet.\n\nData will also not be refreshed if you copy-paste a cell with these functions.\n\n# Common Errors\n\nThe following are some of the common errors you may face while creating your web scraping Google Sheet:\n\n##  \u003ca name='Error:Arrayresultwasnotexpanded'\u003e\u003c/a\u003eError: Array result was not expanded\n\nArray result was not expanded because it would overwrite data in A36.\n\nThis error means that the cells needed for the results are not empty. Make room by clearing those cells or by moving the formula.\n\n##  \u003ca name='Error:Resulttoolarge'\u003e\u003c/a\u003eError: Result too large\n\nThis error typically appears when the query returns more data than Google Sheets can import. The solution is to update the XPath query so that a smaller amount of data is returned. 
\n\n##  \u003ca name='Errorsrelatedtovolatilefunctions'\u003e\u003c/a\u003eErrors related to volatile functions\n\nIf you see the following error:\n\nError: This function is not allowed to reference a cell with NOW(), RAND(), or RANDBETWEEN()\n\nIt means that you are trying to reference one of the volatile functions, such as NOW, RAND, or RANDBETWEEN, in one of the parameters. These references may be indirect or direct.\n\n\n\nRead More Google Scraping Related Repositories: [Google Shopping Scraper](https://github.com/oxylabs/google-shopping-scraper), [How to Scrape Google Shopping Results](https://github.com/oxylabs/scrape-google-shopping), [Google Maps Scraper](https://github.com/oxylabs/google-maps-scraper), [Google Play Scraper](https://github.com/oxylabs/google-play-scraper), [How To Scrape Google Jobs](https://github.com/oxylabs/how-to-scrape-google-jobs), [Google News Scraper](https://github.com/oxylabs/google-news-scraper), [How to Scrape Google Scholar](https://github.com/oxylabs/how-to-scrape-google-scholar), [How to Scrape Google Flights with Python](https://github.com/oxylabs/how-to-scrape-google-flights), [How To Scrape Google Images](https://github.com/oxylabs/how-to-scrape-google-images), [Scrape Google Search Results](https://github.com/oxylabs/scrape-google-python), [Scrape Google Trends](https://github.com/oxylabs/how-to-scrape-google-trends)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foxylabs%2Fweb-scraping-google-sheets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foxylabs%2Fweb-scraping-google-sheets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foxylabs%2Fweb-scraping-google-sheets/lists"}