{"id":28998654,"url":"https://github.com/luminati-io/complex-navigation-scraping","last_synced_at":"2026-05-18T02:33:23.893Z","repository":{"id":283783982,"uuid":"923525569","full_name":"luminati-io/complex-navigation-scraping","owner":"luminati-io","description":"Selenium for web scraping on sites with complex navigation patterns like dynamic pagination, infinite scrolling, and 'Load More' buttons.","archived":false,"fork":false,"pushed_at":"2025-01-28T12:08:27.000Z","size":211,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-07T18:07:38.381Z","etag":null,"topics":["html","pagination","python","selenium","web-scraping"],"latest_commit_sha":null,"homepage":"https://brightdata.com/blog/web-data/scraping-complex-navigation","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/luminati-io.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-28T12:05:19.000Z","updated_at":"2025-01-28T12:28:17.000Z","dependencies_parsed_at":"2026-04-20T17:00:38.303Z","dependency_job_id":null,"html_url":"https://github.com/luminati-io/complex-navigation-scraping","commit_stats":null,"previous_names":["luminati-io/complex-navigation-scraping"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/luminati-io/complex-navigation-scraping","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luminati-io%2Fcomplex-navigation-scraping","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luminati-io%2Fcomple
x-navigation-scraping/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luminati-io%2Fcomplex-navigation-scraping/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luminati-io%2Fcomplex-navigation-scraping/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/luminati-io","download_url":"https://codeload.github.com/luminati-io/complex-navigation-scraping/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luminati-io%2Fcomplex-navigation-scraping/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33162632,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-17T22:39:12.733Z","status":"online","status_checked_at":"2026-05-18T02:00:06.436Z","response_time":71,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["html","pagination","python","selenium","web-scraping"],"created_at":"2025-06-25T07:09:32.326Z","updated_at":"2026-05-18T02:33:23.875Z","avatar_url":"https://github.com/luminati-io.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Scraping Websites With Complex Navigation\n\n[![Promo](https://github.com/luminati-io/LinkedIn-Scraper/raw/main/Proxies%20and%20scrapers%20GitHub%20bonus%20banner.png)](https://brightdata.com/) \n\nThis guide explains how to use Selenium and browser automation to scrape websites with 
complex navigation patterns, such as dynamic pagination, infinite scrolling, and ‘Load More’ buttons.\n\n- [What Is Considered Complex Navigation?](#what-is-considered-complex-navigation)\n- [Tools to Handle Complex Navigation Websites](#tools-to-handle-complex-navigation-websites)\n- [Scraping Common Complex Navigation Patterns](#scraping-common-complex-navigation-patterns)\n  - [Dynamic Pagination](#dynamic-pagination)\n  - [‘Load More’ Button](#load-more-button)\n  - [Infinite Scrolling](#infinite-scrolling)\n- [Conclusion](#conclusion)\n\n## What Is Considered Complex Navigation?\n\nIn web scraping, complex navigation refers to website structures where content or pages are not easily accessible. Complex navigation scenarios often involve dynamic elements, asynchronous data loading, or user-driven interactions. These aspects may enhance the user experience, but they also significantly complicate data extraction. Here are some common examples:\n\n- **JavaScript-rendered navigation**: Websites that rely on JavaScript frameworks to generate content directly in the browser.\n- **Paginated content**: Sites with data spread across multiple pages where pagination is loaded dynamically via AJAX.\n- **Infinite scrolling**: Pages that load additional content dynamically as users scroll, typical for social media feeds, Discourse-based forums, and news websites.\n- **Multi-level menus**: Sites with nested menus requiring multiple clicks or hover actions to reveal deeper layers of navigation, common for product category trees on marketplaces.\n- **Interactive map interfaces**: Websites displaying data on maps or graphs, where information is dynamically loaded as users pan or zoom.\n- **Tabs or accordions**: Pages with content hidden under dynamically rendered tabs or collapsible accordions that are not directly embedded in the HTML returned by the server.\n- **Dynamic filters and sorting options**: Sites with complex filtering systems 
where applying multiple filters reloads the item listing dynamically without altering the URL structure.\n\n## Tools to Handle Complex Navigation Websites\n\nMany of the complex interactions listed above need JavaScript execution, something only a browser can do. This means you cannot rely on simple [HTML parsers](https://brightdata.com/blog/web-data/best-html-parsers) for such pages. Instead, you must use a browser automation tool like Selenium, Playwright, or Puppeteer. These solutions allow you to programmatically instruct a browser to perform specific actions on a web page, mimicking user behavior.\n\n## Scraping Common Complex Navigation Patterns\n\nThis guide covers three specific types of complex navigation patterns:\n\n- **Dynamic pagination**: Sites with paginated data loaded dynamically via AJAX.\n- **‘Load More’ button**: A common JavaScript-based navigation example.\n- **Infinite scrolling**: A page that continuously loads data as the user scrolls down.\n\nWe will use Selenium in Python, but the logic can be adapted to Playwright, Puppeteer, or any other browser automation tool. The guide also assumes that you are already familiar with the basics of [web scraping using Selenium](https://brightdata.com/blog/how-tos/using-selenium-for-web-scraping).\n\n### Dynamic Pagination\n\nWe will use the “[Oscar Winning Films: AJAX and Javascript](https://www.scrapethissite.com/pages/ajax-javascript/#2014)” scraping sandbox:\n\n![The target page. Note how pagination data is loaded dynamically](https://github.com/luminati-io/complex-navigation-scraping/blob/main/Images/Dynamic-pagniation-example-1536x752.gif)\n\nThis site dynamically loads Oscar-winning film data, paginated by year.\n\nTo navigate and scrape such a page effectively, follow these steps:\n\n1. Click on a new year to trigger data loading (a loader element will appear).\n2. Wait for the loader element to disappear (indicating the data has fully loaded).\n3. 
Verify that the table with the data has been properly rendered on the page.\n4. Scrape the data once it becomes available.\n\nBelow is an example of how to implement this logic using Selenium in Python:\n\n```python\nfrom selenium import webdriver\nfrom selenium.webdriver.chrome.service import Service\nfrom selenium.webdriver.common.by import By\nfrom selenium.webdriver.support.ui import WebDriverWait\nfrom selenium.webdriver.support import expected_conditions as EC\nfrom selenium.webdriver.chrome.options import Options\n\n# Set up Chrome options for headless mode\noptions = Options()\noptions.add_argument(\"--headless\")\n\n# Create a Chrome web driver instance\ndriver = webdriver.Chrome(service=Service(), options=options)\n\n# Connect to the target page\ndriver.get(\"https://www.scrapethissite.com/pages/ajax-javascript/\")\n\n# Click the \"2012\" pagination button\nelement = driver.find_element(By.ID, \"2012\")\nelement.click()\n\n# Wait until the loader is no longer visible\nWebDriverWait(driver, 10).until(\n    lambda d: d.find_element(By.CSS_SELECTOR, \"#loading\").get_attribute(\"style\") == \"display: none;\"\n)\n\n# Data should now be loaded...\n\n# Wait for the table to be present on the page\nWebDriverWait(driver, 10).until(\n    EC.presence_of_element_located((By.CSS_SELECTOR, \".table\"))\n)\n\n# Where to store the scraped data\nfilms = []\n\n# Scrape data from the table\ntable_body = driver.find_element(By.CSS_SELECTOR, \"#table-body\")\nrows = table_body.find_elements(By.CSS_SELECTOR, \".film\")\nfor row in rows:\n    title = row.find_element(By.CSS_SELECTOR, \".film-title\").text\n    nominations = row.find_element(By.CSS_SELECTOR, \".film-nominations\").text\n    awards = row.find_element(By.CSS_SELECTOR, \".film-awards\").text\n    best_picture_icon = row.find_element(By.CSS_SELECTOR, \".film-best-picture\").find_elements(By.TAG_NAME, \"i\")\n    best_picture = True if best_picture_icon else False\n\n    # Store the scraped data\n    
films.append({\n      \"title\": title,\n      \"nominations\": nominations,\n      \"awards\": awards,\n      \"best_picture\": best_picture\n    })\n\n# Data export logic...\n\n# Close the browser driver\ndriver.quit()\n```\n\nHere is the breakdown of that code snippet:\n\n1.  The code sets up a headless Chrome instance.\n2.  The script opens the target page and clicks the “2012” pagination button to trigger data loading.\n3.  Selenium waits for the loader to disappear using [`WebDriverWait()`](https://selenium-python.readthedocs.io/waits.html).\n4.  After the loader disappears, the script waits for the table to appear.\n5.  After the data is fully loaded, the script extracts details such as film titles, nominations, awards, and whether the film won Best Picture. The extracted information is then stored in a list of dictionaries.\n\nThe result will be:\n\n```json\n[\n  {\n    \"title\": \"Argo\",\n    \"nominations\": \"7\",\n    \"awards\": \"3\",\n    \"best_picture\": true\n  },\n  // ...\n  {\n    \"title\": \"Curfew\",\n    \"nominations\": \"1\",\n    \"awards\": \"1\",\n    \"best_picture\": false\n  }\n]\n```\n\nKeep in mind that there isn’t always a single best approach to handling this navigation pattern. Alternative methods may be necessary depending on the page's behavior. Here are some examples:\n\n*   Use `WebDriverWait()` in combination with expected conditions to wait for specific HTML elements to appear or disappear.\n*   Monitor traffic for AJAX requests to detect when new content is fetched. This may involve using browser logging.\n*   Identify the API request triggered by pagination and make direct requests to fetch the data programmatically (e.g., using the [`requests` library](https://brightdata.com/blog/web-data/python-requests-guide)).\n\n### ‘Load More’ Button\n\nTo illustrate JavaScript-based complex navigation scenarios involving user interactions, let's use an example of a 'Load More' button. 
The concept is straightforward: a list of items is displayed, and clicking the button loads additional items.\n\nThis time, the target site will be the [‘Load More’ example](https://www.scrapingcourse.com/button-click) page from the Scraping Course:\n\n![The ‘Load More’ target page in action](https://github.com/luminati-io/complex-navigation-scraping/blob/main/Images/Clicking-on-the-load-more-button-1536x752.gif)\n\nTo handle this complex navigation scraping pattern, follow these steps:\n\n1.  Find the ‘Load More’ button and click it.\n2.  Wait for the new elements to load onto the page.\n\nHere is the code to use with Selenium:\n\n```python\nfrom selenium import webdriver\nfrom selenium.webdriver.common.by import By\nfrom selenium.webdriver.chrome.options import Options\nfrom selenium.webdriver.support.ui import WebDriverWait\n\n# Set up Chrome options for headless mode\noptions = Options()\noptions.add_argument(\"--headless\")\n\n# Create a Chrome web driver instance\ndriver = webdriver.Chrome(options=options)\n\n# Connect to the target page\ndriver.get(\"https://www.scrapingcourse.com/button-click\")\n\n# Collect the initial number of products\ninitial_product_count = len(driver.find_elements(By.CSS_SELECTOR, \".product-item\"))\n\n# Locate the \"Load More\" button and click it\nload_more_button = driver.find_element(By.CSS_SELECTOR, \"#load-more-btn\")\nload_more_button.click()\n\n# Wait until the number of product items on the page has increased\nWebDriverWait(driver, 10).until(lambda driver: len(driver.find_elements(By.CSS_SELECTOR, \".product-item\")) \u003e initial_product_count)\n\n# Where to store the scraped data\nproducts = []\n\n# Scrape product details\nproduct_elements = driver.find_elements(By.CSS_SELECTOR, \".product-item\")\nfor product_element in product_elements:\n    # Extract product details\n    name = product_element.find_element(By.CSS_SELECTOR, \".product-name\").text\n    image = product_element.find_element(By.CSS_SELECTOR, 
\".product-image\").get_attribute(\"src\")\n    price = product_element.find_element(By.CSS_SELECTOR, \".product-price\").text\n    url = product_element.find_element(By.CSS_SELECTOR, \"a\").get_attribute(\"href\")\n\n    # Store the scraped data\n    products.append({\n        \"name\": name,\n        \"image\": image,\n        \"price\": price,\n        \"url\": url\n    })\n\n# Data export logic...\n\n# Close the browser driver\ndriver.quit()\n```\n\nTo handle the 'Load More' button navigation pattern, the script:\n\n1.  Records the initial number of products on the page\n2.  Clicks the “Load More” button\n3.  Waits until the product count increases, confirming that new items have been added\n\nThis approach is both efficient and versatile, as it eliminates the need to know the exact number of elements to be loaded. However, alternative methods can also achieve similar results.\n\n### Infinite Scrolling\n\nInfinite scrolling is a popular interaction widely used on social media and e-commerce platforms to enhance user engagement. In this case, the target will be the same page as above but with [infinite scrolling instead of a ‘Load More’ button](https://www.scrapingcourse.com/infinite-scrolling):\n\n![infinite scrolling instead of a 'Load More' button](https://github.com/luminati-io/complex-navigation-scraping/blob/main/Images/Infinite-scrolling-example-1024x501.gif)\n\nMost browser automation tools do not provide a direct method for scrolling down or up a page, and Selenium is not an exception. Instead, you need to execute a JavaScript script on the page to perform the scrolling operation.\n\nThe solution is to write a custom JavaScript script that scrolls down:\n\n1.  A specified number of times, or\n2.  
Until no more data is available to load.\n\n\u003e **Note**:\\\n\u003e Each scroll loads new data and increases the number of elements on the page.\n\nAfter that, you can scrape the newly loaded content.\n\nHere is the code to handle infinite scrolling in Selenium:\n\n```python\nfrom selenium import webdriver\nfrom selenium.common.exceptions import TimeoutException\nfrom selenium.webdriver.common.by import By\nfrom selenium.webdriver.chrome.options import Options\nfrom selenium.webdriver.support.ui import WebDriverWait\n\n# Set up Chrome options (uncomment the next line to run in headless mode)\noptions = Options()\n# options.add_argument(\"--headless\")\n\n# Create a Chrome web driver instance\ndriver = webdriver.Chrome(options=options)\n\n# Connect to the target page with infinite scrolling\ndriver.get(\"https://www.scrapingcourse.com/infinite-scrolling\")\n\n# Current page height\nscroll_height = driver.execute_script(\"return document.body.scrollHeight\")\n# Number of products on the page\nproduct_count = len(driver.find_elements(By.CSS_SELECTOR, \".product-item\"))\n\n# Max number of scrolls\nmax_scrolls = 10\nscroll_count = 0\n\n# Limit the number of scrolls to 10\nwhile scroll_count \u003c max_scrolls:\n    # Scroll down\n    driver.execute_script(\"window.scrollTo(0, document.body.scrollHeight);\")\n\n    # Wait until the number of product items on the page has increased\n    try:\n        WebDriverWait(driver, 10).until(lambda driver: len(driver.find_elements(By.CSS_SELECTOR, \".product-item\")) \u003e product_count)\n    except TimeoutException:\n        # No new products appeared within the timeout: assume there is nothing left to load\n        break\n\n    # Update the product count\n    product_count = len(driver.find_elements(By.CSS_SELECTOR, \".product-item\"))\n\n    # Get the new page height\n    new_scroll_height = driver.execute_script(\"return document.body.scrollHeight\")\n\n    # If no new content has been loaded\n    if new_scroll_height == scroll_height:\n        break\n\n    # Update scroll height and increment scroll count\n    scroll_height = new_scroll_height\n    scroll_count += 1\n\n# Scrape product details after infinite scrolling\nproducts = []\nproduct_elements = 
driver.find_elements(By.CSS_SELECTOR, \".product-item\")\nfor product_element in product_elements:\n    # Extract product details\n    name = product_element.find_element(By.CSS_SELECTOR, \".product-name\").text\n    image = product_element.find_element(By.CSS_SELECTOR, \".product-image\").get_attribute(\"src\")\n    price = product_element.find_element(By.CSS_SELECTOR, \".product-price\").text\n    url = product_element.find_element(By.CSS_SELECTOR, \"a\").get_attribute(\"href\")\n\n    # Store the scraped data\n    products.append({\n        \"name\": name,\n        \"image\": image,\n        \"price\": price,\n        \"url\": url\n    })\n\n# Export to CSV/JSON...\n\n# Close the browser driver\ndriver.quit()\n```\n\nThis script handles infinite scrolling by first identifying the current page height and product count. It limits the scrolling process to a maximum of 10 iterations. During each iteration, it:\n\n1.  Scrolls down to the bottom\n2.  Waits for the product count to increase (indicating new content has loaded)\n3.  Compares the page height to detect whether further content is available\n\nIf the page height remains unchanged after a scroll, the loop terminates, signaling that there is no more data to load.\n\n## Conclusion\n\nWeb scraping can be challenging when complex navigation patterns are involved, and businesses can make it even more difficult by employing anti-scraping measures to block automated scripts. Browser automation tools, like Selenium, cannot bypass those restrictions.\n\nThe solution is to use a cloud-based browser like [Scraping Browser](https://brightdata.com/products/scraping-browser), which integrates with Playwright, Puppeteer, Selenium, and other tools, automatically rotating IPs with each request. It can manage browser fingerprinting, retries, CAPTCHA solving, and more. 
Say goodbye to getting blocked when navigating complex sites!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluminati-io%2Fcomplex-navigation-scraping","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fluminati-io%2Fcomplex-navigation-scraping","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluminati-io%2Fcomplex-navigation-scraping/lists"}