{"id":28998580,"url":"https://github.com/luminati-io/seleniumbase-web-scraping","last_synced_at":"2026-04-29T04:38:05.310Z","repository":{"id":283784033,"uuid":"913763672","full_name":"luminati-io/Seleniumbase-web-scraping","owner":"luminati-io","description":"Simplify web scraping with SeleniumBase, leveraging its user-friendly framework and advanced automation features to extract data seamlessly.","archived":false,"fork":false,"pushed_at":"2025-01-15T13:46:48.000Z","size":13,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-03-22T07:02:03.781Z","etag":null,"topics":["python","selenium","seleniumbase","web-scraping"],"latest_commit_sha":null,"homepage":"https://brightdata.com/blog/web-data/web-scraping-with-seleniumbase","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/luminati-io.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-08T10:01:06.000Z","updated_at":"2025-01-15T13:53:33.000Z","dependencies_parsed_at":"2025-03-22T07:12:08.668Z","dependency_job_id":null,"html_url":"https://github.com/luminati-io/Seleniumbase-web-scraping","commit_stats":null,"previous_names":["luminati-io/seleniumbase-web-scraping"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/luminati-io/Seleniumbase-web-scraping","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luminati-io%2FSeleniumbase-web-scraping","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luminati-io%2FSeleniumbase-web-s
craping/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luminati-io%2FSeleniumbase-web-scraping/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luminati-io%2FSeleniumbase-web-scraping/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/luminati-io","download_url":"https://codeload.github.com/luminati-io/Seleniumbase-web-scraping/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luminati-io%2FSeleniumbase-web-scraping/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261823771,"owners_count":23215149,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python","selenium","seleniumbase","web-scraping"],"created_at":"2025-06-25T07:09:16.020Z","updated_at":"2026-04-29T04:38:05.262Z","avatar_url":"https://github.com/luminati-io.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Web Scraping With SeleniumBase\n\n[![Promo](https://github.com/luminati-io/LinkedIn-Scraper/blob/main/Proxies%20and%20scrapers%20GitHub%20bonus%20banner.png)](https://brightdata.com)\n\nSimplify web scraping with SeleniumBase using its advanced features and step-by-step guide. Interested in Selenium web scraping? Check out [this guide](https://brightdata.com/blog/how-tos/using-selenium-for-web-scraping).\n\n## What Is SeleniumBase?\n\nSeleniumBase is a Python framework for browser automation, built on top of Selenium/WebDriver APIs. 
It supports tasks from testing to scraping and includes features like CAPTCHA bypassing and bot-detection avoidance.\n\n## SeleniumBase vs Selenium: Feature and API Comparison\n\n| Feature                  | SeleniumBase                                      | Selenium                                    |\n|--------------------------|---------------------------------------------------|---------------------------------------------|\n| Built-in test runners    | Integrates with pytest, pynose, and behave        | Requires manual setup for test integration  |\n| Driver management        | Auto-downloads matching browser driver            | Manual download and configuration           |\n| Web automation logic     | Combines steps into single method call            | Requires multiple lines of code             |\n| Selector handling        | Auto-detects CSS or XPath selectors               | Requires explicit selector types            |\n| Timeout handling         | Default timeouts to prevent failures              | Immediate failures without explicit timeouts|\n| Error outputs            | Clean, readable error messages                    | Verbose, less interpretable error logs      |\n| Dashboards and reports   | Built-in dashboards, reports, and screenshots     | No built-in dashboards or reporting         |\n| Desktop GUI applications | Visual tools for test running                     | Lacks desktop GUI tools                     |\n| Test recorder            | Built-in test recorder                            | Requires manual script writing              |\n| Test case management     | Provides CasePlans                                | No built-in test case management            |\n| Data app support         | Includes ChartMaker for data apps                 | No additional tools for data apps           |\n\n## Using SeleniumBase for Web Scraping: Step-By-Step Guide\n\n### Step #1: Project Initialization\n\n```bash\nmkdir seleniumbase-scraper\ncd 
seleniumbase-scraper\npython -m venv env\n```\n\nActivate the virtual environment:\n\n- On Linux/macOS: `source env/bin/activate`\n- On Windows: `env\\Scripts\\activate`\n\nInstall SeleniumBase:\n\n```bash\npip install seleniumbase\n```\n\n### Step #2: SeleniumBase Test Setup\n\n```python\nfrom seleniumbase import SB\n\nwith SB() as sb:\n    pass\n```\n\nRun the script:\n\n```bash\npython3 scraper.py --headless\n```\n\n### Step #3: Connect to the Target Page\n\n```python\nsb.open(\"https://quotes.toscrape.com/\")\n```\n\n### Step #4: Select the Quote Elements\n\n```python\nquote_elements = sb.find_elements(\".quote\")\n```\n\n### Step #5: Scrape Quote Data\n\n```python\nfrom selenium.webdriver.common.by import By\n\nfor quote_element in quote_elements:\n    text_element = quote_element.find_element(By.CSS_SELECTOR, \".text\")\n    text = text_element.text.replace(\"“\", \"\").replace(\"”\", \"\")\n    author_element = quote_element.find_element(By.CSS_SELECTOR, \".author\")\n    author = author_element.text\n    tags = [tag.text for tag in quote_element.find_elements(By.CSS_SELECTOR, \".tag\")]\n```\n\n### Step #6: Populate the Quotes Array\n\n```python\nquotes.append({\"text\": text, \"author\": author, \"tags\": tags})\n```\n\n### Step #7: Implement Crawling Logic\n\n```python\nwhile sb.is_element_present(\".next\"):\n    sb.click(\".next a\")\n```\n\n### Step #8: Export the Scraped Data\n\n```python\nimport csv\n\nwith open(\"quotes.csv\", mode=\"w\", newline=\"\", encoding=\"utf-8\") as file:\n    writer = csv.DictWriter(file, fieldnames=[\"text\", \"author\", \"tags\"])\n    writer.writeheader()\n    for quote in quotes:\n        writer.writerow({\"text\": quote[\"text\"], \"author\": quote[\"author\"], \"tags\": \";\".join(quote[\"tags\"])})\n```\n\n### Step #9: Put It All Together\n\n```python\nfrom seleniumbase import SB\nfrom selenium.webdriver.common.by import By\nimport csv\n\nwith SB() as sb:\n    sb.open(\"https://quotes.toscrape.com/\")\n    quotes = []\n    
while True:\n        quote_elements = sb.find_elements(\".quote\")\n        for quote_element in quote_elements:\n            text_element = quote_element.find_element(By.CSS_SELECTOR, \".text\")\n            text = text_element.text.replace(\"“\", \"\").replace(\"”\", \"\")\n            author_element = quote_element.find_element(By.CSS_SELECTOR, \".author\")\n            author = author_element.text\n            tags = [tag.text for tag in quote_element.find_elements(By.CSS_SELECTOR, \".tag\")]\n            quotes.append({\"text\": text, \"author\": author, \"tags\": tags})\n        # The last page has no \"Next\" link, so stop only after scraping it\n        if not sb.is_element_present(\".next a\"):\n            break\n        sb.click(\".next a\")\n    with open(\"quotes.csv\", mode=\"w\", newline=\"\", encoding=\"utf-8\") as file:\n        writer = csv.DictWriter(file, fieldnames=[\"text\", \"author\", \"tags\"])\n        writer.writeheader()\n        for quote in quotes:\n            writer.writerow({\"text\": quote[\"text\"], \"author\": quote[\"author\"], \"tags\": \";\".join(quote[\"tags\"])})\n```\n\nRun the scraper:\n\n```bash\npython3 scraper.py --headless\n```\n\n## Advanced SeleniumBase Scraping Use Cases\n\n### Automate Form Filling and Submission\n\n```python\nfrom seleniumbase import BaseCase\nBaseCase.main(__name__, __file__)\n\nclass LoginTest(BaseCase):\n    def test_submit_login_form(self):\n        self.open(\"https://quotes.toscrape.com/login\")\n        self.type(\"#username\", \"test\")\n        self.type(\"#password\", \"test\")\n        self.click(\"input[type=\\\"submit\\\"]\")\n        self.assert_text(\"Top Ten tags\")\n```\n\nRun the test:\n\n```bash\npytest login.py\n```\n\n### Bypass Simple Anti-Bot Technologies\n\n```python\nfrom seleniumbase import SB\n\nwith SB(uc=True) as sb:\n    url = \"https://www.scrapingcourse.com/antibot-challenge\"\n    sb.uc_open_with_reconnect(url, reconnect_time=4)\n    sb.uc_gui_click_captcha()\n    sb.save_screenshot(\"screenshot.png\")\n```\n\n### Bypass Complex Anti-Bot Technologies\n\n```python\nfrom seleniumbase 
import SB\n\nwith SB(uc=True, test=True) as sb:\n    url = \"https://gitlab.com/users/sign_in\"\n    sb.activate_cdp_mode(url)\n    sb.uc_gui_click_captcha()\n    sb.sleep(2)\n    sb.save_screenshot(\"screenshot.png\")\n```\n\n## Conclusion\n\nSeleniumBase offers advanced features for web scraping, including UC Mode and CDP Mode for bypassing anti-bot measures. For more robust solutions, consider using cloud-based browsers like [Scraping Browser from Bright Data](https://brightdata.com/products/scraping-browser).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluminati-io%2Fseleniumbase-web-scraping","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fluminati-io%2Fseleniumbase-web-scraping","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluminati-io%2Fseleniumbase-web-scraping/lists"}