{"id":21985185,"url":"https://github.com/teticio/lambda-selenium","last_synced_at":"2025-07-20T17:04:34.831Z","repository":{"id":195188990,"uuid":"692434062","full_name":"teticio/lambda-selenium","owner":"teticio","description":"Use AWS Lambda functions as a proxy pool to scrape web pages with Selenium.","archived":false,"fork":false,"pushed_at":"2024-02-02T10:01:31.000Z","size":22,"stargazers_count":10,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-30T08:01:54.511Z","etag":null,"topics":["lambda-functions","proxy","scraping","selenium","terraform"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/teticio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-09-16T13:16:49.000Z","updated_at":"2025-01-22T08:02:34.000Z","dependencies_parsed_at":"2023-11-09T11:56:59.338Z","dependency_job_id":null,"html_url":"https://github.com/teticio/lambda-selenium","commit_stats":null,"previous_names":["teticio/lambda-selenium"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/teticio/lambda-selenium","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teticio%2Flambda-selenium","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teticio%2Flambda-selenium/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teticio%2Flambda-selenium/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teticio%2Flambda-selenium/manifests","ow
ner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/teticio","download_url":"https://codeload.github.com/teticio/lambda-selenium/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teticio%2Flambda-selenium/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266161905,"owners_count":23885928,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["lambda-functions","proxy","scraping","selenium","terraform"],"created_at":"2024-11-29T18:12:49.266Z","updated_at":"2025-07-20T17:04:34.812Z","avatar_url":"https://github.com/teticio.png","language":"Python","readme":"# Lambda Selenium\n\n(See also [lambda-scraper](https://github.com/teticio/lambda-scraper))\n\nUse AWS Lambda functions as a proxy to scrape web pages with Selenium. This is a cost-effective way to have access to a large pool of IP addresses. Follow the steps under Usage to create as many Lambda functions as you need (one for each IP address). The number of functions as well as the region can be specified in `variables.tf`. Each Lambda function changes IP address after approximately 6 minutes of inactivity. For example, you could create 360 Lambda functions which you cycle through one per second, while making as many requests as possible via each corresponding IP address. 
Note that, in practice, AWS will sometimes assign the same IP address to more than one Lambda function.\n\n## Prerequisites\n\nYou will need to have Terraform and Docker installed.\n\n## Usage\n\n```bash\ngit clone https://github.com/teticio/lambda-selenium.git\ncd lambda-selenium\nterraform init\nterraform apply -auto-approve\n# run \"terraform apply -destroy -auto-approve\" in the same directory to tear all this down again\n```\n\nYou can specify an `AWS_PROFILE` and `AWS_REGION` with\n\n```bash\nterraform apply -auto-approve -var 'region=AWS_REGION' -var 'profile=AWS_PROFILE'\n```\n\nAn example of how to use this from Python is provided in `test_selenium.py`. It runs the script in `example.py` to search Google for descriptions of dog breeds.\n\n```bash\nAWS_DEFAULT_REGION=AWS_REGION python test_selenium.py\n```\n\nThere are also examples of running the Lambda functions in parallel and asynchronously, both of which greatly speed up the process.\n\n```bash\n# Multi-processing (uses multiple CPU cores)\nAWS_DEFAULT_REGION=AWS_REGION python test_selenium_parallel.py\n# Asynchronous\nAWS_DEFAULT_REGION=AWS_REGION python test_selenium_async.py\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteticio%2Flambda-selenium","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fteticio%2Flambda-selenium","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteticio%2Flambda-selenium/lists"}