{"id":23006663,"url":"https://github.com/aaronvb/aws_lambda_go_scraper","last_synced_at":"2025-04-02T15:13:54.999Z","repository":{"id":138759267,"uuid":"137565329","full_name":"aaronvb/aws_lambda_go_scraper","owner":"aaronvb","description":"Website text scraper in Go using AWS Lambda","archived":false,"fork":false,"pushed_at":"2018-08-16T02:12:38.000Z","size":5,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-21T22:38:17.353Z","etag":null,"topics":["aws-lambda","aws-ses","cron","go","golang","recurring","scraper"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aaronvb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-06-16T08:21:30.000Z","updated_at":"2024-04-21T15:23:52.000Z","dependencies_parsed_at":null,"dependency_job_id":"e2980270-83e5-460a-8c4a-55d35891aea7","html_url":"https://github.com/aaronvb/aws_lambda_go_scraper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronvb%2Faws_lambda_go_scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronvb%2Faws_lambda_go_scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronvb%2Faws_lambda_go_scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronvb%2Faws_lambda_go_scraper/manifests","owner_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/owners/aaronvb","download_url":"https://codeload.github.com/aaronvb/aws_lambda_go_scraper/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246837685,"owners_count":20841903,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws-lambda","aws-ses","cron","go","golang","recurring","scraper"],"created_at":"2024-12-15T08:13:14.732Z","updated_at":"2025-04-02T15:13:54.968Z","avatar_url":"https://github.com/aaronvb.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Scrape websites with AWS Lambda\nThis is a basic website scraper written in Go that searches pages for text.\n\nIf the text is found, the results are emailed to the user through Amazon SES.\n\n## Why\nI built this to let me know when certain websites update with the text I'm looking for. 
This uses AWS Lambda and cron to run on a recurring schedule without setting up a server.\n\n## How to use this\n\n### Setup\nFirst, clone the repo.\n```\n\u003e git clone git@github.com:aaronvb/aws_lambda_go_scraper.git\n\u003e cd aws_lambda_go_scraper\n```\n\nBuild the Go binary and zip it for AWS Lambda.\n```\n\u003e GOOS=linux GOARCH=amd64 go build -o main lambda_scraper.go\n\u003e zip main.zip main\n```\n\nUpload the zip file to the AWS Lambda function, and make sure the handler is set to `main`.\n\nNext, create three environment variables: `RECIPIENT`, the email address that receives the notification; `SENDER`, the email address that sends the notification; and `SES_LOCATION`, the AWS region of your SES setup (e.g. us-west-2).\n\nFinally, make sure the role that the AWS Lambda function uses has permission to send email through Amazon SES. Also, don't forget to add the email address to SES and verify it so it can receive emails.\n\n### Running the function\nCreate a test event. In the event data, pass a JSON object with two keys: `urls`, a string of comma-separated URLs you want to scrape, and `words`, a string of comma-separated words to search for.\n\nExample:\n\n```\n{\n  \"urls\": \"https://aaronvb.com,https://aaronvb.com/articles/selection-sort-in-ruby.html\",\n  \"words\": \"ruby,Hawaii,foobar\"\n}\n```\n\n## Automation\n\nI [wrote an article](https://medium.com/@aaronvb/simple-website-text-scraping-with-go-and-aws-lambda-cd5df25f5b2b) explaining how to set up automated scraping using AWS CloudWatch with AWS Lambda.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaronvb%2Faws_lambda_go_scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faaronvb%2Faws_lambda_go_scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaronvb%2Faws_lambda_go_scraper/lists"}