{"id":18783953,"url":"https://github.com/sparkpost/event-data","last_synced_at":"2026-04-29T09:36:55.314Z","repository":{"id":138516333,"uuid":"104136314","full_name":"SparkPost/event-data","owner":"SparkPost","description":"self-hosted message events","archived":false,"fork":false,"pushed_at":"2017-09-21T21:21:47.000Z","size":37,"stargazers_count":3,"open_issues_count":1,"forks_count":1,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-05-21T08:48:49.933Z","etag":null,"topics":["api","aws","data","email","webhooks"],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SparkPost.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-09-19T22:35:37.000Z","updated_at":"2022-10-04T12:34:41.000Z","dependencies_parsed_at":"2023-07-14T04:30:35.440Z","dependency_job_id":null,"html_url":"https://github.com/SparkPost/event-data","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/SparkPost/event-data","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SparkPost%2Fevent-data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SparkPost%2Fevent-data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SparkPost%2Fevent-data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SparkPost%2Fevent-data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SparkPost","download_url":"h
ttps://codeload.github.com/SparkPost/event-data/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SparkPost%2Fevent-data/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32420350,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T06:29:02.080Z","status":"ssl_error","status_checked_at":"2026-04-29T06:29:00.631Z","response_time":110,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","aws","data","email","webhooks"],"created_at":"2024-11-07T20:41:28.029Z","updated_at":"2026-04-29T09:36:55.286Z","avatar_url":"https://github.com/SparkPost.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Event Data\n\n## What is this?\n\nIf you've ever been frustrated with the data retention limitation on our Message Events Interface, this is a possible solution.\n\nEvent Data is an (almost) drop-in replacement for the Message Events interface.\nIt lets you configure your own data retention period, add custom filters, and optimize for your most common queries.\nAll of the system components are eligible for the AWS free tier, so this system will be no- or low-cost to operate.\n\nBy default, this system **allows anyone who knows your API Gateway url to see your event data, which includes your customers' 
email addresses**.\nHere are the [official docs](https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-control-access-using-iam-policies-to-invoke-api.html) on setting up an IP Whitelist for API Gateway, and [another post](http://benfoster.io/blog/aws-api-gateway-ip-restrictions) that covers whitelist setup, as well as how to use [Postman](https://getpostman.com) to sign and submit requests.\n\nThe main difference between [Message Events](https://developers.sparkpost.com/api/message-events.html) and this system is that Message Events returns JSON that looks like:\n\n    {\n      \"links\": [],\n      \"results\": [],\n      \"total_count\": 0\n    }\n\nwhereas this system returns an array of events, the `results` value from above, since pagination isn't currently supported.\n\n## How do I use it?\n\nThe majority of the system is described in [this CloudFormation template](./event-data.yaml).\nSome of the setup steps require using the AWS Console (webui) or the `aws` command line tool.\nFor starters, you'll need [an AWS account](https://aws.amazon.com/).\n\n### RDS Setup\n\nOnce you're signed into your shiny new account, select [RDS](https://us-west-2.console.aws.amazon.com/rds/), then `Launch a DB Instance`.\nClick the `Free tier eligible only` checkbox, and click on the PostgreSQL elephant, then the `Select` button.\nChange `Allocated Storage` to `20GB`, which was the maximum amount of free tier storage allowed as of this writing.\nChoose a `DB Instance Identifier` (nickname), and `Username` / `Password` you'll use later to connect, then click `Next Step`.\nAfter saving the password in your favorite [password](https://1password.com/) [management](https://www.lastpass.com/) [tool](http://keepass.info/), of course.\nDefaults are mostly good on this screen (`Configure Advanced Settings`) except for `Database Name`, which can be the same as your `Instance Identifier` / nickname.\nWhen you're ready, click `Launch DB Instance` to set the wheels in 
motion. It'll probably take a few minutes to launch.\n\n### CLI and S3\n\nTo continue feeling productive, you can install and configure the `aws` CLI, which is covered in great detail [here](https://github.com/aws/aws-cli#readme).\n\nOnce that's all [configured](https://github.com/aws/aws-cli#user-content-getting-started), let's do the next manual step:\nmake an S3 bucket that our Lambda functions can call home.\n\n    $ aws s3 mb s3://best-lambdas-evar\n\n### RDS Setup, Part Deux\n\nHooray, we have a database!\nBut we have a blank database.\nAnd we can't connect to our blank database.\nLet's fix that last bit first by adding a rule that allows our IP to connect: `Services \u003e VPC \u003e Security Groups \u003e (select yours)`.\nHere are two quick ways to get your IP:\n\n    $ dig +short myip.opendns.com @resolver1.opendns.com\n    $ dig +short txt o-o.myaddr.l.google.com @ns1.google.com\n\nClick the `Inbound Rules` tab at the bottom, then `Edit` and `Add another rule`.\nEnter `5432` in the `Port Range` column, and your IP in the `Source` column and click `Save`.\nNow we can at least connect to our database using [`psql`](https://www.postgresql.org/docs/current/static/app-psql.html).\nWhich we will not do now. Yet.\n\n### Environment Variables\n\nGathering all of this information is one of the big reasons I'd like to manage the RDS setup with CloudFormation, since then the majority of these can be references internal to the CF template. 
What I've done to make this easier on myself is to keep all this information in my password manager so I can do a quick copy/paste to set all the required env vars.\n\n#### CloudFormation Stuff\n\n`CF_STACK_NAME` - pick a name for your \"stack\" (group of things CF creates)\n\n#### S3 Stuff\n\n`LAMBDA_S3_BUCKET` - this is the name of the bucket we created in `CLI and S3`\n\n`WEBHOOKS_S3_BUCKET` - name of bucket to be created by CloudFormation\n\n#### RDS Stuff\n\n`PGHOST` - hostname of RDS database (`Services \u003e RDS \u003e DB Instances \u003e (expand row) \u003e Endpoint`)\n\n`PGDB` - `nickname` from the `RDS Setup` section\n\n`PGUSER` - `Username` from the `RDS Setup` section\n\n`PGPASS` - `Password` from the `RDS Setup` section\n\n#### VPC Stuff\n\nIf you have more than one VPC, you AWS pro, make a note of which one your RDS instance runs in.\nClick `Services \u003e VPC`, click `VPC` again to bring up a listing, select the VPC that has your RDS instance.\n\n`RDS_VPC_ID` - `VPC ID` column value\n\n`RDS_RTB_ID` - `Route table` column value\n\nIn the far left menu, `Your VPCs` will be selected, click `Subnets`.\n\n`RDS_SN` - comma-separated list of `Subnet ID` column values for RDS VPC\n\n    RDS_SN=subnet-f0000000,subnet-d0000000,subnet-b0000000\n\nAgain in the far left menu, click `Security Groups`\n\n`RDS_SG` - `Group ID` column value, for RDS VPC\n\n### RDS Setup, Part Trois\n\nSince we have our handy env vars filled with PostgreSQL connection info, let's create our tables:\n\n    $ psql -h $PGHOST -U $PGUSER -d $PGDB\n    Password for user msgevents:\n    psql (9.6.1, server 9.6.2)\n    SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)\n    msgevents=\u003e \\i ./sql/tables.ddl\n\nand load in the auto-partitioning code:\n\n    msgevents=\u003e \\i ./sql/auto-partitioner.sql\n\nand voila, our database is ready to accept event data.\n\n### CloudFormation\n\nAKA the go button(s) for all the not-database 
stuff.\nThis repo contains two scripts, `package` and `deploy`, corresponding to the `aws cloudformation` CLI commands.\nIf you'd like some more insight into what they're doing, [this blog post](https://aws.amazon.com/blogs/compute/introducing-simplified-serverless-application-deplyoment-and-management/) gives an overview.\nFor even more detail, read through and cross-reference with `aws cloudformation package help` / `aws cloudformation deploy help`.\n\nBasically, `package` gets everything ready to go.\nIt uploads your Lambda code to the specified S3 bucket and generates another CloudFormation template referencing that.\n\n    $ ./package\n    Uploading to 8774ed690767d127efd4345b71235945  559845 / 559845.0  (100.00%)\n    Successfully packaged artifacts and wrote output template to file event-data.cf.yaml.\n    Execute the following command to deploy the packaged template\n    aws cloudformation deploy --template-file ./event-data.cf.yaml --stack-name \u003cYOUR STACK NAME\u003e\n\nOnce that's done, we can `deploy`.\n\n    $ ./deploy\n    Waiting for changeset to be created..\n    Waiting for stack create/update to complete\n    Successfully created/updated stack - \u003cYOUR STACK NAME\u003e\n\n\n## How do I test it?\n\nSparkPost's webhook config page lets us send test payloads to sanity check our setup, so let's do that.\nFirst we need the URL of our endpoint: `Services \u003e API Gateway \u003e \u003cYOUR STACK NAME\u003e \u003e Stages \u003e Prod \u003e Invoke URL`.\nAlso, we need to append the correct `Resource` path, which in this case is `/store_batch`.\nThat should end up looking something like:\n\n    https://0123456789.execute-api.us-west-2.amazonaws.com/Prod/store_batch\n\nLog in to your SparkPost account and click `Account \u003e Webhooks \u003e New Webhook`.\nPick any `Webhook Name` you like, use the url we found above as the `Target URL`, and `Add Webhook`.\nTo send a batch of test data, click the aptly-named `Test` link, then scroll down and click 
`Send Test Batch`.\nThe batch will be sent, and the UI will display the server's response.\nNow remember, we're only storing the batch in-band, so there are a couple of places we can look for info on what happened.\n\nThe first place is CloudWatch: `Services \u003e CloudWatch \u003e Logs`.\nThere will be a few `Log Groups` there.\nThe one containing `StoreBatch` isn't very interesting, since it shows things we can also see by looking at `ProcessBatch`, so let's click through into the `ProcessBatch` one.\nThere we should see a message containing `deleted processed batch` and a batch UUID, which means success.\n\nTo search through the test data we've just loaded, let's use the `query_events` endpoint. When querying the test data, you'll need to specify `from` and `to`, since the test data is dated `2016-02-02`, and the default time window is the last 24 hours.\n\n    $ curl https://0123456789.execute-api.us-west-2.amazonaws.com/Prod/query_events\\?type\\=open\\\u0026from\\=2016-02-02T00:00:00Z\\\u0026to\\=2016-02-03T00:00:00Z\n\nThis will hand back a JSON-encoded array of matching events.\nIf you're handy with `psql`, you can also connect directly and examine the `batches` and `events` tables.\nThe `events` data is [partitioned](https://www.postgresql.org/docs/current/static/ddl-partitioning.html) by month in this setup, which makes it super easy to do things like archive data a month at a time, and lets the query planner scan only the relevant months.\n\n### That's all folks!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsparkpost%2Fevent-data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsparkpost%2Fevent-data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsparkpost%2Fevent-data/lists"}