https://github.com/sparkpost/event-data
self-hosted message events
- Host: GitHub
- URL: https://github.com/sparkpost/event-data
- Owner: SparkPost
- Created: 2017-09-19T22:35:37.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-09-21T21:21:47.000Z (over 8 years ago)
- Last Synced: 2024-12-29T11:49:12.618Z (about 1 year ago)
- Topics: api, aws, data, email, webhooks
- Language: JavaScript
- Size: 36.1 KB
- Stars: 3
- Watchers: 10
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Event Data
## What is this?
If you've ever been frustrated with the data retention limitation on our Message Events Interface, this is a possible solution.
Event Data is an (almost) drop-in replacement for the Message Events interface.
It lets you configure your own data retention period, add custom filters, and optimize for your most common queries.
All of the system components are eligible for the AWS free tier, so this system will be no- or low-cost to operate.
By default, this system **allows anyone who knows your API Gateway URL to see your event data, which includes your customers' email addresses**.
Here are the [official docs](https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-control-access-using-iam-policies-to-invoke-api.html) on setting up an IP Whitelist for API Gateway, and [another post](http://benfoster.io/blog/aws-api-gateway-ip-restrictions) that covers whitelist setup, as well as how to use [Postman](https://getpostman.com) to sign and submit requests.
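For a sense of what you're aiming for, an API Gateway resource policy that only allows invocations from a given IP range looks roughly like the sketch below; the account ID, API ID, and CIDR block are placeholders you'd swap for your own (an IAM policy for your callers uses the same `Action`/`Condition` shape, minus the `Principal`).
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "arn:aws:execute-api:us-west-2:123456789012:abcdef1234/*",
      "Condition": {
        "IpAddress": { "aws:SourceIp": ["203.0.113.0/24"] }
      }
    }
  ]
}
```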
The main difference between [Message Events](https://developers.sparkpost.com/api/message-events.html) and this system is that Message Events returns JSON that looks like:
```
{
    "links": [],
    "results": [],
    "total_count": 0
}
```
whereas this system returns a bare array of events (the `results` value from above), since pagination isn't currently supported.
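In other words, a query against this system hands back just the array, something like this (the keys shown here are only illustrative; real events carry many more fields):
```
[
  { "type": "open", "timestamp": "2016-02-02T17:38:57Z", "rcpt_to": "recipient@example.com" },
  { "type": "click", "timestamp": "2016-02-02T17:40:12Z", "rcpt_to": "recipient@example.com" }
]
```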
## How do I use it?
The majority of the system is described in [this CloudFormation template](./event-data.yaml).
Some of the setup steps require using the AWS Console (webui) or the `aws` command line tool.
For starters, you'll need [an AWS account](https://aws.amazon.com/).
### RDS Setup
Once you're signed into your shiny new account, select [RDS](https://us-west-2.console.aws.amazon.com/rds/), then `Launch a DB Instance`.
Click the `Free tier eligible only` checkbox, and click on the PostgreSQL elephant, then the `Select` button.
Change `Allocated Storage` to `20GB`, which was the maximum amount of free tier storage allowed as of this writing.
Choose a `DB Instance Identifier` (nickname), and `Username` / `Password` you'll use later to connect, then click `Next Step`.
After saving the password in your favorite [password](https://1password.com/) [management](https://www.lastpass.com/) [tool](http://keepass.info/), of course.
Defaults are mostly good on this screen (`Configure Advanced Settings`) except for `Database Name`, which can be the same as your `Instance Identifier` / nickname.
When you're ready, click `Launch DB Instance` to set the wheels in motion. It'll probably take a few minutes to launch.
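If you'd rather script this step than click through the console, the equivalent CLI call looks roughly like this; the identifier, names, and password are placeholders, and `db.t2.micro` was the free-tier-eligible instance class at the time of writing:
```
$ aws rds create-db-instance \
    --db-instance-identifier msgevents \
    --db-name msgevents \
    --engine postgres \
    --db-instance-class db.t2.micro \
    --allocated-storage 20 \
    --master-username msgevents \
    --master-user-password 'use-your-password-manager'
```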
### CLI and S3
To continue feeling productive, you can install and configure the `aws` CLI, which is covered in great detail [here](https://github.com/aws/aws-cli#readme).
Once that's all [configured](https://github.com/aws/aws-cli#user-content-getting-started), let's do the next manual step:
make an S3 bucket that our Lambda functions can call home.
```
$ aws s3 mb s3://best-lambdas-evar
```
### RDS Setup, Part Deux
Hooray, we have a database!
But we have a blank database.
And we can't connect to our blank database.
Let's fix that last bit first by adding a rule that allows our IP to connect: `Services > VPC > Security Groups > (select yours)`.
Here are two quick ways to get your IP:
```
$ dig +short myip.opendns.com @resolver1.opendns.com
$ dig +short txt o-o.myaddr.l.google.com @ns1.google.com
```
Click the `Inbound Rules` tab at the bottom, then `Edit` and `Add another rule`.
Enter `5432` in the `Port Range` column, and your IP in the `Source` column and click `Save`.
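The same rule can be added from the CLI, roughly like so; the security group ID is a placeholder for the one attached to your RDS instance:
```
# grab our current public IP, then open 5432 to it
$ MY_IP=$(curl -s https://checkip.amazonaws.com)
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-00000000 \
    --protocol tcp \
    --port 5432 \
    --cidr ${MY_IP}/32
```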
Now we can at least connect to our database using [`psql`](https://www.postgresql.org/docs/current/static/app-psql.html).
Which we will not do now. Yet.
### Environment Variables
Gathering all of this information is one of the big reasons I'd like to manage the RDS setup with CloudFormation as well, since the majority of these values could then be references internal to the CF template. What I've done to make this easier on myself is to keep all of this information in my password manager, so I can do a quick copy/paste to set all the required env vars (there's a consolidated sketch of that after the list below).
#### CloudFormation Stuff
`CF_STACK_NAME` - pick a name for your "stack" (group of things CF creates)
#### S3 Stuff
`LAMBDA_S3_BUCKET` - this is the name of the bucket we created in `CLI and S3`
`WEBHOOKS_S3_BUCKET` - name of bucket to be created by CloudFormation
#### RDS Stuff
`PGHOST` - hostname of RDS database (`Services > RDS > DB Instances > (expand row) > Endpoint`)
`PGDB` - `nickname` from the `RDS Setup` section
`PGUSER` - `Username` from the `RDS Setup` section
`PGPASS` - `Password` from the `RDS Setup` section
#### VPC Stuff
If you have more than one VPC, you AWS pro, make a note of which one your RDS instance runs in.
Click `Services > VPC`, click `VPC` again to bring up a listing, select the VPC that has your RDS instance.
`RDS_VPC_ID` - `VPC ID` column value
`RDS_RTB_ID` - `Route table` column value
In the far left menu, `Your VPCs` will be selected; click `Subnets`.
`RDS_SN` - comma-separated list of `Subnet ID` column values for RDS VPC
```
RDS_SN=subnet-f0000000,subnet-d0000000,subnet-b0000000
```
Again in the far left menu, click `Security Groups`.
`RDS_SG` - `Group ID` column value, for RDS VPC
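Pulled together, the copy/paste block looks something like this; every value below is a placeholder for your own:
```
# placeholder values - replace with your own
export CF_STACK_NAME=event-data
export LAMBDA_S3_BUCKET=best-lambdas-evar
export WEBHOOKS_S3_BUCKET=event-data-webhooks
export PGHOST=msgevents.xxxxxxxxxxxx.us-west-2.rds.amazonaws.com
export PGDB=msgevents
export PGUSER=msgevents
export PGPASS=use-your-password-manager
export RDS_VPC_ID=vpc-00000000
export RDS_RTB_ID=rtb-00000000
export RDS_SN=subnet-f0000000,subnet-d0000000,subnet-b0000000
export RDS_SG=sg-00000000
```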
### RDS Setup, Part Trois
Since we have our handy env vars filled with PostgreSQL connection info, let's create our tables:
```
$ psql -h $PGHOST -U $PGUSER -d $PGDB
Password for user msgevents:
psql (9.6.1, server 9.6.2)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
msgevents=> \i ./sql/tables.ddl
```
and load in the auto-partitioning code:
```
msgevents=> \i ./sql/auto-partitioner.sql
```
and voila, our database is ready to accept event data.
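If you want a quick sanity check that the DDL took, `\dt` from the same session will list the new tables; you should see `batches` and `events`, which come up again in the testing section below.
```
msgevents=> \dt
```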
### CloudFormation
AKA the go button(s) for all the not-database stuff.
This repo contains two scripts, `package` and `deploy`, corresponding to the `aws cloudformation` CLI commands.
If you'd like some more insight into what they're doing, [this blog post](https://aws.amazon.com/blogs/compute/introducing-simplified-serverless-application-deplyoment-and-management/) gives an overview.
For even more detail, read through and cross-reference with `aws cloudformation package help` / `aws cloudformation deploy help`.
Basically, `package` gets everything ready to go.
It uploads your Lambda code to the specified S3 bucket and generates another CloudFormation template referencing that.
```
$ ./package
Uploading to 8774ed690767d127efd4345b71235945 559845 / 559845.0 (100.00%)
Successfully packaged artifacts and wrote output template to file event-data.cf.yaml.
Execute the following command to deploy the packaged template
aws cloudformation deploy --template-file ./event-data.cf.yaml --stack-name <YOUR STACK NAME>
```
Once that's done, we can `deploy`.
```
$ ./deploy
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack -
```
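If you're curious what those two scripts boil down to, it's roughly the following; the exact flags live in the scripts themselves, so treat this as a sketch:
```
# upload Lambda code and generate the deployable template
$ aws cloudformation package \
    --template-file event-data.yaml \
    --s3-bucket $LAMBDA_S3_BUCKET \
    --output-template-file event-data.cf.yaml

# create or update the stack from the packaged template
$ aws cloudformation deploy \
    --template-file event-data.cf.yaml \
    --stack-name $CF_STACK_NAME \
    --capabilities CAPABILITY_IAM
```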
## How do I test it?
SparkPost's webhook config page lets us send test payloads to sanity check our setup, so let's do that.
First we need the URL of our endpoint: `Services > API Gateway > (your API) > Stages > Prod > Invoke URL`.
Also, we need to append the correct `Resource` path, which in this case is `/store_batch`.
That should end up looking something like:
```
https://0123456789.execute-api.us-west-2.amazonaws.com/Prod/store_batch
```
Log in to your SparkPost account and click `Account > Webhooks > New Webhook`.
Pick any `Webhook Name` you like, use the URL we found above as the `Target URL`, and `Add Webhook`.
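If you'd rather skip the UI, SparkPost's Webhooks API can create the same webhook; something along these lines should do it, where `$SPARKPOST_API_KEY` is your API key and the event list is just an example:
```
$ curl -X POST https://api.sparkpost.com/api/v1/webhooks \
    -H "Authorization: $SPARKPOST_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "name": "Event Data",
      "target": "https://0123456789.execute-api.us-west-2.amazonaws.com/Prod/store_batch",
      "events": ["delivery", "open", "click", "bounce"]
    }'
```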
To send a batch of test data, click the aptly-named `Test` link, then scroll down and click `Send Test Batch`.
The batch will be sent, and the UI will display the server's response.
Now remember, we're only storing the batch in-band, so there are a couple of places we can look for info on what happened.
The first place is CloudWatch: `Services > CloudWatch > Logs`.
There will be a few `Log Groups` there.
The one containing `StoreBatch` isn't very interesting; it shows things we can also see by looking at `ProcessBatch`, so let's click through into that one.
We can see a message containing `deleted processed batch`, and a batch UUID, which means success.
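You can also pull those log lines from the CLI; the log group name below is a placeholder, since the real one is generated by CloudFormation:
```
$ aws logs filter-log-events \
    --log-group-name /aws/lambda/event-data-ProcessBatch-XXXXXXXXXXXX \
    --filter-pattern '"deleted processed batch"'
```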
To search through the test data we've just loaded, let's use the `query_events` endpoint. When querying the test data, you'll need to specify `from` and `to`, since the test data is dated `2016-02-02`, and the default time window is the last 24 hours.
```
$ curl https://0123456789.execute-api.us-west-2.amazonaws.com/Prod/query_events\?type\=open\&from\=2016-02-02T00:00:00Z\&to\=2016-02-03T00:00:00Z
```
Which will hand back a JSON-encoded array of matching events.
If you're handy with `psql`, you can also connect directly and examine the `batches` and `events` tables.
The `events` data is [partitioned](https://www.postgresql.org/docs/current/static/ddl-partitioning.html) by month in this setup, which makes it super easy to do things like archive data a month at a time, and lets the query planner scan only the relevant months.
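As an example of what that buys you, assuming the auto-partitioner names its child tables something like `events_y2016m02` (check `./sql/auto-partitioner.sql` for the actual naming scheme), archiving and then dropping a month could be as simple as:
```
# dump one month's partition to a compressed file, then drop it
$ pg_dump -h $PGHOST -U $PGUSER -d $PGDB -t events_y2016m02 | gzip > events_y2016m02.sql.gz
$ psql -h $PGHOST -U $PGUSER -d $PGDB -c 'DROP TABLE events_y2016m02;'
```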
### That's all folks!