Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/prx/dovetail-cdn-usage
Lambda to query Dovetail CloudFront usage and insert into BigQuery
- Host: GitHub
- URL: https://github.com/prx/dovetail-cdn-usage
- Owner: PRX
- License: agpl-3.0
- Created: 2024-05-02T14:41:49.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-05-08T16:18:34.000Z (8 months ago)
- Last Synced: 2024-05-08T23:24:28.680Z (8 months ago)
- Language: JavaScript
- Size: 163 KB
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Dovetail CDN Usage
AWS Lambda to query Dovetail CloudFront usage and insert into BigQuery
## Overview
1. Requests to the [Dovetail CDN](https://github.com/PRX/Infrastructure/tree/main/cdn/dovetail-cdn) are logged to an S3 bucket.
2. This lambda queries BigQuery for `MAX(day) FROM dt_bytes`, and processes every day >= the result (or all the way back to the S3 expiration date).
3. Then we query Athena for a day of logs, grouping by path and summing the bytes sent.
4. Paths are parsed and grouped as `///...` or `///episode/...`. Unrecognized paths that use a lot of bandwidth are logged as warnings. (A rough sketch of this flow follows the list.)
5. Resulting bytes usage is inserted back into BigQuery:
```
{day: "2024-04-23", feeder_podcast: 123, feeder_episode: "abcd-efgh", feeder_feed: null, bytes: 123456789}
```
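In pseudocode-ish JavaScript, that loop might look something like the sketch below. Every helper (`queryBigQuery`, `daysSince`, `queryAthena`, `parseUsagePath`, `insertRows`) and the Athena table/column names are illustrative assumptions, not this repo's actual code:
```js
// Illustrative sketch of the daily flow -- every helper here is hypothetical
async function processUsage() {
  // step 2: find the most recent day already loaded into BigQuery
  const [{ day: maxDay }] = await queryBigQuery("SELECT MAX(day) AS day FROM dt_bytes");

  // reprocess that day and everything after it
  for (const day of daysSince(maxDay)) {
    // step 3: one Athena query per day, grouping by path and summing bytes sent
    const rows = await queryAthena(`
      SELECT uri, SUM(bytes) AS bytes
      FROM cloudfront_logs
      WHERE day = '${day}'
      GROUP BY uri
    `);

    // step 4: parse each path into podcast/feed/episode parts (warn on unknowns)
    const usage = rows.map((r) => ({ day, ...parseUsagePath(r.uri), bytes: r.bytes }));

    // step 5: load the day's usage back into the dt_bytes table
    await insertRows("dt_bytes", usage);
  }
}
```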
## Development
Local development is dependency-free (no AWS or Google access needed)! Just:
```sh
yarn install
yarn test
yarn lint
```
However, if you actually want to hit Athena/BigQuery, you'll need to `cp env-example .env` and fill in several variables (an example `.env` sketch follows below):
- `ATHENA_DB` the Athena database you're using
- `ATHENA_TABLE` the athena table that has been configured to [query to the Dovetail CDN S3 logs](https://docs.aws.amazon.com/athena/latest/ug/cloudfront-logs.html#create-cloudfront-table-standard-logs)
- **NOTE:** you must have your AWS credentials set up and configured locally to reach/query Athena
- `BQ_DATASET` the BigQuery dataset to load the `dt_bytes` table into. You should use `development` or something similar locally (not `staging` or `production`)

Then run `yarn start` and you're off!
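For reference, a filled-in `.env` might look something like this (the names come from the list above; the values are just placeholders, not real resources):
```sh
# placeholder values -- not real resources
ATHENA_DB=dovetail_cdn
ATHENA_TABLE=cloudfront_logs
BQ_DATASET=development
```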
## Deployment
This function's code is deployed as part of the usual
[PRX CI/CD](https://github.com/PRX/Infrastructure/tree/main?tab=readme-ov-file#cicd) process.
The lambda zip is built via `yarn build`, uploaded to S3, and deployed into the wild.

While that's all straightforward, there are some gotchas setting up access:
1. AWS permissions (Athena, S3, Glue, etc.) are documented in the [CloudFormation Stack](https://github.com/PRX/Infrastructure/blob/main/spire/templates/apps/dovetail-cdn-usage.yml) for this app.
2. Google access is configured via the `BQ_CLIENT_CONFIG` env var and [Federated Access](https://github.com/PRX/internal/wiki/Guide:-Google-Cloud-Workload-Identity-Federation) (see the sketch after this list).
3. _In addition to the steps documented in (2)_, the Service Account you create must have the following permissions:
- `BigQuery Job User` in your BigQuery project
- _Any_ role on the BigQuery dataset that provides `bigquery.tables.create`, so the table load jobs can execute. We have a custom role to provide this minimal access, but any role with that create permission will work.
- `BigQuery Data Editor` _only_ on the `dt_bytes` table in the dataset for this environment (click the table name in the BigQuery UI -> Share -> Manage Permissions)
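To make the `BQ_CLIENT_CONFIG` piece concrete, here is a rough, hypothetical sketch of how a federated credential config could be handed to the BigQuery client; it is not necessarily how this repo wires things up, and the query is only a permissions smoke test:
```js
const { BigQuery } = require("@google-cloud/bigquery");

// BQ_CLIENT_CONFIG is assumed to hold the JSON credential config from the
// Workload Identity Federation setup; projectId may also need to be set
// explicitly, depending on that config
const credentials = JSON.parse(process.env.BQ_CLIENT_CONFIG);
const bigquery = new BigQuery({ credentials });

// confirms the Job User role and dt_bytes permissions described above
async function smokeTest() {
  const [rows] = await bigquery.query(
    `SELECT MAX(day) AS day FROM ${process.env.BQ_DATASET}.dt_bytes`
  );
  console.log(rows[0]);
}

smokeTest().catch(console.error);
```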
## License
[AGPL-3.0 License](https://www.gnu.org/licenses/agpl-3.0.html)