{"id":22546644,"url":"https://github.com/prx/dovetail-counts-lambda","last_synced_at":"2025-07-28T12:16:32.819Z","repository":{"id":33826650,"uuid":"150329858","full_name":"PRX/dovetail-counts-lambda","owner":"PRX","description":"Count which bytes of a multi-segment mp3 were downloaded","archived":false,"fork":false,"pushed_at":"2025-03-04T20:02:42.000Z","size":358,"stargazers_count":1,"open_issues_count":6,"forks_count":0,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-03-28T08:45:56.998Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://dovetail.prx.org","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PRX.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-09-25T21:11:41.000Z","updated_at":"2025-03-04T20:02:44.000Z","dependencies_parsed_at":"2024-02-29T23:23:30.454Z","dependency_job_id":"189d7c8d-d85d-49ae-889f-91a1ef11a3c7","html_url":"https://github.com/PRX/dovetail-counts-lambda","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/PRX/dovetail-counts-lambda","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PRX%2Fdovetail-counts-lambda","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PRX%2Fdovetail-counts-lambda/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PRX%2Fdovetail-counts-lambda/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PRX%2Fdovetail-counts-lambda/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PRX","download_url":"https://codeload.github.com/PRX/dovetail-counts-lambda/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PRX%2Fdovetail-counts-lambda/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267514958,"owners_count":24100030,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-28T02:00:09.689Z","response_time":68,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-07T15:08:27.585Z","updated_at":"2025-07-28T12:16:32.797Z","avatar_url":"https://github.com/PRX.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Dovetail Counts Lambda\n\nCount which bytes of a multi-segment mp3 were downloaded\n\n# Description\n\nThis lambda function processes incoming kinesis events, counts how many bytes\nwere downloaded by each request, and when a threshold number of seconds has\nbeen sent, logs that request-uuid as a \"download\", in compliance with the\n[IAB Podcast Measurement Technical Guidelines v2.0](https://www.iab.com/wp-content/uploads/2017/12/Podcast_Measurement_v2-Final-Dec2017.pdf)\n\n## Inputs\n\n### Kinesis events\n\nKinesis events come from one of three sources:\n\n1. 
**DEPRECATED** [dovetail-bytes-lambda](https://github.com/PRX/dovetail-bytes-lambda) [logged JSON](https://github.com/PRX/dovetail-counts-lambda/blob/main/lib/decoder/bytes-lambda.js)\n2. CloudFront [realtime logs](https://github.com/PRX/dovetail-counts-lambda/blob/main/lib/decoder/real-time.js)\n3. CloudFront [standard logs](https://github.com/PRX/dovetail-counts-lambda/blob/main/lib/decoder/standard.js)\n\nAfter decoding (gunzipping, base64-decoding, etc.), all three of those sources are normalized\ninto an \"event\" that looks something like:\n\n```json\n{\n  \"le\": \"the-listener-episode\",\n  \"digest\": \"BtTifRE9b9iscgXovKINxPG5HX4Iqzlu1851WvgcCPY\",\n  \"start\": 489274,\n  \"end\": 21229635,\n  \"total\": 21229636,\n  \"region\": \"us-west-2\"\n}\n```\n\nThis tells us which listener-episode (listener-id + episode) and arrangement-digest to look up. The kinesis\ndata also includes a millisecond \"timestamp\" of when the CloudWatch log line was\nlogged.\n\n### Dovetail Arrangements\n\nTo find the arrangement of files that made up this mp3, we look in the DynamoDB\narrangements table configured by the `ARRANGEMENTS_DDB_TABLE` environment variable.\n\nThis JSON is set by the [dovetail-cdn-arranger](https://github.com/PRX/dovetail-cdn-arranger)\nwhen it creates the stitched file, and has the format:\n\n```\n{\n  \"version\": 4,\n  \"data\": {\n    \"f\": [\n      \"https://f.prxu.org/70/d87b79c6-734b-4022-b4ac-4c7da706a505/31f4ddc8-7f66-42a7-91aa-33a2202ce94f.mp3\",\n      \"http://static.adzerk.net/Advertisers/68649ed71ce74259a57f24ce13e5a6cc.mp3\",\n      \"http://static.adzerk.net/Advertisers/68649ed71ce74259a57f24ce13e5a6cc.mp3\",\n      \"http://static.adzerk.net/Advertisers/b09ccdfb797d45a98fc0d0caca13a0b8.mp3\",\n      \"https://f.prxu.org/70/d87b79c6-734b-4022-b4ac-4c7da706a505/91df9d5a-dd0a-471e-89b2-742203bd95c9.mp3\",\n      \"https://f.prxu.org/70/d87b79c6-734b-4022-b4ac-4c7da706a505/111ada0d-46bc-440f-9e06-7cb07a4de83c.mp3\",\n      
\"http://static.adzerk.net/Advertisers/b4c8dde093294f388b59e94540876345.mp3\"\n    ],\n    \"t\": \"oaaaooi\",\n    \"a\": {\"f\": \"mp3\", \"b\": 128, \"c\": 2, \"s\": 44100},\n    \"b\": [\n      81204,\n      165841,\n      166625,\n      167409,\n      196405,\n      8610446,\n      8682283,\n      8745238\n    ]\n  }\n}\n```\n\n### Redis Byte-Ranges\n\nSince there is no guarantee when we'll get each byte-range request event, this\nlambda \"pushes\" each request onto a list of bytes for each listener-episode +\nutc-day + arrangement-digest. A Lua function in the Redis lib accomplishes this.\n\nSince these are keyed on the UTC day the byte-downloads occurred on, we can\nsafely expire them slightly after midnight, when we're fairly sure no straggler\ndownloaded-bytes will be coming in over kinesis.\n\n```\nredis:6379\u003e GET dtcounts:bytes:\u003clistener-episode\u003e/2019-02-28/\u003cdigest\u003e\n\"0-1,250289-360548,489274-21229635\"\n```\n\n## Outputs\n\nAfter pushing the new byte-range to Redis, we also get back the _complete_ range\ndownloaded for that listener-episode-day-digest. That can be compared to the arrangement to\ndetermine how many total-bytes, and bytes-of-each-segment were downloaded. After\na threshold number of seconds (or a percentage) of the entire file is downloaded, we then log\nthat as an IAB-2.0 compliant download. For segments, we wait for _all_ the bytes\nto be downloaded before sending an IAB-2.0 compliant impression.\n\nSince we might receive additional requests _after we've logged a download_, we\nalso lock the impression via a redis hash:\n\n```\nredis:6379\u003e HGETALL dtcounts:imp:\u003clistener-episode\u003e:2019-02-28:\u003cdigest\u003e\n1) \"0\"\n2) \"\"\n3) \"1\"\n4) \"\"\n5) \"2\"\n6) \"\"\n7) \"all\"\n8) \"\"\n```\n\nThe TTL on this (`REDIS_IMPRESSION_TTL`) defaults to 24 hours. This should prevent\nlogging duplicate downloads/impressions until the `\u003cutc-day\u003e` rolls over to the\nnext day.\n\n### Digest Cache\n\nRight now, it's possible a listener will download _different arrangements_ of the\nsame episode throughout a single utc-day. In this case, we only want to count the\n**first** arrangement-digest we get a complete download/impression for. This is\naccomplished by locking a redis key to the arrangement-digest for that user, for\n24 hours (or `REDIS_IMPRESSION_TTL`):\n\n```\nredis:6379\u003e GET dtcounts:imp:\u003clistener-episode\u003e:2019-02-28\n\"\u003cthe-arrangement-digest\u003e\"\n```\n\nDownloads/impressions against other digests will be logged to kinesis, but they\nwill include the flags `{isDuplicate: true, cause: 'digestCache'}`.\n\n**NOTE**: this is likely a temporary measure that becomes unnecessary if we start locking listeners to\na single arrangement for the entire 24-hour UTC day.\n\n### Kinesis Impressions stream\n\nThe `KINESIS_IMPRESSION_STREAM` is the main output of this function. These should be processed by another lambda and streamed to BigQuery via the Dovetail [analytics-ingest-lambda](https://github.com/PRX/analytics-ingest-lambda). These kinesis json records have the format:\n\n```json\n{\n  \"type\": \"bytes\",\n  \"timestamp\": 1539206255516,\n  \"listenerEpisode\": \"some-listener-episode\",\n  \"digest\": \"the-arrangement-digest\",\n  \"bytes\": 9999,\n  \"seconds\": 1.84,\n  \"percent\": 0.65,\n  \"durations\": [12.85924, 948.9482285, 1.5846666666],\n  \"types\": \"aoi\",\n  \"isDuplicate\": true,\n  \"cause\": \"digestCache\"\n}\n```\n\nor\n\n```json\n{\n  \"type\": \"segmentbytes\",\n  \"timestamp\": 1539206255516,\n  \"listenerEpisode\": \"some-listener-episode\",\n  \"digest\": \"the-arrangement-digest\",\n  \"segment\": 3,\n  \"isDuplicate\": true,\n  \"cause\": \"digestCache\"\n}\n```\n\n## Error handling\n\nGenerally, this Lambda attempts to log any errors, but only passes things that\ncan truly be retried back to the\ncallback() function. Instead, it just allows the origin-pull request to\nproceed, and let CloudFront return whatever it finds in S3.\n\n# Installation\n\nTo get started, first install dev dependencies with `yarn`. Then run `yarn test`. End of list!\n\nOr to use docker, just run `docker-compose build` and `docker-compose run test`.\n\n# License\n\n[AGPL License](https://www.gnu.org/licenses/agpl-3.0.html)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprx%2Fdovetail-counts-lambda","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprx%2Fdovetail-counts-lambda","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprx%2Fdovetail-counts-lambda/lists"}