{"id":13418851,"url":"https://github.com/18F/analytics-reporter","last_synced_at":"2025-03-15T04:30:58.441Z","repository":{"id":24026991,"uuid":"27411729","full_name":"18F/analytics-reporter","owner":"18F","description":"Lightweight analytics reporting and publishing tool for Digital Analytics Program's Google Analytics 360 data.","archived":false,"fork":false,"pushed_at":"2024-04-03T17:59:42.000Z","size":5966,"stargazers_count":620,"open_issues_count":33,"forks_count":152,"subscribers_count":40,"default_branch":"develop","last_synced_at":"2024-04-09T15:13:37.433Z","etag":null,"topics":["analytics","google-analytics"],"latest_commit_sha":null,"homepage":"https://analytics.usa.gov/","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/18F.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2014-12-02T02:44:42.000Z","updated_at":"2024-04-11T16:47:56.941Z","dependencies_parsed_at":"2023-10-20T15:38:51.880Z","dependency_job_id":"aacdc8ef-6963-46e3-96dc-6ecbdee6cc2c","html_url":"https://github.com/18F/analytics-reporter","commit_stats":{"total_commits":404,"total_committers":33,"mean_commits":"12.242424242424242","dds":0.5693069306930694,"last_synced_commit":"7a515d26e35752eba88e50d594ec15e914dfaf96"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/18F%2Fanalytics-reporter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/18F%2Fanalytics-reporter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/18F%2Fanalytics-reporter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/18F%2Fanalytics-reporter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/18F","download_url":"https://codeload.github.com/18F/analytics-reporter/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221536657,"owners_count":16839542,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","google-analytics"],"created_at":"2024-07-30T22:01:07.920Z","updated_at":"2025-03-15T04:30:58.431Z","avatar_url":"https://github.com/18F.png","language":"JavaScript","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"readme":"![Build Status](https://github.com/18F/analytics-reporter/actions/workflows/ci.yml/badge.svg?branch=master)\n[![Snyk](https://snyk.io/test/github/18F/analytics-reporter/badge.svg)](https://snyk.io/test/github/18F/analytics-reporter)\n\n# Analytics Reporter\n\nA lightweight system for publishing analytics data from the Digital Analytics Program (DAP) Google Analytics 4 government-wide property.\nThis project uses the [Google Analytics Data API v1](https://developers.google.com/analytics/devguides/reporting/data/v1/rest) to acquire analytics data and then processes it into a flat data structure.\n\nThis is used in combination with [analytics-reporter-api](https://github.com/18F/analytics-reporter-api) to provide the data which powers the government analytics website, [analytics.usa.gov](https://analytics.usa.gov).\n\nAvailable reports 
are named and described in [`api.json`](reports/api.json) and [`usa.json`](reports/usa.json). For now, they're hardcoded into the repository.\n\nThe process for adding features to this project is described in\n[Development and deployment process](docs/development_and_deployment_process.md).\n\n## Architecture and Technical Overview\n\nThe application has multiple jobs which run at scheduled intervals. See `deploy/publisher.js`\nfor details on the jobs and the timing at which they are kicked off.\n\nThe database functions as a queue using the [pg-boss library](https://github.com/timgit/pg-boss).\nThe publisher process puts messages on the queue which represent analytics reports\nand how those reports should be fetched, processed, and published. One or more\nconsumer processes receive messages from the queue in parallel and execute the\ncorresponding tasks. See the usage section below for more details about reports\nand jobs.\n\nThe application can publish analytics data reports to AWS S3, to a local file, to\nstdout, and/or to the database in JSON or CSV format.\n\nThe two application components are deployed to cloud.gov for dev, staging, and\nproduction environments using GitHub Actions.  See `.github/workflows/ci.yml`\nfor details on the CI and deployment processes.\n\n## Local development setup\n\n### Prerequisites\n\n* NodeJS \u003e v22.x\n* A postgres DB running and/or docker installed\n\n### Install dependencies\n\n```bash\nnpm install\n```\n\n### Linting\n\nThis repo uses ESLint and Prettier for static code analysis and formatting. Run\nthe linter with:\n\n```bash\nnpm run lint\n```\n\nAutomatically fix lint issues with:\n\n```bash\nnpm run lint:fix\n```\n\n### Install git hooks\n\nThere are some git hooks provided in the `./hooks` directory to help with\ncommon development tasks. 
These will check out current NPM packages on branch\nchange events, and run the linter on pre-commit.\n\nInstall the provided hooks with the following command:\n\n```bash\nnpm run install-git-hooks\n```\n\n### Running the unit tests\n\nThe unit tests for this repo require a local PostgreSQL database. You can run a\nlocal DB server or create a docker container using the provided test compose\nfile. (Requires docker and docker-compose to be installed.)\n\nStarting a docker test DB:\n\n```bash\ndocker-compose -f docker-compose.test.yml up\n```\n\nOnce you have a PostgreSQL DB running locally, you can run the tests. The test\nDB connection in `knexfile.js` has some default connection config which can be\noverridden with environment variables.  If using the provided docker-compose DB\nthen you can avoid setting the connection details.\n\nRun the tests (pre-test hook runs DB migrations):\n\n```bash\nnpm test\n```\n\n#### Running the unit tests with code coverage reporting\n\nIf you wish to see a code coverage report after running the tests, use the\nfollowing command. This runs the DB migrations, tests, and the NYC code coverage\ntool:\n\n```bash\nnpm run coverage\n```\n\n### Running the integration tests\n\nThe integration tests for this repo require the Google Analytics credentials to\nbe set in the environment. 
This can be set up with the dotenv-cli package as\ndescribed in the \"Setup environment\" section below.\n\nNote that these tests make real requests to Google Analytics APIs and should be\nrun sparingly to avoid being rate limited in our live apps which use the\nsame account credentials.\n\n```bash\n# Run cucumber integration tests\ndotenv -e .env npm run cucumber\n\n# Run cucumber integration tests with node debugging enabled\ndotenv -e .env npm run cucumber:debug\n```\n\nThe cucumber features and support files can be found in the `features` directory.\n\n### Running the application locally\n\n#### Setup environment\n\nSee the \"Configuration\" section below for the required environment variables and other setup for Google Analytics auth.\n\nIt may be easiest to use the dotenv-cli package to configure the environment for the application.\n\nCreate a `.env` file using `env.example` as a template, with the correct credentials and other config values.\nThis file is ignored in the `.gitignore` file and should not be checked in to the repository.\n\n#### Run the application\n\nTo run the application locally, you'll need a postgres\ndatabase running on port 5432. 
There is a docker-compose file provided in the\nrepo so that you can start an empty database with the command:\n\n```bash\ndocker-compose up\n```\n\nOnce the database is running, run the database migrations to set the database\nschema:\n\n```bash\nnpm run migrate\n```\n\nThe application runs a queue publisher and a queue consumer, so the following\ncommands will need to be run as separate processes to start the app (uses the\ndotenv package to set the environment variables for the processes):\n\n```bash\n# start publisher\nnpx dotenv -e .env.analytics node -- deploy/publisher.js\n\n# start consumer\nnpx dotenv -e .env.analytics node -- deploy/consumer.js\n```\n\n## Configuration\n\n### Google Analytics\n\n* Enable [Google Analytics API](https://console.cloud.google.com/apis/library/analytics.googleapis.com) for your project in the Google developer dashboard.\n\n* Create a service account for API access in the [Google developer dashboard](https://console.cloud.google.com/iam-admin/serviceaccounts).\n\n* Go to the \"KEYS\" tab for your service account, create a new key using the \"ADD KEY\" button, and download the **JSON** private key file it gives you.\n\n* Grab the generated client email address (ends with `gserviceaccount.com`) from the contents of the .json file.\n\n* Grant that email address `Read, Analyze \u0026 Collaborate` permissions on the Google Analytics profile(s) whose data you wish to publish.\n\n* Set environment variables for `analytics-reporter`. It needs the email address of the service account and the view ID of the profile you authorized it to access:\n\n```bash\nexport ANALYTICS_REPORT_EMAIL=\"YYYYYYY@developer.gserviceaccount.com\"\nexport ANALYTICS_REPORT_IDS=\"XXXXXX\"\n```\n\nYou may wish to manage these using [`autoenv`](https://github.com/kennethreitz/autoenv). If you do, there is an `example.env` file you can copy to `.env` to get started.\n\nTo find your Google Analytics view ID:\n\n  1. Sign in to your Analytics account.\n  1. Select the Admin tab.\n  1. 
Select an account from the dropdown in the ACCOUNT column.\n  1. Select a property from the dropdown in the PROPERTY column.\n  1. Select a view from the dropdown in the VIEW column.\n  1. Click \"View Settings\"\n  1. Copy the view ID.  You'll need to enter it with `ga:` as a prefix.\n\n* You can specify your private key through environment variables either as a file path, or the contents of the key (helpful for Heroku and Heroku-like systems).\n\nTo specify a file path (useful in development or Linux server environments):\n\n```\nexport ANALYTICS_KEY_PATH=\"/path/to/secret_key.json\"\n```\n\nAlternatively, to specify the key directly (useful in a PaaS environment), paste in the contents of the JSON file's `private_key` field **directly and exactly**, in quotes, and **rendering actual line breaks** (not `\\n`'s) (below example has been sanitized):\n\n```\nexport ANALYTICS_KEY=\"-----BEGIN PRIVATE KEY-----\n[contents of key]\n-----END PRIVATE KEY-----\n\"\n```\n\nIf you have multiple accounts for a profile, you can set the `ANALYTICS_CREDENTIALS` variable with a JSON encoded array of those credentials and they'll be used to authorize API requests in a round-robin style.\n\n```\nexport ANALYTICS_CREDENTIALS='[\n  {\n    \"key\": \"-----BEGIN PRIVATE KEY-----\\n[contents of key]\\n-----END PRIVATE KEY-----\",\n    \"email\": \"email_1@example.com\"\n  },\n  {\n    \"key\": \"-----BEGIN PRIVATE KEY-----\\n[contents of key]\\n-----END PRIVATE KEY-----\",\n    \"email\": \"email_2@example.com\"\n  }\n]'\n```\n\n* Make sure your computer or server is syncing its time with the world over NTP. 
Your computer's time will need to match the time on Google's servers for the authentication to work.\n\n### AWS\n\nTo configure the app for publishing data to S3, set the following environment variables:\n\n```\nexport AWS_REGION=us-east-1\nexport AWS_ACCESS_KEY_ID=[your-key]\nexport AWS_SECRET_ACCESS_KEY=[your-secret-key]\nexport AWS_BUCKET=[your-bucket]\nexport AWS_BUCKET_PATH=[your-path]\nexport AWS_CACHE_TIME=0\n```\n\nTo use a custom object storage server compatible with Amazon S3 APIs, such as [minio](https://github.com/minio/minio), set an extra environment variable:\n\n```\nexport AWS_S3_ENDPOINT=http://your-storage-server:port\n```\n\n### Egress proxy config\n\nThe application can be configured to use an egress proxy for HTTP calls which are external to the application's running environment.\nTo configure the app to use an egress proxy, set the following environment variables:\n\n```\nexport PROXY_FQDN=[The fully qualified domain of your proxy server]\nexport PROXY_PORT=[The port for the proxy server]\nexport PROXY_USERNAME=[The username to use for proxy requests]\nexport PROXY_PASSWORD=[The password to use for proxy requests]\n```\n\n## Usage\n\nThe consumer process creates reports and stores them by various methods.  Messages\ndescribing the reports are created by the publisher process, which runs jobs at intervals.\n\nThe publishing jobs pass options to the main `runQueuePublish` function in `index.js`.\nExample:\n\n```javascript\n// jobs/some_job.js\nconst { runQueuePublish } = require(\"../index.js\");\nconst options = {\n  json: true,\n  agenciesFile: `../deploy/agencies.json`,\n};\n\n(async () =\u003e {\n  await runQueuePublish(options);\n})();\n```\n\nThis will create a message on the queue for every report, in sequence, for all\nagencies defined in `../deploy/agencies.json`.  
Consumer processes will print out\nthe resulting report JSON to STDOUT for each message.\n\nA report might look something like this:\n\n```javascript\n{\n  \"name\": \"devices\",\n  \"frequency\": \"daily\",\n  \"slim\": true,\n  \"query\": {\n    \"dimensions\": [\n      {\n        \"name\": \"date\"\n      },\n      {\n        \"name\": \"deviceCategory\"\n      }\n    ],\n    \"metrics\": [\n      {\n        \"name\": \"sessions\"\n      }\n    ],\n    \"dateRanges\": [\n      {\n        \"startDate\": \"30daysAgo\",\n        \"endDate\": \"yesterday\"\n      }\n    ],\n    \"orderBys\": [\n      {\n        \"dimension\": {\n          \"dimensionName\": \"date\"\n        },\n        \"desc\": true\n      }\n    ]\n  },\n  \"meta\": {\n    \"name\": \"Devices\",\n    \"description\": \"30 days of desktop/mobile/tablet visits for all sites.\"\n  },\n  \"data\": [\n    {\n      \"date\": \"2023-12-25\",\n      \"device\": \"mobile\",\n      \"visits\": \"13681896\"\n    },\n    {\n      \"date\": \"2023-12-25\",\n      \"device\": \"desktop\",\n      \"visits\": \"5775002\"\n    },\n    {\n      \"date\": \"2023-12-25\",\n      \"device\": \"tablet\",\n      \"visits\": \"367039\"\n    },\n   ...\n  ],\n  \"totals\": {\n    \"visits\": 3584551745,\n    \"devices\": {\n      \"mobile\": 2012722956,\n      \"desktop\": 1513968883,\n      \"tablet\": 52313579,\n      \"smart tv\": 5546327\n    }\n  },\n  \"taken_at\": \"2023-12-26T20:52:50.062Z\"\n}\n```\n\n### Options\n\n* `output` - (string) Write the report result to the directory path provided in\nthe string. Report files will be named with the name in the report configuration.\n* `publish` - (boolean) If true, publish reports to an S3 bucket. Requires AWS\nenvironment variables set as described above.\n* `write-to-database` - (boolean) If true, write data to a database. 
Requires a\npostgres configuration to be set in environment variables.\n* `only` - (string) Only run one specific report matching the name provided in\nthe string.\n* `slim` - (boolean) Where supported, use totals only (omit the `data` array).\nOnly applies to JSON format, and reports where `\"slim\": true`.\n* `csv` - (boolean) Format reports as CSV. Multiple formats can be set.\n* `json` - (boolean) Format reports as JSON. Multiple formats can be set.\n* `frequency` - (string) Run only reports with a `frequency` value matching the\nprovided string.\n* `debug` - (boolean) Print debug details on STDOUT.\n* `agenciesFile` - (string) The path to a JSON file describing an array of objects\nwith the GA property to use for reporting queries and the internal name of the\nagency. Reports will be run for all agency configuration objects in the file.\n\n## Saving data to postgres\n\nThe analytics reporter can write data it pulls from Google Analytics to a\nPostgres database when the `write-to-database` option is set. 
The postgres\nconfiguration can be set using environment variables:\n\n```bash\nexport POSTGRES_HOST=\"my.db.host.com\"\nexport POSTGRES_USER=\"postgres\"\nexport POSTGRES_PASSWORD=\"123abc\"\nexport POSTGRES_DATABASE=\"analytics\"\n```\n\nThe database expects a particular schema which will be described in the [API\nserver](https://github.com/18f/analytics-reporter-api) that consumes and publishes this data.\n\n## Cloud.gov setup\n\nThe application requires an S3 bucket and RDS instance running a Postgres database\nset up in cloud.gov as services.\n\nExamples below use the Cloudfoundry CLI.\n\n```bash\n# Create and bind an S3 bucket service to the app\ncf create-service s3 basic-public analytics-s3\ncf bind-service analytics-reporter-consumer analytics-s3\n\n# Create an RDS Postgres service for use by the app\ncf create-service aws-rds small-psql analytics-reporter-database\n\n# Connect to the database, enable pgcrypto extension, and create a new database\n# for the PgBoss message queue library\ncf connect-to-service -no-client analytics-develop analytics-reporter-database-develop\npsql -h localhost -p \u003cport\u003e -U \u003cusername\u003e -d \u003cdatabase\u003e\n`CREATE EXTENSION IF NOT EXISTS \"pgcrypto\";`\n`\\dx` # check installed extension to ensure pgcrypto exists now.\n`CREATE DATABASE \u003cmessage_queue_database_name\u003e;`\n\n# Bind the database to both the publisher and consumer apps\ncf bind-service analytics-reporter-publisher analytics-reporter-database\ncf bind-service analytics-reporter-consumer analytics-reporter-database\n\n# Database migrations for the reporter's analytics database are handled by the\n# analytics-reporter-api application. 
Deploy the API server via CI to migrate\n# the database.\n\n# Remove public egress permissions from the space running the application if it has them\ncf unbind-security-group public_networks_egress gsa-opp-analytics analytics-dev --lifecycle running\n\n# Create a network policy in the application's space which allows communication to the egress proxy which runs in a space with public egress permissions\ncf add-network-policy analytics-reporter-consumer analytics-egress-proxy -s analytics-public-egress -o gsa-opp-analytics --protocol tcp --port 8080\n\n\n# Create a network policy in the public-egress space which allows communication from the egress proxy back to the application.\n# The port for each API call the app makes is determined randomly, so allow the full range of port numbers.\ncf target -s analytics-public-egress\ncf add-network-policy analytics-egress-proxy analytics-reporter-consumer -s analytics-dev -o gsa-opp-analytics --protocol tcp --port 1-65535\n```\n\n## Public domain\n\nThis project is in the worldwide [public domain](LICENSE.md). As stated in [CONTRIBUTING](CONTRIBUTING.md):\n\n\u003e This project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the [CC0 1.0 Universal public domain dedication](https://creativecommons.org/publicdomain/zero/1.0/).\n\u003e\n\u003e All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F18F%2Fanalytics-reporter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F18F%2Fanalytics-reporter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F18F%2Fanalytics-reporter/lists"}