https://github.com/openaq/openaq-lcs-fetch
Perform ETL into the OpenAQ Low Cost Sensor Database
https://github.com/openaq/openaq-lcs-fetch
Last synced: 16 days ago
JSON representation
Perform ETL into the OpenAQ Low Cost Sensor Database
- Host: GitHub
- URL: https://github.com/openaq/openaq-lcs-fetch
- Owner: openaq
- License: mit
- Created: 2020-09-18T14:56:33.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2025-04-19T01:26:26.000Z (30 days ago)
- Last Synced: 2025-04-19T10:46:33.884Z (29 days ago)
- Language: JavaScript
- Size: 901 KB
- Stars: 3
- Watchers: 4
- Forks: 2
- Open Issues: 8
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
OpenAQ-ETL
Perform ETL into the OpenAQ Low Cost Sensor Database
## Deploy
- `yarn cdk deploy` deploy this stack to your default AWS account/region
- `yarn cdk diff` compare deployed stack with current state
- `yarn cdk synth` emits the synthesized CloudFormation template## Development
Javascript Documentation can be obtained by running the following
```
yarn doc
```Tests can be run using the following
```
yarn test
```### Env Variables
Configuration for the ingestion is provided via environment variables.
- `BUCKET`: The bucket to which the ingested data should be written. **Required**
- `SOURCE`: The [data source](#data-sources) to ingest. **Required**
- `LCS_API`: The API used when fetching supported measurands. _Default: `'https://api.openaq.org'`_
- `STACK`: The stack to which the ingested data should be associated. This is mainly used to apply a prefix to data uploaded to S3 in order to separate it from production data. _Default: `'local'`_
- `SECRET_STACK`: The stack to which the used [Secrets](#provider-secrets) are associated. At times, a developer may want to use credentials relating to a different stack (e.g. a devloper is testing the script, they want output data uploaded to the `local` stack but want to use the production stack's secrets). _Default: the value from the `STACK` env variable_
- `VERBOSE`: Enable verbose logging. _Default: disabled_### Running locally
To run the ingestion script locally (useful for testing without deploying), see the following example:
```sh
LCS_API=https://api.openaq.org \
STACK=my-dev-stack \
SECRET_STACK=my-prod-stack \
BUCKET=openaq-fetches \
VERBOSE=1 \
SOURCE=habitatmap \
node fetcher/index.js
```## Data Sources
Data Sources can be configured by adding a config file & corresponding provider script. The two sections below
outline what is necessary to create and a new source.### Source Config
The first step for a new source is to add JSON config file to the the `fetcher/sources` directory.
```json
{
"schema": "v1",
"provider": "example",
"frequency": "hour",
"meta": {}
}
```| Attribute | Note |
| ----------- | -------------------------- |
| `provider` | Unique provider name |
| `frequency` | `day`, `hour`, or `minute` |The config file can contain any properties that should be configurable via the
provider script. The above table however outlines the attributes that are required.### Provider Script
The second step is to add a new provider script to the `fetcher/providers` directory.
The script here should expose a function named `processor`. This function should pass
`SensorSystem` & `Measures` objects to the `Providers` class.The script below is a basic example of a new source:
```js
const Providers = require("../providers");
const { Sensor, SensorNode, SensorSystem } = require("../station");
const { Measures, FixedMeasure, MobileMeasure } = require("../measure");async function processor(source_name, source) {
// Get Locations/Sensor Systems via http/s3 etc.
const locs = await get_locations();// Map locations into SensorNodes
const station = new SensorNode();await Providers.put_stations(source_name, [station]);
const fixed_measures = new Measures(FixedMeasure);
// or
const mobile_measures = new Measures(MobileMeasure);fixed_measures.push(
new FixedMeasure({
sensor_id: "PurpleAir-123",
measure: 123,
timestamp: Math.floor(new Date() / 1000), //UNIX Timestamp
})
);await Providers.put_measures(source_name, fixed_measures);
}module.exports = { processor };
```### Provider Secrets
For data providers that require credentials, credentials should be store on AWS Secrets Manager with an ID composed of the stack name and provider name, such as `:stackName/:providerName`.
#### Google Keys
Some providers (e.g. CMU, Clarity) require us to read data from Google services (e.g. Drive, Sheets). To do this, the organization hosting the data should do the following:
1. [create a project & enable access to the required APIs](https://developers.google.com/workspace/guides/create-project)
1. [create a service account](https://cloud.google.com/iam/docs/creating-managing-service-accounts)
1. [generate service account keys](https://cloud.google.com/iam/docs/creating-managing-service-account-keys)The should look something like the following and be stored in its entirety within the AWS Secrets Manager.
```json
{
"type": "service_account",
"project_id": "project-id",
"private_key_id": "key-id",
"private_key": "-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n",
"client_email": "service-account-email",
"client_id": "client-id",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://accounts.google.com/o/oauth2/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account-email"
}
```