{"id":20109868,"url":"https://github.com/hyper63/etl-template2","last_synced_at":"2025-11-28T06:08:03.491Z","repository":{"id":110718905,"uuid":"333942456","full_name":"hyper63/etl-template2","owner":"hyper63","description":null,"archived":false,"fork":false,"pushed_at":"2021-01-28T21:54:57.000Z","size":17,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-13T05:41:41.805Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hyper63.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-28T20:58:55.000Z","updated_at":"2021-01-28T21:54:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"36a8c021-b26b-4e30-b19d-371a857e1920","html_url":"https://github.com/hyper63/etl-template2","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyper63%2Fetl-template2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyper63%2Fetl-template2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyper63%2Fetl-template2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyper63%2Fetl-template2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hyper63","download_url":"https://codeload.github.com/hyper63/etl-template2/tar.gz/refs/heads/master","host":{"name"
:"GitHub","url":"https://github.com","kind":"github","repositories_count":241550100,"owners_count":19980648,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-13T18:09:41.882Z","updated_at":"2025-11-28T06:08:03.458Z","avatar_url":"https://github.com/hyper63.png","language":"JavaScript","readme":"# hyper63 ETL Jobs\n\nAn ETL job is the process of extracting data from a source, transforming it, and loading it into a target.\nThis document describes an ETL approach using Architect (`arc.codes`) and Node.js.\n\n## Developer Machine Requirements\n\n* NodeJS - https://nodejs.org\n* AWS CLI - https://docs.aws.amazon.com/cli/index.html\n* Architect - https://arc.codes\n\n\u003e Follow the install instructions at each of the linked sites\n\n## Configure aws\n\n```\naws configure\n```\n\n* Add your ACCESS KEY and ACCESS SECRET as the default profile\n* Set the region to `us-east-1`\n* Set the output format to `json`\n\n## Setup\n\nCreate a new project folder:\n\n```\nmkdir foo\ncd foo\n```\n\nIn the project folder create a file called `app.arc`:\n\n```\n@app\nfoo\n\n@scheduled\neltoro rate(1 day)\n```\n\n\u003e Under the `@app` label, replace `foo` with the name of your project; under `@scheduled`, place the name of the job and the interval at which you would like the job to run. 
[More Info](https://arc.codes)\n\nNow that your app file is created, run Architect's `init` command:\n\n``` sh\narc init\n```\n\nThis will create a new folder, in this case `src/scheduled/eltoro`, and within that folder are two files:\n\n* config.arc\n* index.js\n\ncd into that directory:\n\n```\ncd src/scheduled/eltoro\n```\n\nOpen the `config.arc` file and add the line `timeout 900`. This instructs AWS to allow the job to run for up to 15 minutes if needed.\n\n\u003e If your rate interval is less than 15 minutes, you may want to adjust this value for your needs.\n\n```\n@aws\nruntime nodejs12.x\ntimeout 900\n```\n\nSave the file.\n\nThe `index.js` file is where your handler function lives; this is the function that will be invoked on the scheduled interval, so this is where you build your ETL pipeline.\n\nThe basic pipeline will need to do the following things:\n\n* Authenticate with a source endpoint\n* Get a stats report by date range\n* Transform the stats into target JSON documents\n* Post the JSON documents to the target\n\nI leverage node modules like:\n\n* node-fetch - HTTP client\n* zod - schema validation\n* date-fns - date/time utilities\n* ramda - functional utilities\n* crocks - pipeline flow\n\nAll of these choices are opinionated, and you may choose different modules to perform your ETL.\n\nIt is important to initialize the job directory with a `package.json`.\n\nCreate a file called `package.json`:\n\n``` json\n{\n  \"name\": \"myjob\",\n  \"version\": \"1.0.0\",\n  \"private\": true\n}\n```\n\nThen you can install the npm modules you want to use for this ETL job:\n\n```\nnpm install node-fetch ramda date-fns crocks zod@beta\n```\n\nYou can also install development dependencies; for example, I use tape and fetch-mock for testing:\n\n```\nnpm install -D tape fetch-mock\n```\n\n### Testing Locally\n\nTo test locally, in your test file, simply require the `index.js` file and 
invoke the handler function:\n\n``` js\nconst job = require('./index.js')\n\njob.handler()\n```\n\n### Document Structure for Target\n\nWhen using the hyper63 data API, you will want to structure your documents in a meaningful and consistently accessible way.\n\nI recommend using the upsert pattern to make the process idempotent, so that it is impossible to create duplicate records if the ETL job is run over and over again.\n\n```\nPUT https://api.ignite-board.com/data/[db]/[id]\nContent-Type: application/json\nAuthorization: Bearer [TOKEN]\n\n{\n  \"id\": \"type:stat_timestamp\",\n  \"type\": \"type\",\n  ...\n}\n```\n\nFor example:\n\nType: eltoro\nstat_timestamp: 2020-12-22T02:00:00.000Z\n\n``` json\n{\n  \"id\": \"eltoro:2020-12-22T02:00:00.000Z\",\n  \"type\": \"eltoro\",\n  ...\n}\n```\n\n## Deployment\n\nWith Architect you can deploy your code to a staging environment and then to a production environment. If deploying to a staging environment, make sure your staging environment is not writing to the production database. 
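One way to keep a staging deploy from writing to production is to guard the load step behind a flag. This is only a minimal sketch, assuming a hypothetical `DRY_RUN` variable (not part of Architect) that you would set yourself, e.g. with `arc env staging DRY_RUN true`:

``` js
// Sketch: gate target writes behind a hypothetical DRY_RUN flag.
// When DRY_RUN is "true" (e.g. in staging), documents are only logged.
function loadDocument(doc, dryRun = process.env.DRY_RUN === 'true') {
  if (dryRun) {
    console.log('DRY_RUN - would upsert:', doc.id)
    return { written: false, id: doc.id }
  }
  // A real implementation would PUT the document to the target here,
  // e.g. with node-fetch against the data API shown above.
  return { written: true, id: doc.id }
}
```

With the flag set, the staging job logs each document id instead of upserting it, so you can evaluate the transform output safely.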
You may want to set a flag for the staging environment to just log the target information for evaluation purposes.\n\n### Deploying to a staging environment\n\nTo deploy to the staging environment, run the following command:\n\n```\narc deploy\n```\n\n### Deploying to a production environment\n\nTo deploy to a production environment, run the following command:\n\n```\narc deploy --production\n```\n\nThis will take a little time to provision, but once it is up and running you can access the logs via the command line:\n\n```\narc logs production src/scheduled/eltoro\n```\n\n### Environment Variables and Secrets\n\nYou will want to store configuration and secret data outside of the codebase. Using the `arc env` command, you can safely store this information in a secure key-value store:\n\n```\narc env production KEY value\n```\n\nExample:\n\n```\narc env production SOURCE_URL https://api-prod.eltoro.com\n```\n\nThen you can access this data using the `process.env` object in Node.js when the job is running in that environment.\n\n\u003e NOTE: If you have special characters in your value, use quotes\n\n```\narc env production SOURCE_URL \"https://api-prod.eltoro.com\"\n```\n\nFor more information: https://arc.codes/docs/en/reference/cli/env\n\n### Fin\n\nA final note: when building ETL jobs, try to create idempotent writes to the target.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhyper63%2Fetl-template2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhyper63%2Fetl-template2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhyper63%2Fetl-template2/lists"}