{"id":20109858,"url":"https://github.com/hyper63/etl-template","last_synced_at":"2025-03-02T18:24:55.327Z","repository":{"id":110718897,"uuid":"327938827","full_name":"hyper63/etl-template","owner":"hyper63","description":null,"archived":false,"fork":false,"pushed_at":"2021-01-08T15:25:29.000Z","size":20,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-13T05:41:40.242Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hyper63.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-08T15:22:44.000Z","updated_at":"2021-01-08T15:25:32.000Z","dependencies_parsed_at":null,"dependency_job_id":"04637838-f50b-4d1b-b148-c2a260687833","html_url":"https://github.com/hyper63/etl-template","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyper63%2Fetl-template","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyper63%2Fetl-template/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyper63%2Fetl-template/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyper63%2Fetl-template/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hyper63","download_url":"https://codeload.github.com/hyper63/etl-template/tar.gz/refs/heads/master","host":{"name":"GitHub
","url":"https://github.com","kind":"github","repositories_count":241550090,"owners_count":19980648,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-13T18:09:40.242Z","updated_at":"2025-03-02T18:24:55.318Z","avatar_url":"https://github.com/hyper63.png","language":"JavaScript","readme":"# hyper63 ETL Job Template\n\nThis is a template project for creating an ETL Job.\n\nYou can create a new ETL job from this template by running the following\ncommand:\n\n```\nnpx bam github:hyper63/etl-template [your-jobname]\n```\n\nThis will create a new directory based on your jobname\n\nThen you will want to cd into the directory and open `app.arc` in your\neditor. You will want to modify the name of the app.\n\n```\n@app\n[your-jobname]\n\n@scheduled\netl\n  rate 1 day\n  src src\n\n```\n\n\u003e NOTE: It is very important to change the name of your app, also change the rate of your scheduled ETL Job in this file if different than daily.\n\n## Developer Setup Requirements\n\n\u003e NOTE: nodejs and aws-cli are required see https://nodejs.org and https://aws.amazon.com/cli/\n\n```\ncd src\nnpm install -g @architect/architect\nnpm install\n```\n\n## Setting up Environment Variables\n\nThis tempate is setup to leverage environment variables for the job specialized configuration information.\n\n```\nSOURCE_URL\nSOURCE_TOKEN\nTARGET_URL\nTARGET_TOKEN\n```\n\nThere may be more config params based on the source endpoint you need to specify. 
Using the `arc env` CLI command, you can set these variables in AWS.\n\n```\narc env production KEY value\n```\n\n## Project Structure\n\n```\n- src\n  - lib\n    index.js - ETL Pipeline Details\n    index_test.js - Pipeline Test\n    utils.js - Async Utils\n    get-data.js - example source/extract function\n    put-stats.js - example target/load function\n  index.js - main handler for scheduled events\n  package.json - manifest file\n  config.arc - architect aws lambda config\nREADME.md\napp.arc - architect aws app config\n```\n\n## About the ETL Job Code\n\nThe ETL Job is broken out into three distinct functions: Extract, Transform and Load. In the `lib/index.js` file you can see each function defined, with some sample code for each.\n\n### Extract Function\n\nThe Extract function is responsible for getting all of the data from the source or sources. In this template, `lib/get-data.js` shows an example of how to get the data. This function takes an `object` and returns an `Async`, which is like a promise but is lazy, so the caller controls when the async call occurs. You can `map` and `chain` on the `Async` object: if you `map`, the value you return is placed in the `Async`; if you `chain`, you must return a new `Async` object.\n\n### Transform Function\n\nThe transform function takes `data` and returns an `AsyncReader`, which wraps around a value. The easiest way to work with the transform function is this pattern:\n\n```\nexports.transform = data =\u003e AsyncReader.of(\n  ...do stuff..\n)\n```\n\nThen you can map over the data and make any modifications or changes you want.\n\n### Load Function\n\nThe load function is very similar to the extract function: you get data as your argument and put each item into the data warehouse. 
For the most part, if you are using `hyper63` as your data warehouse, you should not have to modify the load function; it should just work as long as your data is ready to go and your target is properly set up.\n\n\n## Testing\n\nThe easiest way to test in a development environment is with `fetch-mock`, which lets you simulate exactly what the API servers return so your job can react to it. This allows you to focus on your code and your patterns.\n\nThis template has a test set up and ready to run; you can find it in `lib/index_test.js`. If you look at the file, you can see that it has a fetchMock setup for two endpoints, and the actual test routine should look very similar to the handler code that is getting invoked. You can use this test script to verify your code is properly running each step.\n\n## Deployment\n\nNow that you have tested locally, you are ready to deploy. Make sure you have the right region and profile set.\n\n```\nexport AWS_PROFILE=default\nexport AWS_REGION=us-east-1\n```\n\nThen you will want to make sure you are in the project root directory.\n\n```\ncat app.arc\n```\n\n\u003e NOTE: If you do not see the app.arc file, you are not in the project root directory.\n\nThen run:\n\n```\narc deploy --production\n```\n\n## Monitoring\n\nNow that you are up and running, you may want to check on your job's progress or any errors that may be happening.\n\nYou can access the logs for your deployment:\n\n```\narc logs production src\n```\n\n## Misc\n\nYou can make changes and deploy often; each deployment replaces the existing jobs.\n\n## Destroy Job\n\n```\narc destroy --production --name your-jobname\n```\n\nThis command will destroy the job in AWS and remove all 
traces.\n\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhyper63%2Fetl-template","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhyper63%2Fetl-template","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhyper63%2Fetl-template/lists"}