{"id":15715285,"url":"https://github.com/crazyoptimist/nodejs-etl-poc","last_synced_at":"2025-03-30T20:16:20.728Z","repository":{"id":65919228,"uuid":"602227043","full_name":"crazyoptimist/nodejs-etl-poc","owner":"crazyoptimist","description":"NodeJS ETL POC","archived":false,"fork":false,"pushed_at":"2023-02-16T07:59:04.000Z","size":151,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-05T22:59:05.262Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/crazyoptimist.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-15T19:05:17.000Z","updated_at":"2023-02-15T19:42:28.000Z","dependencies_parsed_at":null,"dependency_job_id":"c68f17f0-648f-4146-b0af-b30fa3ac4d8a","html_url":"https://github.com/crazyoptimist/nodejs-etl-poc","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crazyoptimist%2Fnodejs-etl-poc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crazyoptimist%2Fnodejs-etl-poc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crazyoptimist%2Fnodejs-etl-poc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crazyoptimist%2Fnodejs-etl-poc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/crazyoptimist","download_url":"https://codeload.github.com/crazyoptimist/nodejs-etl-poc/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246372746,"owners_count":20766635,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-03T21:40:51.980Z","updated_at":"2025-03-30T20:16:20.573Z","avatar_url":"https://github.com/crazyoptimist.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ETL Processing Test\n\n[![build \u0026 test](https://github.com/crazyoptimist/nodejs-etl-poc/actions/workflows/build-and-test.yaml/badge.svg)](https://github.com/crazyoptimist/nodejs-etl-poc/actions/workflows/build-and-test.yaml)\n\n### Requirements\n\n- Extract JSON objects from files on a local disk\n- Transform the extracted objects into a given JSON format\n- Save the new objects to files on a local disk\n\nExample input object\n\n```js\n{\n  \"ts\": 1234567890,                                                         // Unix timestamp\n  \"u\": \"https://www.test.com/products/productA.html?a=5435\u0026b=test#reviews\", // a URL\n  \"e\": [ {list of events} ]                                                 // an array of objects, each object represents an event\n}\n```\n\nExample output object\n\n```js\n{\n  \"timestamp\": ....,                  // same timestamp as parent\n  \"url_object\": {                     // parsed URL object\n    \"domain\": \"www.test.com\",         // domain\n    \"path\": \"/products/productA.html\", // path\n    \"query_object\": {                 // query string object, e.g. from ?q1=val1\u0026q2=val2\n      \"a\": \"5435\",\n      \"b\": \"test\",\n      ...\n    },\n    \"hash\": \"#reviews\"                // hash\n  },\n  \"ec\": {original event content}\n}\n```\n\n### Design\n\n- The source data format is known: each gzip file contains a single JSON object\n- Read one gzip file at a time\n- Perform the transformation\n- Buffer the transformed data (buffer capacity: 8 KB)\n- Write the buffered array of transformed objects to a file\n- Repeat the process as a pipeline\n\n### Build \u0026 Run\n\n```\nnpm install\nnpm run build\nnpm start\n```\n\nThis application runs as a one-time job. In a real scenario, the pipeline would run as a long-running job.\n\n### Development\n\n```\nnpm install\nnpm run dev\n```\n\n### Test\n\n```\nnpm test\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcrazyoptimist%2Fnodejs-etl-poc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcrazyoptimist%2Fnodejs-etl-poc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcrazyoptimist%2Fnodejs-etl-poc/lists"}