https://github.com/crazyoptimist/nodejs-etl-poc
NodeJS ETL POC
- Host: GitHub
- URL: https://github.com/crazyoptimist/nodejs-etl-poc
- Owner: crazyoptimist
- Created: 2023-02-15T19:05:17.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-02-16T07:59:04.000Z (over 3 years ago)
- Last Synced: 2025-02-05T22:59:05.262Z (over 1 year ago)
- Language: TypeScript
- Size: 147 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# ETL Processing Test
[Build and test workflow](https://github.com/crazyoptimist/nodejs-etl-poc/actions/workflows/build-and-test.yaml)
### Requirements
- Extract JSON objects from files on a local disk
- Transform the extracted objects into a given JSON format
- Save the new objects to files on a local disk
Example input object
```js
{
  "ts": 1234567890, // unix timestamp
  "u": "https://www.test.com/products/productA.html?a=5435&b=test#reviews", // a url
  "e": [ /* list of events */ ] // an array of objects, each object represents an event
}
```
Example output object
```js
{
  "timestamp": 1234567890, // same timestamp as parent
  "url_object": { // parsed URL object
    "domain": "www.test.com", // domain
    "path": "/products/productA.html", // path
    "query_object": { // query string object e.g. from ?q1=val1&q2=val2
      "a": "5435",
      "b": "test",
      // ...
    },
    "hash": "#reviews" // hash
  },
  "ec": { /* original event content */ }
}
```
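The mapping from the input object to the output object can be sketched in TypeScript roughly as follows. This is a minimal sketch, not the repository's actual code: the names `InputRecord`, `OutputRecord`, `parseUrl`, and `transform` are hypothetical, and it assumes that each event in `e` yields one output object carrying the parent's timestamp and parsed URL. It relies on the standard `URL` class that Node.js exposes as a global.

```typescript
// Shape of one input record, following the example input above.
interface InputRecord {
  ts: number; // unix timestamp
  u: string; // full URL
  e: unknown[]; // list of events
}

interface UrlObject {
  domain: string;
  path: string;
  query_object: Record<string, string>;
  hash: string;
}

interface OutputRecord {
  timestamp: number;
  url_object: UrlObject;
  ec: unknown; // original event content, passed through unchanged
}

// Hypothetical helper: split a URL into the fields of `url_object`.
function parseUrl(raw: string): UrlObject {
  const url = new URL(raw);
  const query_object: Record<string, string> = {};
  // Assumption: if a query key repeats, the last value wins.
  url.searchParams.forEach((value, key) => {
    query_object[key] = value;
  });
  return {
    domain: url.hostname,
    path: url.pathname,
    query_object,
    hash: url.hash,
  };
}

// Assumption: one output object is produced per event in `e`.
function transform(input: InputRecord): OutputRecord[] {
  const url_object = parseUrl(input.u);
  return input.e.map((event) => ({
    timestamp: input.ts,
    url_object,
    ec: event,
  }));
}
```

Using the `URL` class avoids hand-written string splitting for the query string and hash. Note that `hostname` excludes any port number; if the port should be part of `domain`, `url.host` would be the field to use instead.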
### Design
- The source data format is known: each gzip file contains exactly one JSON object
- Read one gzip file at a time
- Transform the extracted object
- Buffer the transformed data (buffer capacity: 8 KB)
- Write the buffered array of transformed objects to a file
- Repeat these steps as a pipeline
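The buffering step in the list above could look something like the sketch below. It is an illustration under stated assumptions, not the repository's implementation: `BatchBuffer` is a hypothetical class, the capacity is read as 8 * 1024 bytes, and an object's size is taken to be the UTF-8 byte length of its JSON serialization. The flush callback stands in for the "write to a file" step.

```typescript
// Hypothetical helper: collects transformed objects and hands them to a
// flush callback as one batch before the byte capacity would be exceeded.
class BatchBuffer<T> {
  private items: T[] = [];
  private bytes = 0;
  private readonly encoder = new TextEncoder();
  private readonly capacityBytes: number;
  private readonly onFlush: (batch: T[]) => void;

  constructor(capacityBytes: number, onFlush: (batch: T[]) => void) {
    this.capacityBytes = capacityBytes;
    this.onFlush = onFlush;
  }

  add(item: T): void {
    // Assumption: size is measured on the serialized JSON, in UTF-8 bytes.
    const size = this.encoder.encode(JSON.stringify(item)).length;
    // Flush first if this item would push the buffer past its capacity.
    if (this.items.length > 0 && this.bytes + size > this.capacityBytes) {
      this.flush();
    }
    this.items.push(item);
    this.bytes += size;
  }

  flush(): void {
    if (this.items.length === 0) return;
    const batch = this.items;
    this.items = [];
    this.bytes = 0;
    this.onFlush(batch);
  }
}

// Usage sketch: batches are collected in memory here; a real pipeline
// would write each batch to an output file instead.
const CAPACITY_BYTES = 8 * 1024; // assumed reading of the 8 KB capacity
const batches: object[][] = [];
const buffer = new BatchBuffer<object>(CAPACITY_BYTES, (batch) => {
  batches.push(batch);
});
buffer.add({ timestamp: 1234567890, ec: { type: "example" } });
buffer.flush(); // flush whatever is left once the input is exhausted
```

A single object larger than the capacity is still accepted and flushed on its own, which keeps the sketch simple. The callback here is synchronous; with asynchronous file writes, `flush` would need to return a promise and be awaited.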
### Build & Run
```
npm install
npm run build
npm start
```
This application runs as a one-time job. In a real scenario, the pipeline would run as a long-running job.
### Development
```
npm install
npm run dev
```
### Test
```
npm test
```