{"id":20208059,"url":"https://github.com/gkampitakis/etl-json-to-sql","last_synced_at":"2026-04-11T15:39:05.628Z","repository":{"id":116153877,"uuid":"411039935","full_name":"gkampitakis/ETL-json-to-SQL","owner":"gkampitakis","description":"Extract data from JSON file, transform it and load it to PostgreSQL","archived":false,"fork":false,"pushed_at":"2021-10-10T11:18:56.000Z","size":62,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-13T21:09:55.580Z","etag":null,"topics":["docker-compose","etl","golang","learning-by-doing","nodejs","postgresql"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gkampitakis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-09-27T20:54:30.000Z","updated_at":"2024-02-07T08:53:51.000Z","dependencies_parsed_at":null,"dependency_job_id":"e7aa0053-4f40-49ea-a6a7-a3b87ad18799","html_url":"https://github.com/gkampitakis/ETL-json-to-SQL","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gkampitakis%2FETL-json-to-SQL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gkampitakis%2FETL-json-to-SQL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gkampitakis%2FETL-json-to-SQL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gkampitakis%2FETL-json-to-SQL/manifests","ow
ner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gkampitakis","download_url":"https://codeload.github.com/gkampitakis/ETL-json-to-SQL/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241644554,"owners_count":19996179,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker-compose","etl","golang","learning-by-doing","nodejs","postgresql"],"created_at":"2024-11-14T05:33:56.136Z","updated_at":"2025-12-31T01:05:32.174Z","avatar_url":"https://github.com/gkampitakis.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ETL-json-to-SQL\n\nExtract data from JSON file, transform it and load it to PostgreSQL\n\n## Description\n\nThis repository contains two solutions for \"extracting\", \"transforming\" and \"loading\" data to Postgresql. One solution is written in NodeJS and the second one in Golang.\n\nThe focus is how to load data in a performant way (streaming ??) and after transforming save them in Postgresql (in bulk inserts).\n\n\u003e In NodeJS solution I used multi rows insert, couldn't make the solution with `COPY` work. [pg-copy-streams](https://www.npmjs.com/package/pg-copy-streams)\n\n\u003e In Golang solution I was able to use the `COPY` that Postgresql supports.\n\n### Dataset\n\nThe file used for running the ETL pipeline was taken from [here](https://www.kaggle.com/jasperan/league-of-legends-1v1-matchups-results?select=matchups.json). 
\nIf it's not available there, you can also find it in this [Google Drive](https://drive.google.com/file/d/1DTq50VffBrT4NCKAj2gFVhbGhplHG_6w/view?usp=sharing). The file is loaded from `./data-to-load` by default, but you can specify another path by setting the `FILE_PATH` environment variable.\n\n\u003e Number of records in matchups.json: 1,312,252\n\nEach record in the source file looks like this:\n\n```json\n{\n  \"p_match_id\":\"TR1_1201957752_top\",\n  \"goldearned\":14425,\n  \"totalminionskilled\":194,\n  \"win\":\"false\",\n  \"kills\":14,\n  \"assists\":5,\n  \"deaths\":7,\n  \"champion\":\"Kassadin\",\n  \"visionscore\":17,\n  \"puuid\":\"phduyQLB8gBjUerFwiVOtyLLHE9jxw7Jq7dwab_CtRddAvzJ7L1uo5kWzLTKSqStAzml_3yGHiNPFA\",\n  \"totaldamagedealttochampions\":33426,\n  \"summonername\":\"Borke\",\n  \"gameversion\":\"11.14.384.6677\"\n}\n```\n\nand we load it into Postgresql with these columns:\n\n```sql\nid UUID NOT NULL DEFAULT uuid_generate_v4() PRIMARY KEY,\ngold_earned INTEGER,\nminions_killed INTEGER,\nkda INTEGER,\nchampion VARCHAR,\nvision_score INTEGER,\nsummoner_name VARCHAR,\nwin BOOLEAN,\ngame_version VARCHAR,\ndamage_dealt_to_champions INTEGER,\nlane VARCHAR,\nregion VARCHAR\n```\n\n## Queries\n\nSome queries you can run after loading the data:\n\n```sql\n-- Get the total gold_earned for each lane\n\nSELECT sum(gold_earned), lane\nFROM matchups\nGROUP BY lane\nORDER BY sum DESC;\n```\n\n```sql\n-- Get the champions and number of games with the average kda\n\nSELECT count(*) AS games, champion, round(AVG(kda), 0) AS avg_kda\nFROM matchups\nGROUP BY champion\nORDER BY avg_kda DESC;\n```\n\n```sql\n-- Get number of unique records in table\n\nSELECT COUNT(*)\nFROM (\n  SELECT DISTINCT *\n  FROM matchups\n  ) AS unique_rows;\n```\n\n## Useful commands\n\n__Insert Data to Postgres from CSV:__\n```sql\n\\copy matchups(champion,damage_dealt_to_champions,game_version,gold_earned,win,minions_killed,kda,lane,region,summoner_name,vision_score) from '/usr/log.csv' (FORMAT csv,DELIMITER ',');\n```\n\n__Connect 
to postgresql container:__\n\n```bash\ndocker exec -it postgres psql -U ETL_user -d ETL_db\n```\n\n## Running the project\n\nTo run the ETL in either version:\n- set up the `.env` file or provide the correct environment variables\n- have Postgres running; the repo provides a `docker-compose.yaml` for this, which you\n can start with `make docker-start`\n\n\u003cdetails\u003e\n  \u003csummary\u003eNode\u003c/summary\u003e\n  \n  - Build the code: `make node-build`\n  - Run the code (after building): `make node-run`\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003eGolang\u003c/summary\u003e\n  \n  - Build the code: `make go-build`\n  - Run the code (after building): `make go-run`\n  - Run the linter: `make go-lint`\n\u003c/details\u003e\n\n## Resources\n\n- [Data Imports](https://github.com/vitaly-t/pg-promise/wiki/Data-Imports) wiki doc for the NodeJS Postgres Driver\n- [Performance Boost](https://github.com/vitaly-t/pg-promise/wiki/Performance-Boost) wiki doc for the NodeJS Postgres Driver\n- [Multi row insert with pg-promise](https://stackoverflow.com/questions/37300997/multi-row-insert-with-pg-promise)\n- [Postgresql - Populating a database](https://www.postgresql.org/docs/current/populate.html)\n- [Golang PGX](https://github.com/jackc/pgx)\n- [Golang Profiling](https://flaviocopes.com/golang-profiling/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgkampitakis%2Fetl-json-to-sql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgkampitakis%2Fetl-json-to-sql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgkampitakis%2Fetl-json-to-sql/lists"}