{"id":13641233,"url":"https://github.com/basin-etl/basin","last_synced_at":"2025-04-20T07:32:33.488Z","repository":{"id":39877709,"uuid":"246944190","full_name":"basin-etl/basin","owner":"basin-etl","description":"Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser","archived":false,"fork":false,"pushed_at":"2023-01-05T12:29:11.000Z","size":7425,"stargazers_count":35,"open_issues_count":45,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-11-09T11:37:41.899Z","etag":null,"topics":["emr","etl","hadoop","informatica","odi","pipeline","pyspark","spark"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/basin-etl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-03-12T22:48:50.000Z","updated_at":"2024-07-25T15:55:53.000Z","dependencies_parsed_at":"2023-02-04T04:34:14.114Z","dependency_job_id":null,"html_url":"https://github.com/basin-etl/basin","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basin-etl%2Fbasin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basin-etl%2Fbasin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basin-etl%2Fbasin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basin-etl%2Fbasin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/basin-etl","download_url":"https
://codeload.github.com/basin-etl/basin/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249864313,"owners_count":21336724,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["emr","etl","hadoop","informatica","odi","pipeline","pyspark","spark"],"created_at":"2024-08-02T01:01:19.028Z","updated_at":"2025-04-20T07:32:33.136Z","avatar_url":"https://github.com/basin-etl.png","language":"TypeScript","funding_links":[],"categories":["Data Pipeline ETL Frameworks","Data Pipeline"],"sub_categories":[],"readme":"# Basin\n\nExtract, transform, load using visual programming that can run Spark jobs on any environment\n\nCreate and debug from your browser and export into pure python code!\n\n![Basin screenshot](https://github.com/zalmane/superglue-ui/blob/master/doc/basin_screenshot.png?raw=true)\n\n# Features\n\n- Up and running as simple as `docker pull`\n\n- Create complex pipelines and flows using drag and drop\n\n- Debug and preview step by step\n\n- Integrated dataview grid viewer for easier debugging\n\n- Auto-generates comments so you don't have to\n\n- Export to beautiful, pure python code\n\n- Build artifacts for AWS Glue deployment (Work in progress)\n\n# Install\n\n## Install from dockerhub\n`$ docker pull zalmane/basin:latest`\n\n### Create data folder\n\n`$ mkdir data`\nThis is the folder that will hold all input and output files\n\n### Run image\nRun image mapping data directory to your local environment. 
This is where input/output goes (extract and load)\n\n`docker run --rm -d -v $PWD/data:/opt/basin/data --name basin_server -p 3000:3000 zalmane/basin:latest`\n\nThat's it. Point your browser to [http://localhost:3000](http://localhost:3000) and you're done!\n\nNotes:\n- Metadata is stored in the browser's IndexedDB.\n\n## Install from source\n### Install dev environment with docker\n```\ndocker-compose up\n```\n\nThis will set up 2 containers: `basin-client` and `basin-server`\n\nThat's it. Point your browser to [http://localhost:8860](http://localhost:8860) and you're done!\n\n\nTo run npm commands in the basin-client container use:\n```\ndocker exec basin-client npm \u003ccommand\u003e\n```\n\nTo update changes in py files (block templates, lib), use:\n```\ndocker exec basin-client npm run build-py\n```\n\n# Getting started\n\n## Creating sources\nA source defines the information needed to parse and import a dataset. Sources are referenced when using an *Extract* block.\nThe source defines the following information:\n- type of file (delimited, fixed width, JSON, parquet)\n- regular expression used to identify the file. It is matched against the file name\n- information about headers and footers\n- specific metadata based on type of file (for CSV, this includes the delimiter, etc.)\n\n## Creating a flow\n\n## Running and debugging a flow\n\n## Exporting to python code\n\n# Configuration\n\n# Extending\n## Creating new block types\n\nEach block type consists of:\n\n- Descriptor json\n- code template\n- optional code library template\n- Properties panel\n\n### Descriptor\n### Code template\n### Code library template\n### Properties panel\n\n# License\n\nThis program is free software: you can redistribute it and/or modify it under the terms of the Server Side Public License, version 1, as published by MongoDB, Inc. 
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the Server Side Public License for more details. You should have received a copy of the Server Side Public License along with this program. If not, see \u003chttp://www.mongodb.com/licensing/server-side-public-license\u003e\n\nCopyright © 2018-2020 G.M.M Ltd.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbasin-etl%2Fbasin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbasin-etl%2Fbasin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbasin-etl%2Fbasin/lists"}