{"id":13696106,"url":"https://github.com/easysql/easy_sql","last_synced_at":"2025-05-16T11:04:04.058Z","repository":{"id":37098960,"uuid":"484457292","full_name":"easysql/easy_sql","owner":"easysql","description":"A library developed to ease the data ETL development process.","archived":false,"fork":false,"pushed_at":"2025-03-31T19:27:53.000Z","size":48041,"stargazers_count":134,"open_issues_count":7,"forks_count":28,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-05-09T14:15:44.162Z","etag":null,"topics":["clickhouse","etl","postgres","postgresql","python","spark","sql"],"latest_commit_sha":null,"homepage":"https://easy-sql.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/easysql.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-22T14:09:07.000Z","updated_at":"2025-02-10T03:42:58.000Z","dependencies_parsed_at":"2023-01-31T02:15:38.719Z","dependency_job_id":"fdf02d51-c65b-41a6-81cd-558d1830be6d","html_url":"https://github.com/easysql/easy_sql","commit_stats":{"total_commits":536,"total_committers":24,"mean_commits":"22.333333333333332","dds":0.5671641791044777,"last_synced_commit":"b568542617942f347579ff872d976fd2175aa071"},"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/easysql%2Feasy_sql","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/easysql%2Feasy_sql/tags","releases_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/easysql%2Feasy_sql/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/easysql%2Feasy_sql/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/easysql","download_url":"https://codeload.github.com/easysql/easy_sql/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254518384,"owners_count":22084374,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clickhouse","etl","postgres","postgresql","python","spark","sql"],"created_at":"2024-08-02T18:00:36.340Z","updated_at":"2025-05-16T11:04:04.037Z","avatar_url":"https://github.com/easysql.png","language":"Python","readme":"# Easy SQL\n\nEasy SQL is built to ease the data ETL development process.\nWith Easy SQL, you can develop your ETL in SQL in an imperative way.\nIt defines a few simple syntax on top of standard SQL, with which SQL could be executed one by one.\nEasy SQL also provides a processor to handle all the new syntax.\nSince this is SQL agnostic, any SQL engine could be plugged-in as a backend.\nThere are built-in support for several popular SQL engines, including SparkSQL, PostgreSQL, Clickhouse, FlinkSQL, Aliyun Maxcompute, Google BigQuery.\nMore will be added in the near future.\n\n- Docs: \u003chttps://easy-sql.readthedocs.io/\u003e\n- Enterprise extended product: \u003chttps://data-workbench.com/\u003e\n\n[![GitHub Action 
Build](https://github.com/easysql/easy_sql/actions/workflows/build.yaml/badge.svg?branch=main\u0026event=push)](https://github.com/easysql/easy_sql/actions/workflows/build.yaml?query=branch%3Amain+event%3Apush)\n[![Docs Build](https://readthedocs.org/projects/easy-sql/badge/?version=latest)](https://easy-sql.readthedocs.io/en/latest/?badge=latest)\n[![EasySQL Coverage](https://codecov.io/gh/easysql/easy_sql/branch/main/graph/badge.svg)](https://codecov.io/gh/easysql/easy_sql)\n[![PyPI](https://img.shields.io/pypi/v/easy-sql-easy-sql)](https://pypi.org/project/easy-sql-easy-sql/)\n\n## Install Easy SQL\n\nInstall Easy SQL using pip: `python3 -m pip install 'easy-sql-easy-sql[extra,extra]'`\n\nThe following extras are currently available; choose according to your needs:\n- cli\n- linter\n- spark\n- pg\n- clickhouse\n\nWe also provide a flink backend, but because of a dependency conflict between pyspark and apache-flink, you need to install the flink backend dependencies manually with the following command: `python3 -m pip install apache-flink`.\n\nWith flink, we usually read data from one data source and write it to another system through different connectors, so the jars for those connectors need to be downloaded as well. See [here](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/overview/) for more information and [here](https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/downloads/) to download the required connectors.\n\n## Building Easy SQL\n\nInternally we use `poetry` to manage the dependencies, so make sure you have [installed it](https://python-poetry.org/docs/master/#installation). 
Package could be built with the following make command: `make package-pip` or just `poetry build`.\n\nAfter the above command, there will be a file named `easy_sql*.whl` generated in the `dist` folder.\nYou can install it with command `python3 -m pip install dist/easy_sql*.whl[extra]` or just `poetry install -E 'extra extra'`.\n\n## First ETL with Easy SQL\n\nInstall easy_sql with spark as the backend: `python3 -m pip install 'easy-sql-easy-sql[spark,cli]'`.\n\n### For spark backend\n\nCreate a file named `sample_etl.spark.sql` with content as below:\n\n```sql\n-- prepare-sql: drop database if exists sample cascade\n-- prepare-sql: create database sample\n-- prepare-sql: create table sample.test as select 1 as id, '1' as val\n\n-- target=variables\nselect true as __create_output_table__\n\n-- target=variables\nselect 1 as a\n\n-- target=log.a\nselect '${a}' as a\n\n-- target=log.test_log\nselect 1 as some_log\n\n-- target=check.should_equal\nselect 1 as actual, 1 as expected\n\n-- target=temp.result\nselect\n    ${a} as id, ${a} + 1 as val\nunion all\nselect id, val from sample.test\n\n-- target=output.sample.result\nselect * from result\n\n-- target=log.sample_result\nselect * from sample.result\n```\n\nRun it with command:\n\n```bash\nbash -c \"$(python3 -m easy_sql.data_process -f sample_etl.spark.sql -p)\"\n```\n\n### For postgres backend:\n\nYou need to start a postgres instance first.\n\nIf you have docker, run the command below:\n\n```bash\ndocker run -d --name postgres -p 5432:5432 -e POSTGRES_PASSWORD=123456 postgres\n```\n\nCreate a file named `sample_etl.postgres.sql` with content as the test file [here](https://github.com/easysql/easy_sql/blob/main/test/sample_etl.postgres.sql).\n\nMake sure that you have install the corresponding backend with `python3 -m pip install 'easy-sql-easy-sql[cli,pg]'`\n\nRun it with command:\n\n```bash\nPG_URL=postgresql://postgres:123456@localhost:5432/postgres python3 -m easy_sql.data_process -f 
sample_etl.postgres.sql\n```\n\n### For clickhouse backend:\n\nYou need to start a clickhouse instance first.\n\nIf you have docker, run the command below:\n\n```bash\ndocker run -d --name clickhouse -p 9000:9000 yandex/clickhouse-server:20.12.5.18\n```\n\nCreate a file named `sample_etl.clickhouse.sql` with content as the test file [here](https://github.com/easysql/easy_sql/blob/main/test/sample_etl.clickhouse.sql).\n\nMake sure that you have installed the corresponding backend with `python3 -m pip install 'easy-sql-easy-sql[cli,clickhouse]'`\n\nRun it with command:\n\n```bash\nCLICKHOUSE_URL=clickhouse+native://default@localhost:9000 python3 -m easy_sql.data_process -f sample_etl.clickhouse.sql\n```\n\n### For flink backend:\n\nBecause of dependency conflicts between pyspark and apache-flink, you need to install flink manually with command `python3 -m pip install apache-flink`.\n\nAfter the installation, you need to add the flink commands directory to the PATH environment variable to make flink commands discoverable by bash. 
To do it, execute the commands below:\n\n```bash\nexport FLINK_HOME=$(python3 -m pyflink.find_flink_home)\nexport PATH=$FLINK_HOME/bin:$PATH\nexport PYFLINK_CLIENT_EXECUTABLE=python3  # Set Python interpreter for flink client.\n```\n\nYou can add these commands to your `.bashrc` or `.zshrc` file for convenience.\n\nSince there are many connectors for flink, you need to choose which connector to use before starting.\n\nAs an example, if you want to read or write data to postgres, then you need to start a postgres instance first.\n\nIf you have docker, run the command below:\n\n```bash\ndocker run -d --name postgres -p 5432:5432 -e POSTGRES_PASSWORD=123456 postgres\n```\n\nDownload the required jars as below:\n\n```bash\nmkdir -pv test/flink/jars\nwget -P test/flink/jars https://repo1.maven.org/maven2/org/apache/flink/flink-connector-jdbc/1.15.1/flink-connector-jdbc-1.15.1.jar\nwget -P test/flink/jars https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.14/postgresql-42.2.14.jar\n```\n\nCreate a file named `sample_etl.flink.postgres.sql` with content as the test file [here](https://github.com/easysql/easy_sql/blob/main/test/sample_etl.flink.postgres.sql).\n\nCreate a connector configuration file named `sample_etl.flink_tables_file.yml` with content as the test configuration file [here](https://github.com/easysql/easy_sql/blob/main/test/sample_etl.flink_tables_file.yml).\n\nRun it with command:\n\n```bash\nbash -c \"$(python3 -m easy_sql.data_process -f sample_etl.flink.postgres.sql -p)\"\n```\n\nThere are a few other things to know about flink, click [here](https://easy-sql.readthedocs.io/en/latest/easy_sql/backend/flink.html) to get more information.\n\n### For other backends:\n\nThe usage is similar, please refer to API doc [here](https://easy-sql.readthedocs.io/en/latest/autoapi/easy_sql/sql_processor/backend/index.html).\n\n## Run ETL in your code\n\nEasy SQL can be used as a very light-weight library. 
If you'd like to run ETL programmatically in your code,\nplease refer to the code snippet below:\n\n```python\nfrom pyspark.sql import SparkSession\n\nfrom easy_sql.sql_processor import SqlProcessor\nfrom easy_sql.sql_processor.backend import SparkBackend\n\nif __name__ == '__main__':\n    spark = SparkSession.builder.enableHiveSupport().getOrCreate()\n    backend = SparkBackend(spark)\n    sql = '''\n-- target=log.some_log\nselect 1 as a\n    '''\n    sql_processor = SqlProcessor(backend, sql)\n    sql_processor.run()\n```\n\nMore sample code for other backends can be found [here](https://github.com/easysql/easy_sql/blob/main/test/sample_data_process.py).\n\n## Debugging ETL\n\nWe recommend debugging ETLs from jupyter. You can follow the steps below to start debugging your ETL.\n\n1. Install jupyter first with command `python3 -m pip install jupyterlab`.\n\n2. Create a file named `debugger.py` with contents like the following (a more detailed sample can be found [here](https://github.com/easysql/easy_sql/blob/main/debugger.py)):\n\n```python\nfrom typing import Any, Dict, Optional\n\ndef create_debugger(sql_file_path: str, vars: Optional[Dict[str, Any]] = None, funcs: Optional[Dict[str, Any]] = None):\n    from pyspark.sql import SparkSession\n    from easy_sql.sql_processor.backend import SparkBackend\n    from easy_sql.sql_processor_debugger import SqlProcessorDebugger\n    spark = SparkSession.builder.enableHiveSupport().getOrCreate()\n    backend = SparkBackend(spark)\n    debugger = SqlProcessorDebugger(sql_file_path, backend, vars, funcs)\n    return debugger\n\n```\n\n3. Create a file named `test.sql` with contents as [here](https://github.com/easysql/easy_sql/blob/main/test/sample_etl.spark.sql).\n\n4. Then start jupyter lab with command: `jupyter lab`.\n\n5. 
Start debugging as shown below:\n\n![ETL Debugging](https://raw.githubusercontent.com/easysql/easy_sql/main/debugger-usage.gif)\n\n## ETL Language support\n\nWe've created an extension for VS Code to ease the development of ETL in Easy SQL. It provides several language features, e.g. syntax highlighting, code completion, and diagnostics. You can search for `Easy SQL` in the extension marketplace, or click [here](https://marketplace.visualstudio.com/items?itemName=EasySQL.easysql\u0026ssr=false#overview) to get more information.\n\nWe recommend installing the extension to develop ETL in Easy SQL.\n\n## Contributing\n\nPlease submit a PR.\n","funding_links":[],"categories":["Python","Integrations"],"sub_categories":["ETL and Data Processing"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feasysql%2Feasy_sql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feasysql%2Feasy_sql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feasysql%2Feasy_sql/lists"}