{"id":25731052,"url":"https://github.com/bodo-ai/pydough","last_synced_at":"2026-05-22T01:09:00.536Z","repository":{"id":274532367,"uuid":"871310318","full_name":"bodo-ai/PyDough","owner":"bodo-ai","description":"Analytics DSL for Python","archived":false,"fork":false,"pushed_at":"2025-05-01T17:25:26.000Z","size":2510,"stargazers_count":13,"open_issues_count":50,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-01T17:48:47.796Z","etag":null,"topics":["analytics","artificial-intelligence","big-data","data-science","defog","defog-ai","machine-learning","pandas","python","sql","text-to-analytics","text-to-sql","tpch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bodo-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-10-11T17:29:56.000Z","updated_at":"2025-04-30T02:20:33.000Z","dependencies_parsed_at":"2025-05-01T17:54:02.940Z","dependency_job_id":null,"html_url":"https://github.com/bodo-ai/PyDough","commit_stats":null,"previous_names":["bodo-ai/pydough"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bodo-ai%2FPyDough","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bodo-ai%2FPyDough/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bodo-ai%2FPyDough/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bodo-ai
%2FPyDough/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bodo-ai","download_url":"https://codeload.github.com/bodo-ai/PyDough/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252911582,"owners_count":21824055,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","artificial-intelligence","big-data","data-science","defog","defog-ai","machine-learning","pandas","python","sql","text-to-analytics","text-to-sql","tpch"],"created_at":"2025-02-26T02:28:21.316Z","updated_at":"2026-03-09T19:18:38.746Z","avatar_url":"https://github.com/bodo-ai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PyDough\n\nPyDough is an alternative DSL that can be used to solve analytical problems by phrasing questions in terms of a logical document model instead of translating to relational SQL logic.\n\n## What Is PyDough\n\nPyDough allows expressing analytical questions with hierarchical thinking, as seen in models such as [MongoDB](https://www.mongodb.com/docs/manual/data-modeling/), since that mental model is closer to human linguistics than a relational model.\nUnlike MongoDB, PyDough only uses a logical document model for abstractly explaining \u0026 interacting with data, rather than a physical document model to store the data.\nPyDough code can be written in and interleaved with Python code, and practices a lazy evaluation scheme that does not qualify or execute any logic until requested.\nPyDough executes by translating its logic into SQL which it can 
directly execute in an arbitrary database.\n\nConsider the following information represented by the tables in a database:\n- There are people; each person has a name, ssn, birth date, records of jobs they have had, and records of schools they have attended.\n- There are employment records; each job record has the ssn of the person being employed, the name of the company, and the total income they made from the job.\n- There are education records; each education record has the ssn of the person attending the school, the name of the school, and the total tuition they paid to that school.\n\nSuppose I want to know for every person their name \u0026 the total income they've made from all jobs minus the total tuition paid to all schools. However, I want to include people who have never had a job or never attended any schools, and I need to account for people who could have had multiple jobs or attended multiple schools.\nThe following PyDough snippet solves this problem:\n\n```py\nresult = People.CALCULATE(\n    name,\n    net_income = SUM(jobs.income_earned) - SUM(schools.tuition_paid)\n)\npydough.to_df(result)\n```\n\nHowever, if answering the question with SQL, I would need to write the following less-intuitive SQL query:\n\n```sql\nSELECT\n    P.name AS name,\n    COALESCE(J.total_income_earned, 0) - COALESCE(S.total_tuition_paid, 0) AS net_income\nFROM PEOPLE AS P\nLEFT JOIN (\n    SELECT person_ssn, SUM(income_earned) AS total_income_earned\n    FROM EMPLOYMENT_RECORDS\n    GROUP BY person_ssn\n) AS J\nON P.ssn = J.person_ssn\nLEFT JOIN (\n    SELECT person_ssn, SUM(tuition_paid) AS total_tuition_paid\n    FROM EDUCATION_RECORDS\n    GROUP BY person_ssn\n) AS S\nON P.ssn = S.person_ssn\n```\n\nInternally, PyDough solves the question by translating the much simpler logical document model logic into SQL, which can be directly executed on a database. 
Even if PyDough generates the same SQL as in the example above, all a user needs to worry about is writing the much smaller PyDough code snippet in Python.\n\nCurrently, the main mechanism to execute PyDough code is via Jupyter notebooks with a special cell magic. See the usage guide and demo notebooks for more details.\n\n## Why Build PyDough?\n\nPyDough as a DSL has several benefits over other solutions, both for human use and LLM generation:\n- ORMs still require understanding \u0026 writing SQL, including dealing directly with joins. If a human or AI is bad at writing SQL, they will be just as bad at writing ORM-based code. PyDough, on the other hand, abstracts away joins in favor of thinking about logical relationships between collections \u0026 sub-collections.\n- The complex semantics of aggregation keys, different types of joins, and aggregating before vs after joining are all abstracted away by PyDough. These details require a much deeper understanding of SQL semantics than most people have time to acquire, meaning that PyDough can offer a gentler learning curve for writing correct code for complex questions.\n- When a question is being asked, the PyDough code to answer it will look more similar to the text of the question than the SQL text would. 
This makes LLM generation of PyDough code simpler since there is a stronger correlation between a question asked and the PyDough code to answer it.\n- Often, PyDough code will be significantly more compact than equivalent SQL text, and therefore easier for a human to verify for logical correctness.\n- PyDough is portable between various database execution solutions, so you are not locked into one data storage solution while using PyDough.\n\n## Learning About PyDough\n\nRefer to these documents to learn how to use PyDough:\n\n- [Spec for the PyDough DSL](https://github.com/bodo-ai/PyDough/blob/main/documentation/dsl.md)\n- [Spec for the PyDough metadata](https://github.com/bodo-ai/PyDough/blob/main/documentation/metadata.md)\n- [List of builtin PyDough functions](https://github.com/bodo-ai/PyDough/blob/main/documentation/functions.md)\n- [Usage guide for PyDough](https://github.com/bodo-ai/PyDough/blob/main/documentation/usage.md)\n\n## Installing or Developing PyDough\n\nPyDough releases are [available on PyPI](https://pypi.org/project/pydough/) and can be installed via pip:\n\n```\npip install pydough\n```\n\nFor local development, PyDough uses `uv` as a package manager.\nPlease refer to the `uv` docs for [installation](https://docs.astral.sh/uv/getting-started/).\n\n\nTo run the tests after installing `uv`, use the following command:\n\n```bash\nuv run pytest \u003cpytest_arguments\u003e\n```\n\nIf you want to skip tests that execute runtime results because they are slower,\nmake sure to include `-m \"not execute\"` in the pytest arguments.\n\nNote: some tests may require additional setup to run successfully.\nThe [demos](https://github.com/bodo-ai/PyDough/blob/main/demos/README.md) directory \ncontains more information on how to set up the TPC-H SQLite database. 
For\ntesting, the `tpch.db` file must be located in the `tests` directory.\nAdditionally, the [`setup_defog.sh`](https://github.com/bodo-ai/PyDough/blob/main/tests/setup_defog.sh)\nscript must be run so that the `defog.db` file is located in the `tests` directory.\n\n## Running CI Tests\n\nWhen submitting a PR, you can control which CI tests run by adding special flags\nto your **latest commit message**.\n\n**Note:** All flags are **case-insensitive**.\n\n- To run **PyDough CI tests**, add: `[run CI]` (only runs **SQLite tests**, no other SQL dialects)  \n- To run **PyDough and all dialect tests**, add: `[run all]`  \n- To run **specific dialect tests**, use the corresponding flag as described below.\n\n### Running Snowflake Tests on CI\nTo run **Snowflake CI tests**, add the flag `[run SF]` to your commit message.\n\n**Running Snowflake tests locally:**\n\n1. Install the Snowflake Connector for Python with Pandas support\n    ```bash\n    pip install \"snowflake-connector-python[pandas]\"\n    ```\n\n2. Set your Snowflake credentials as environment variables:\n    ```bash\n        export SF_USERNAME=\"your_username\"\n        export SF_PASSWORD=\"your_password\"\n        export SF_ACCOUNT=\"your_account\"\n    ```\n\n### Running MySQL Tests on CI\nTo run **MySQL CI tests**, add the flag `[run mysql]` to your commit message.\n\n**Running MySQL tests locally:**\n\n1. Make sure you have [**Docker Desktop**](https://www.docker.com/get-started/)\n installed and running.\n\n2. Install the MySQL Connector for Python\n    ```bash\n    pip install mysql-connector-python\n    ```\n\n3. Set your MySQL credentials as environment variables:\n    ```bash\n        export MYSQL_USERNAME=\"your_username\"\n        export MYSQL_PASSWORD=\"your_password\"\n    ```\n\n### Running Postgres Tests on CI\nTo run **Postgres CI tests**, add the flag `[run postgres]` to your commit message.\n\n**Running Postgres tests locally:**\n\n1. 
Make sure you have [**Docker Desktop**](https://www.docker.com/get-started/)\n installed and running.\n\n2. Install the Postgres Connector for Python\n    ```bash\n    pip install psycopg2-binary\n    ```\n    \n3. Set your Postgres credentials as environment variables:\n    ```bash\n        export POSTGRES_DB=\"your_database\"\n        export POSTGRES_USER=\"your_username\"\n        export POSTGRES_PASSWORD=\"your_password\"\n    ```\n\n## Runtime Dependencies\n\nPyDough requires having the following Python modules installed to use\nthe library:\n\n- pytz, pandas, sqlglot\n\nThe full list of dependencies can be found in the `pyproject.toml` file.\n\n## Demo Notebooks\n\nThe `demos` folder contains a series of example Jupyter Notebooks\nthat can be used to understand PyDough's capabilities. We recommend any new user start\nwith the [demo readme](https://github.com/bodo-ai/PyDough/blob/main/demos/README.md) and then walk through the example Jupyter notebooks.\n\n## Meta Visualizer\n\nThe `meta_visualizer` folder contains a simple web application that can be used to visualize the metadata of a PyDough Knowledge Graph.\nIt displays the collections, properties, and relationships between collections in a knowledge graph and can be used to check the relations between collections to debug PyDough queries as well as build complex queries.\n\nPlease refer to the [meta visualizer README](https://github.com/bodo-ai/PyDough/blob/main/meta_visualizer/README.md) for more information on features and usage.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbodo-ai%2Fpydough","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbodo-ai%2Fpydough","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbodo-ai%2Fpydough/lists"}