{"id":13465681,"url":"https://github.com/databricks/koalas","last_synced_at":"2025-05-13T19:07:09.840Z","repository":{"id":34538706,"uuid":"164026325","full_name":"databricks/koalas","owner":"databricks","description":"Koalas: pandas API on Apache Spark","archived":false,"fork":false,"pushed_at":"2024-03-20T15:33:34.000Z","size":12259,"stargazers_count":3352,"open_issues_count":110,"forks_count":362,"subscribers_count":318,"default_branch":"master","last_synced_at":"2025-04-27T04:36:13.925Z","etag":null,"topics":["big-data","data-science","dataframe","mlflow","pandas","pydata","spark"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/databricks.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-01-03T21:46:54.000Z","updated_at":"2025-04-22T02:15:45.000Z","dependencies_parsed_at":"2023-01-15T07:45:50.065Z","dependency_job_id":"7b3499e4-8dce-4dae-9302-dad2f42d2b69","html_url":"https://github.com/databricks/koalas","commit_stats":{"total_commits":1547,"total_committers":57,"mean_commits":"27.140350877192983","dds":0.6683904330963155,"last_synced_commit":"bcb3f77cbb9b0c76fc32fcc9b478eb71abfd8e15"},"previous_names":[],"tags_count":49,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fkoalas","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fkoalas/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fkoal
as/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databricks%2Fkoalas/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/databricks","download_url":"https://codeload.github.com/databricks/koalas/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251089408,"owners_count":21534511,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","data-science","dataframe","mlflow","pandas","pydata","spark"],"created_at":"2024-07-31T15:00:33.777Z","updated_at":"2025-04-27T04:36:59.838Z","avatar_url":"https://github.com/databricks.png","language":"Python","readme":"## DEPRECATED: Koalas supports only Apache Spark 3.1 and below, because it has been [officially included in PySpark as of Apache Spark 3.2](https://issues.apache.org/jira/browse/SPARK-34849). This repository is now in maintenance mode. 
For Apache Spark 3.2 and above, please use [PySpark](https://spark.apache.org/docs/latest/api/python/migration_guide/koalas_to_pyspark.html) directly.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/databricks/koalas/master/icons/koalas-logo.png\" width=\"140\"/\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  pandas API on Apache Spark\n  \u003cbr/\u003e\n  \u003ca href=\"https://koalas.readthedocs.io/en/latest/?badge=latest\"\u003e\u003cstrong\u003eExplore Koalas docs »\u003c/strong\u003e\u003c/a\u003e\n  \u003cbr/\u003e\n  \u003cbr/\u003e\n  \u003ca href=\"https://mybinder.org/v2/gh/databricks/koalas/master?filepath=docs%2Fsource%2Fgetting_started%2F10min.ipynb\"\u003eLive notebook\u003c/a\u003e\n  ·\n  \u003ca href=\"https://github.com/databricks/koalas/issues\"\u003eIssues\u003c/a\u003e\n  ·\n  \u003ca href=\"https://groups.google.com/forum/#!forum/koalas-dev\"\u003eMailing list\u003c/a\u003e\n  \u003cbr/\u003e\n  \u003cstrong\u003e\u003ca href=\"https://www.gofundme.com/f/help-thirsty-koalas-devastated-by-recent-fires\"\u003eHelp Thirsty Koalas Devastated by Recent Fires\u003c/a\u003e\u003c/strong\u003e\n\u003c/p\u003e\n\nThe Koalas project makes data scientists more productive when interacting with big data, by implementing the pandas DataFrame API on top of Apache Spark.\n\npandas is the de facto standard (single-node) DataFrame implementation in Python, while Spark is the de facto standard for big data processing. 
With this package, you can:\n - Be immediately productive with Spark, with no learning curve, if you are already familiar with pandas.\n - Have a single codebase that works both with pandas (tests, smaller datasets) and with Spark (distributed datasets).\n\nWe would love to have you try it and give us feedback, through our [mailing lists](https://groups.google.com/forum/#!forum/koalas-dev) or [GitHub issues](https://github.com/databricks/koalas/issues).\n\nTry the Koalas 10 minutes tutorial on a live Jupyter notebook [here](https://mybinder.org/v2/gh/databricks/koalas/master?filepath=docs%2Fsource%2Fgetting_started%2F10min.ipynb). The initial launch can take up to several minutes.\n\n[![Github Actions](https://github.com/databricks/koalas/workflows/master/badge.svg)](https://github.com/databricks/koalas/actions)\n[![codecov](https://codecov.io/gh/databricks/koalas/branch/master/graph/badge.svg)](https://codecov.io/gh/databricks/koalas)\n[![Documentation Status](https://readthedocs.org/projects/koalas/badge/?version=latest)](https://koalas.readthedocs.io/en/latest/?badge=latest)\n[![Latest Release](https://img.shields.io/pypi/v/koalas.svg)](https://pypi.org/project/koalas/)\n[![Conda Version](https://img.shields.io/conda/vn/conda-forge/koalas.svg)](https://anaconda.org/conda-forge/koalas)\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/databricks/koalas/master?filepath=docs%2Fsource%2Fgetting_started%2F10min.ipynb)\n[![Downloads](https://pepy.tech/badge/koalas)](https://pepy.tech/project/koalas)\n\n\n## Getting Started\n\nKoalas can be installed in many ways such as Conda and pip.\n\n```bash\n# Conda\nconda install koalas -c conda-forge\n```\n\n```bash\n# pip\npip install koalas\n```\n\nSee [Installation](https://koalas.readthedocs.io/en/latest/getting_started/install.html) for more details.\n\nFor Databricks Runtime, Koalas is pre-installed in Databricks Runtime 7.1 and above. 
Try [Databricks Community Edition](https://community.cloud.databricks.com/) for free. You can also follow these [steps](https://docs.databricks.com/libraries/index.html) to manually install a library on Databricks.\n\nLastly, if your PyArrow version is 0.15+ and your PySpark version is lower than 3.0, it is best for you to set `ARROW_PRE_0_15_IPC_FORMAT` environment variable to `1` manually.\nKoalas will try its best to set it for you but it is impossible to set it if there is a Spark context already launched.\n\nNow you can turn a pandas DataFrame into a Koalas DataFrame that is API-compliant with the former:\n\n```python\nimport databricks.koalas as ks\nimport pandas as pd\n\npdf = pd.DataFrame({'x':range(3), 'y':['a','b','b'], 'z':['a','b','b']})\n\n# Create a Koalas DataFrame from pandas DataFrame\ndf = ks.from_pandas(pdf)\n\n# Rename the columns\ndf.columns = ['x', 'y', 'z1']\n\n# Do some operations in place:\ndf['x2'] = df.x * df.x\n```\n\nFor more details, see [Getting Started](https://koalas.readthedocs.io/en/latest/getting_started/index.html) and [Dependencies](https://koalas.readthedocs.io/en/latest/getting_started/install.html#dependencies) in the official documentation.\n\n\n## Contributing Guide\n\nSee [Contributing Guide](https://koalas.readthedocs.io/en/latest/development/contributing.html) and [Design Principles](https://koalas.readthedocs.io/en/latest/development/design.html) in the official documentation.\n\n\n## FAQ\n\nSee [FAQ](https://koalas.readthedocs.io/en/latest/user_guide/faq.html) in the official documentation.\n\n\n## Best Practices\n\nSee [Best Practices](https://koalas.readthedocs.io/en/latest/user_guide/best_practices.html) in the official documentation.\n\n\n## Koalas Talks and Blogs\n\nSee [Koalas Talks and Blogs](https://koalas.readthedocs.io/en/latest/getting_started/videos_blogs.html) in the official documentation.\n","funding_links":[],"categories":["Python","Cloud Scale Analytics","Basic Components","Data Manipulation","Apache 
Spark Tools, Libraries, and Frameworks","Distributed Computing Libraries","数据容器和结构","Data Containers \u0026 Dataframes","Simplification Tools","Table of Contents","Packages"],"sub_categories":["Azure Databricks","Alternative libraries","Data Frames","Higher Level APIs","Data, Dashboards \u0026 Visualization","Interfaces"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabricks%2Fkoalas","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatabricks%2Fkoalas","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabricks%2Fkoalas/lists"}