# pandas-pyarrow

[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pandas-pyarrow)](https://pypi.org/project/pandas-pyarrow/)
[![PyPI Version](https://img.shields.io/pypi/v/pandas-pyarrow)](https://pypi.org/project/pandas-pyarrow/)
[![License](https://img.shields.io/badge/MIT-License-blue)](https://opensource.org/licenses/MIT)
![Ubuntu](https://img.shields.io/badge/Ubuntu-Supported-blue?logo=ubuntu)
![Windows](https://img.shields.io/badge/Windows-Supported-blue?logo=windows)
![macOS](https://img.shields.io/badge/macOS-Supported-blue?logo=apple)
[![Continuous Integration](https://github.com/DanielAvdar/pandas-pyarrow/actions/workflows/ci.yml/badge.svg)](https://github.com/DanielAvdar/pandas-pyarrow/actions/workflows/ci.yml)
[![Code Quality](https://github.com/DanielAvdar/pandas-pyarrow/actions/workflows/code-checks.yml/badge.svg)](https://github.com/DanielAvdar/pandas-pyarrow/actions/workflows/code-checks.yml)
[![Coverage Status](https://codecov.io/gh/DanielAvdar/pandas-pyarrow/branch/main/graph/badge.svg?token=N0V9KANTG2)](https://codecov.io/gh/DanielAvdar/pandas-pyarrow)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
![Last Commit](https://img.shields.io/github/last-commit/DanielAvdar/pandas-pyarrow/main)

`pandas-pyarrow` simplifies converting pandas DataFrames to pyarrow-backed dtypes, allowing a seamless switch to the
pyarrow pandas backend.

## Get started

### Installation

Install the package using pip:

```bash
pip install pandas-pyarrow
```

### Usage

```python
import pandas as pd
from pandas_pyarrow import convert_to_pyarrow

# Create a pandas DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['a', 'b', 'c'],
    'C': [1.1, 2.2, 3.3],
    'D': [True, False, True]
})

# Convert the pandas DataFrame dtypes to arrow dtypes
adf: pd.DataFrame = convert_to_pyarrow(df)

print(adf.dtypes)
```

Output:

```
A     int64[pyarrow]
B    string[pyarrow]
C    double[pyarrow]
D      bool[pyarrow]
dtype: object
```

You can also add mappings or override existing ones by passing a `custom_mapper` to `PandasArrowConverter`:

```python
import pandas as pd

from pandas_pyarrow import PandasArrowConverter

# Create a pandas DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['a', 'b', 'c'],
    'C': [1.1, 2.2, 3.3],
    'D': [True, False, True]
})

# Instantiate a PandasArrowConverter that maps int64 -> int32 and float64 -> float32
pandas_pyarrow_converter = PandasArrowConverter(
    custom_mapper={'int64': 'int32[pyarrow]', 'float64': 'float32[pyarrow]'})

# Convert the pandas DataFrame dtypes to arrow dtypes
adf: pd.DataFrame = pandas_pyarrow_converter(df)

print(adf.dtypes)
```

Output:

```
A     int32[pyarrow]
B    string[pyarrow]
C     float[pyarrow]
D      bool[pyarrow]
dtype: object
```

pandas-pyarrow also supports the db-dtypes used by the BigQuery Python SDK. Install `pandas-gbq`:

```bash
pip install pandas-gbq
```

or install the `bigquery` extra:

```bash
pip install "pandas-pyarrow[bigquery]"
```

```python
import pandas_gbq as gbq

from pandas_pyarrow import PandasArrowConverter

# Query a public BigQuery dataset
query = """
    SELECT * FROM `bigquery-public-data.austin_311.311_service_requests` LIMIT 1000
"""

# Use pandas_gbq to read the data from BigQuery.
# Requires Google Cloud credentials; the billing project is taken from the
# environment unless project_id= is passed to read_gbq.
df = gbq.read_gbq(query)

pandas_pyarrow_converter = PandasArrowConverter()
adf = pandas_pyarrow_converter(df)

# Compare the original dtypes with the converted ones
print(df.dtypes)
print(adf.dtypes)
```

Output:

```
unique_key                               object
complaint_description                    object
source                                   object
status                                   object
status_change_date          datetime64[us, UTC]
created_date                datetime64[us, UTC]
last_update_date            datetime64[us, UTC]
close_date                  datetime64[us, UTC]
incident_address                         object
street_number                            object
street_name                              object
city                                     object
incident_zip                              Int64
county                                   object
state_plane_x_coordinate                 object
state_plane_y_coordinate                float64
latitude                                float64
longitude                               float64
location                                 object
council_district_code                     Int64
map_page                                 object
map_tile                                 object
dtype: object
unique_key                         string[pyarrow]
complaint_description              string[pyarrow]
source                             string[pyarrow]
status                             string[pyarrow]
status_change_date          timestamp[us][pyarrow]
created_date                timestamp[us][pyarrow]
last_update_date            timestamp[us][pyarrow]
close_date                  timestamp[us][pyarrow]
incident_address                   string[pyarrow]
street_number                      string[pyarrow]
street_name                        string[pyarrow]
city                               string[pyarrow]
incident_zip                        int64[pyarrow]
county                             string[pyarrow]
state_plane_x_coordinate           string[pyarrow]
state_plane_y_coordinate           double[pyarrow]
latitude                           double[pyarrow]
longitude                          double[pyarrow]
location                           string[pyarrow]
council_district_code               int64[pyarrow]
map_page                           string[pyarrow]
map_tile                           string[pyarrow]
dtype: object
```

## Documentation

[Documentation](https://pandas-pyarrow.readthedocs.io/en/latest/) is available online.

## Purposes

- Simplify the conversion process between pandas' pyarrow and NumPy backends.
- Provide seamless integration with the pyarrow pandas backend, even for challenging dtypes such as float16 or
  db-dtypes.
- Standardize the db-dtypes used by the BigQuery Python SDK.

### Example

With plain pandas, converting a float16 column to the pyarrow backend:

```python
import pandas as pd

# Create a pandas DataFrame with a float16 column
df = pd.DataFrame({'C': [1.1, 2.2, 3.3]}, dtype='float16')

# Attempt the built-in conversion to the pyarrow backend
df.convert_dtypes(dtype_backend='pyarrow')
```

raises an error:

```
pyarrow.lib.ArrowNotImplementedError: Unsupported cast from halffloat to double using function cast_double
```

With pandas-pyarrow, the same conversion succeeds:

```python
import pandas as pd

from pandas_pyarrow import convert_to_pyarrow

# Create a pandas DataFrame with a float16 column
df = pd.DataFrame({'C': [1.1, 2.2, 3.3]}, dtype='float16')

# Convert the dtypes to their pyarrow equivalents
adf = convert_to_pyarrow(df)
print(adf.dtypes)
```

Output:

```
C    halffloat[pyarrow]
dtype: object
```

## Additional Information

Mapping a higher-precision numeric dtype (such as float64) to a lower-precision one (such as float32), for example
with `custom_mapper`, can lose precision.
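A minimal sketch of that precision loss, assuming only the Python standard library: the hypothetical helper `to_float32` round-trips values through IEEE 754 single precision using `struct`. pyarrow's float32 uses the same 32-bit format, so a `{'float64': 'float32[pyarrow]'}` mapping should round each value in the same way, although this sketch does not call pandas-pyarrow itself.

```python
import struct


def to_float32(value: float) -> float:
    """Hypothetical helper (not part of pandas-pyarrow).

    Round-trips a Python float (float64) through IEEE 754 single precision.
    """
    # "f" packs into a 4-byte float32, rounding to the nearest representable value;
    # unpacking widens it back to a Python float (float64) without further loss.
    return struct.unpack("f", struct.pack("f", value))[0]


values = [1.1, 2.2, 3.3, 123456789.0]
for v in values:
    f32 = to_float32(v)
    # Show the original value, the value after the float32 round-trip, and the error
    print(f"{v!r:>14} -> {f32!r:<22} error={abs(v - f32):.3e}")
```

Values such as 1.1 come back as nearby but different numbers, and because float32 has only a 24-bit significand, integers above 2**24 stop being exactly representable: 123456789.0 comes back as 123456792.0.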