{"id":13688952,"url":"https://github.com/pytorch/torcharrow","last_synced_at":"2025-10-19T07:31:37.135Z","repository":{"id":36958759,"uuid":"410996252","full_name":"pytorch/torcharrow","owner":"pytorch","description":"High performance model preprocessing library on PyTorch","archived":true,"fork":false,"pushed_at":"2024-03-29T23:39:10.000Z","size":11886,"stargazers_count":651,"open_issues_count":57,"forks_count":78,"subscribers_count":24,"default_branch":"main","last_synced_at":"2025-01-18T05:19:33.443Z","etag":null,"topics":["preprocessing","python","pytorch"],"latest_commit_sha":null,"homepage":"https://pytorch.org/torcharrow/beta/index.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pytorch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-09-27T18:19:18.000Z","updated_at":"2025-01-12T01:49:09.000Z","dependencies_parsed_at":"2024-01-13T10:40:45.827Z","dependency_job_id":"b0d144e3-661a-4384-992b-7832c829bcda","html_url":"https://github.com/pytorch/torcharrow","commit_stats":{"total_commits":439,"total_committers":54,"mean_commits":8.12962962962963,"dds":0.7949886104783599,"last_synced_commit":"15a7f7124d4c73c8c541547aef072264baab63b7"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Ftorcharrow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Ftorcharrow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/pytorch%2Ftorcharrow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Ftorcharrow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pytorch","download_url":"https://codeload.github.com/pytorch/torcharrow/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237088493,"owners_count":19253565,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["preprocessing","python","pytorch"],"created_at":"2024-08-02T15:01:28.704Z","updated_at":"2025-10-19T07:31:31.536Z","avatar_url":"https://github.com/pytorch.png","language":"Python","readme":"# TorchArrow: a data processing library for PyTorch\n\n**This library currently does not have a stable release. The API and implementation may change. \nFuture changes may not be backward compatible.**\n\nTorchArrow is a [torch](https://github.com/pytorch/pytorch).Tensor-like Python DataFrame library for data preprocessing in PyTorch models, with two high-level features:\n\n* DataFrame library (like Pandas) with strong GPU or other hardware acceleration (under development) and PyTorch ecosystem integration.\n* Columnar memory layout based on [Apache Arrow](https://arrow.apache.org/docs/format/Columnar.html#physical-memory-layout) with strong variable-width and nested data support (such as string, list, map) and Arrow ecosystem integration.\n\n## Installation\n\nYou will need Python 3.7 or later. 
Also, we highly recommend installing a [Miniconda](https://docs.conda.io/en/latest/miniconda.html#latest-miniconda-installer-links) environment.\n\nFirst, set up an environment. If you are using conda, create a conda environment:\n```\nconda create --name torcharrow python=3.7\nconda activate torcharrow\n```\n\n### Version Compatibility\n\nThe following table lists the corresponding `torch` and `torcharrow` versions and the supported Python versions.\n\n| `torch`            | `torcharrow`       | `python`          |\n| ------------------ | ------------------ | ----------------- |\n| `main` / `nightly` | `main` / `nightly` | `\u003e=3.7`, `\u003c=3.10` |\n| `1.13.0`           | `0.2.0`            | `\u003e=3.7`, `\u003c=3.10` |\n\n\n### Colab\n\nFollow the instructions [in this Colab notebook](https://colab.research.google.com/drive/1S0ldwN7qNM37E4WZnnAEnzn1DWnAQ6Vt).\n\n### Nightly Binaries\n\nExperimental nightly binaries for Python 3.7, 3.8, and 3.9 on macOS (requires macOS SDK \u003e= 10.15) and Linux (requires glibc \u003e= 2.17) can be installed via pip wheels:\n```\npip install --pre torcharrow -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html\n```\n\n### From Source\n\nIf you are installing from source, you will need Python 3.7 or later and a C++17 compiler.\n\n#### Get the TorchArrow Source\n```bash\ngit clone --recursive https://github.com/pytorch/torcharrow\ncd torcharrow\n# if you are updating an existing checkout\ngit submodule sync --recursive\ngit submodule update --init --recursive\n```\n\n#### Install Dependencies\n\nOn macOS\n\n[Homebrew](https://brew.sh/) is required to install development tools on macOS.\n\n```bash\n# Install dependencies from Brew\nbrew install --formula ninja flex bison cmake ccache icu4c boost gflags glog libevent\n\n# Build and install other dependencies\nscripts/build_mac_dep.sh ranges_v3 fmt double_conversion folly re2\n```\n\nOn Ubuntu (20.04 or later)\n```bash\n# Install dependencies from APT\napt install -y g++ cmake ccache 
ninja-build checkinstall \\\n    libssl-dev libboost-all-dev libdouble-conversion-dev libgoogle-glog-dev \\\n    libgflags-dev libevent-dev libre2-dev libfl-dev libbison-dev\n# Build and install folly and fmt\nscripts/setup-ubuntu.sh\n```\n\n#### Install TorchArrow\nFor local development, you can build in debug mode:\n```\nDEBUG=1 python setup.py develop\n```\n\nThen run the unit tests with:\n```\npython -m unittest -v\n```\n\nTo build and install TorchArrow in release mode:\n```\npython setup.py install\n```\n\n## License\n\nTorchArrow is BSD-licensed, as found in the [LICENSE](LICENSE) file.\n","funding_links":[],"categories":["Python","Deep Learning Framework"],"sub_categories":["High-Level DL APIs"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpytorch%2Ftorcharrow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpytorch%2Ftorcharrow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpytorch%2Ftorcharrow/lists"}