{"id":29867022,"url":"https://github.com/jrothbaum/polars_readstat","last_synced_at":"2026-04-29T23:05:44.563Z","repository":{"id":287653881,"uuid":"963585541","full_name":"jrothbaum/polars_readstat","owner":"jrothbaum","description":"Polars IO plugin to read SAS (sas7bdat), Stata (dta), and SPSS (sav) files","archived":false,"fork":false,"pushed_at":"2026-03-25T21:28:04.000Z","size":33747,"stargazers_count":24,"open_issues_count":0,"forks_count":4,"subscribers_count":2,"default_branch":"master","last_synced_at":"2026-03-26T20:47:12.459Z","etag":null,"topics":["polars","sas","spss","stata"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jrothbaum.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-04-09T22:51:54.000Z","updated_at":"2026-03-25T21:24:46.000Z","dependencies_parsed_at":null,"dependency_job_id":"e44850a5-b80e-4b55-80c4-ee83380be0da","html_url":"https://github.com/jrothbaum/polars_readstat","commit_stats":null,"previous_names":["jrothbaum/polars_readstat"],"tags_count":45,"template":false,"template_full_name":null,"purl":"pkg:github/jrothbaum/polars_readstat","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jrothbaum%2Fpolars_readstat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jrothbaum%2Fpolars_readstat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jrothbaum%2Fpolars_read
stat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jrothbaum%2Fpolars_readstat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jrothbaum","download_url":"https://codeload.github.com/jrothbaum/polars_readstat/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jrothbaum%2Fpolars_readstat/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31307148,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-02T12:59:32.332Z","status":"ssl_error","status_checked_at":"2026-04-02T12:54:48.875Z","response_time":89,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["polars","sas","spss","stata"],"created_at":"2025-07-30T13:02:03.730Z","updated_at":"2026-04-02T13:34:06.928Z","avatar_url":"https://github.com/jrothbaum.png","language":"Rust","funding_links":[],"categories":["Libraries/Packages/Scripts"],"sub_categories":["Polars plugins"],"readme":"# polars_readstat\nPolars plugin for SAS (`.sas7bdat`), Stata (`.dta`), and SPSS (`.sav`/`.zsav`) files.\n\nThe Python package wraps the Rust core in [polars_readstat_rs](https://crates.io/crates/polars-readstat-rs) and exposes a Polars-first API. 
The project includes cross-library parity tests and roundtrip checks to reduce regressions.\n\nThe Rust engine is generally faster for many workloads, but performance varies by file shape and options. If you need the legacy C/C++ engine, use version 0.11.1 (see the [prior version](https://github.com/jrothbaum/polars_readstat/tree/250f516a4424fbbe84c931a41cb82b454c5ca205)).\n\n## Why use this?\n\n- In project benchmarks, the new Rust-backed engine is typically faster than pandas/pyreadstat on large SAS/Stata files, especially for subset/filter workloads.\n- It avoids the older C/C++ toolchain complexity and ships as standard Python wheels.\n- API is Polars-first (`scan_readstat`, `read_readstat`, `write_readstat`, `write_sas_csv_import`).\n\n## Install\n\n```bash\npip install polars-readstat\n```\n\n## Core API\n\n### 1) Lazy scan\n```python\nimport polars as pl\nfrom polars_readstat import scan_readstat\n\nlf = scan_readstat(\"/path/file.sas7bdat\", preserve_order=True)\ndf = lf.select([\"SERIALNO\", \"AGEP\"]).filter(pl.col(\"AGEP\") \u003e= 18).collect()\n```\n\n### 2) Getting metadata\n```python\nfrom polars_readstat import ScanReadstat\n\nreader = ScanReadstat(path=\"/path/file.sav\")\nschema = reader.schema      # polars.Schema\nmetadata = reader.metadata  # dict with file info and per-column details\nlf = reader.df              # LazyFrame — same as calling scan_readstat(path)\n```\n\n`metadata` is a dict with a `columns` list. Each column entry includes:\n- `\"name\"` — column name\n- `\"label\"` — variable label (description), if present\n- `\"value_labels\"` — dict mapping coded values to label strings, if present\n\n### 3) Write (Experimental)\nWriting support is experimental and compatibility varies across tools. Stata roundtrip tests are included; SPSS roundtrip coverage is limited. 
Please report issues.\n\n```python\nfrom polars_readstat import write_readstat, write_sas_csv_import\n\nwrite_readstat(df, \"/path/out.dta\")\nwrite_readstat(df, \"/path/out.sav\")\nwrite_sas_csv_import(df, \"/path/out/sas_bundle\", dataset_name=\"my_data\")\n```\n\n`write_readstat` supports Stata (`dta`) and SPSS (`sav`).  \nUse `write_sas_csv_import` for SAS-ingestible output (`.csv` + `.sas` import script). Binary `.sas7bdat` writing is not currently supported.\n\n## Docs\n\nView the docs at [https://jrothbaum.github.io/polars_readstat/](https://jrothbaum.github.io/polars_readstat/) for more information on the options you can pass to the scan and write functions.\n\n## Benchmark\n\nBenchmarks compare four scenarios: 1) load the full file, 2) load a subset of columns (Subset: True), 3) filter to a subset of rows (Filter: True), 4) load a subset of columns and filter to a subset of rows (Subset: True, Filter: True).\n\nBenchmark context:\n- Machine: AMD Ryzen 7 8845HS (16 cores), 14 GiB RAM, Linux Mint 22\n- Storage: external SSD\n- `polars-readstat` (Rust engine v0.12.4) last run: February 24, 2026; comparison library timings for SAS/Stata (v0.11.1) last run August 31, 2025\n- Version tested: `polars-readstat` 0.12.4 (new Rust engine) against `polars-readstat` 0.11.1 (prior C++ and C engines), pandas, and pyreadstat\n- Method: wall-clock timings via Python `time.time()`\n\n### Compared to Pandas and Pyreadstat (using read_file_multiprocessing for parallel processing in Pyreadstat)\n#### SAS\nAll times in seconds (speedup relative to pandas in parentheses below each)\n| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |\n|---------|------------------------------|-----------------------------|-----------------------------|----------------------------|\n| polars_readstat\u003cbr\u003e[New Rust engine](https://crates.io/crates/polars-readstat-rs) | 0.72\u003cbr\u003e(2.9×) | 0.04\u003cbr\u003e(51.5×) | 1.04\u003cbr\u003e(2.9×) | 
0.04\u003cbr\u003e(52.5×) |\n| polars_readstat\u003cbr\u003eengine=\"cpp\"\u003cbr\u003e(fastest for 0.11.1) | 1.31\u003cbr\u003e(1.6×) | 0.09\u003cbr\u003e(22.9×) | 1.56\u003cbr\u003e(1.9×) | 0.09\u003cbr\u003e(23.2×) |\n| pandas | 2.07 | 2.06 | 3.03 | 2.09 |\n| pyreadstat | 10.75\u003cbr\u003e(0.2×) | 0.46\u003cbr\u003e(4.5×) | 11.93\u003cbr\u003e(0.3×) | 0.50\u003cbr\u003e(4.2×) |\n\n#### Stata\nAll times in seconds (speedup relative to pandas in parentheses below each)\n| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |\n|---------|------------------------------|-----------------------------|-----------------------------|----------------------------|\n| polars_readstat\u003cbr\u003e[New Rust engine](https://crates.io/crates/polars-readstat-rs) | 0.17\u003cbr\u003e(6.7×) | 0.12\u003cbr\u003e(9.8×) | 0.24\u003cbr\u003e(4.1×) | 0.11\u003cbr\u003e(8.7×) |\n| polars_readstat\u003cbr\u003eengine=\"readstat\"\u003cbr\u003e(the only option for 0.11.1) | 1.80\u003cbr\u003e(0.6×) | 0.27\u003cbr\u003e(4.4×) | 1.31\u003cbr\u003e(0.8×) | 0.29\u003cbr\u003e(3.3×) |\n| pandas | 1.14 | 1.18 | 0.99 | 0.96 |\n| pyreadstat | 7.46\u003cbr\u003e(0.2×) | 2.18\u003cbr\u003e(0.5×) | 7.66\u003cbr\u003e(0.1×) | 2.24\u003cbr\u003e(0.4×) |\n\n#### SPSS\nAll times in seconds (speedup relative to pandas in parentheses below each)\n| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |\n|---------|------------------------------|-----------------------------|-----------------------------|----------------------------|\n| polars_readstat\u003cbr\u003e[New Rust engine](https://crates.io/crates/polars-readstat-rs) | 0.22\u003cbr\u003e(6.6×) | 0.15\u003cbr\u003e(9.1×) | 0.25\u003cbr\u003e(6.0×) | 0.26\u003cbr\u003e(4.5×) |\n| pandas | 1.46 | 1.36 | 1.49 | 1.16 |\n\nDetailed benchmark notes and dataset descriptions are in `BENCHMARKS.md`.\n\n\n## Tests run\n\nTest coverage includes:\n- Cross-library comparisons on the pyreadstat and pandas 
test data, checking results against `polars-readstat==0.11.1`, [pyreadstat](https://github.com/Roche/pyreadstat), and [pandas](https://github.com/pandas-dev/pandas).\n- Stata/SPSS read/write roundtrip tests.\n- Large-file read/write benchmark runs on real-world data (results above).\n\nIf you want to run the same checks locally, helper scripts and tests are in `scripts/` and `tests/`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjrothbaum%2Fpolars_readstat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjrothbaum%2Fpolars_readstat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjrothbaum%2Fpolars_readstat/lists"}