https://github.com/idinsmore1/polars-mas
Polars-backed library for fast multiple testing
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/idinsmore1/polars-mas
- Owner: idinsmore1
- Created: 2024-10-10T19:20:09.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-03-20T15:16:54.000Z (about 1 month ago)
- Last Synced: 2025-03-20T15:42:25.392Z (about 1 month ago)
- Language: Python
- Size: 253 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- trackawesomelist - polars-mas (⭐0) - A CLI tool and python library meant to perform large scale multiple association tests, primarily seen in academic research by [@idinsmore1](https://github.com/idinsmore1). (Recently Updated / [Mar 21, 2025](/content/2025/03/21/README.md))
- awesome-polars - polars-mas - A CLI tool and python library meant to perform large scale multiple association tests, primarily seen in academic research by [@idinsmore1](https://github.com/idinsmore1). (Tools build with Polars / Ruby)
README
# Polars-MAS: Multiple Association Studies
`polars-mas` is a Python library and CLI tool for running large-scale multiple association tests, which are common in academic research. Currently this tool only supports Firth's logistic regression. It is intended as a drop-in replacement for analyses run with the `PheWAS` R package, especially those using Phecodes. `polars-mas` is built to leverage the speed and memory efficiency of the `polars` dataframe library and its interoperability with the `sklearn` and `statsmodels` libraries.
## Installation
```shell
pip install polars-mas
```
## Running the CLI
```text
polars-mas --help
Polars-MAS: A Python package for multiple association analysis.
options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input file path.
  -o OUTPUT, --output OUTPUT
                        Output file prefix. Will be suffixed with '{predictor}.csv'.
  -p PREDICTORS [PREDICTORS ...], --predictors PREDICTORS [PREDICTORS ...]
                        Predictor column names. These will be tested independently
  -s SEPARATOR, --separator SEPARATOR
                        Column separator. Default is ","
  -d DEPENDENTS [DEPENDENTS ...], --dependents DEPENDENTS [DEPENDENTS ...]
                        Dependent variable column names.
  -di DEPENDENTS_INDICES, --dependents-indices DEPENDENTS_INDICES
                        Dependent variable column indicies. Ignored if --dependents is used. Accepts comma separated list of
                        indices/indicies ranges. E.g. 2, 2-5, 2-, 2,3 , 2,5-8, 2,8- are all valid. Range follows python
                        slicing conventions - includes start, excludes end.
  -c COVARIATES [COVARIATES ...], --covariates COVARIATES [COVARIATES ...]
                        Covariate column names.
  -ci COVARIATES_INDICIES, --covariates-indicies COVARIATES_INDICIES
                        Covariate column indicies. Ignored if --covariates is used. Accepts comma separated list of
                        indices/indicies ranges. E.g. 2, 2-5, 2-, 2,3 , 2,5-8, 2,8- are all valid. Range follows python
                        slicing conventions - includes start, excludes end.
  -cc CATEGORICAL_COVARIATES [CATEGORICAL_COVARIATES ...], --categorical-covariates CATEGORICAL_COVARIATES [CATEGORICAL_COVARIATES ...]
                        Categorical covariate column names.
  -nv NULL_VALUES [NULL_VALUES ...], --null-values NULL_VALUES [NULL_VALUES ...]
                        List of values to be treated as missing values. Default is None (normal polars option).
  -qt, --quantitative   Dependent variables are quantitative traits.
  -mi {drop,forward,backward,min,max,mean,zero,one}, --missing {drop,forward,backward,min,max,mean,zero,one}
                        Method to handle missing values in covariates and predictor variables. If not specified, rows with
                        missing values in the predictor and covariate columns will be dropped.
  -t {standard,min-max}, --transform {standard,min-max}
                        Transform continuous covariates/predictor variables. Default is no transformation.
  -mc MIN_CASES, --min-cases MIN_CASES
                        Minimum number of cases for each dependent variable. Only applied when not --quantitative. Default is
                        20.
  -m {firth,linear}, --model {firth,linear}
                        Type of model to fit. Default is firth logistic regression.
  --phewas              Input data uses Phecodes for dependent variables.
  --phewas-sex-col PHEWAS_SEX_COL
                        Sex covariate column name for PheWAS analysis. Default = 'sex'. Must be coded as male = 0 and female
                        = 1.
  -th THREADS, --threads THREADS
                        Number of threads for numpy and sklearn to use within each worker.
  -n NUM_WORKERS, --num-workers NUM_WORKERS
                        Number of workers for parallel processing and threads available to Polars. Default is number of CPUs.
  -v, --verbose         have more verbose logging
```
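The `--dependents-indices` and `--covariates-indicies` options accept specs such as `2,5-8` or `2,8-`, with ranges that include the start and exclude the end. The sketch below illustrates how such a spec could expand into column positions. It is not the implementation used by `polars-mas`: the function name `parse_index_spec`, the zero-based indexing, and the sorted, de-duplicated output are all assumptions made for illustration.

```python
def parse_index_spec(spec: str, n_columns: int) -> list[int]:
    """Expand a spec such as "2,5-8,10-" into column indices.

    Hypothetical helper, not part of polars-mas. Assumes zero-based
    indices and returns them sorted with duplicates removed.
    """
    indices: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas or whitespace
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An open-ended range such as "8-" runs through the last column.
            end = int(end_text) if end_text.strip() else n_columns
            # Python slicing convention: include start, exclude end.
            indices.update(range(start, min(end, n_columns)))
        else:
            indices.add(int(part))
    return sorted(indices)
```

Under these assumptions, `parse_index_spec("2,5-8", 10)` yields `[2, 5, 6, 7]`, and `parse_index_spec("2,8-", 10)` yields `[2, 8, 9]` for a ten-column file.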
If you have an R environment with the `PheWAS` package installed, you can run the `src/tests/example_data/generate_examples.R` script to create dummy data for this repository.

**NOTE ON THREADS AND WORKERS**: The total number of threads used by `polars-mas` is the number of workers (`-n`) multiplied by the number of threads (`-th`). So if you have 4 workers with 8 threads, `polars-mas` will use 32 threads on your machine.
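The workers-times-threads rule can be checked before launching a run. The following is a minimal sketch of that arithmetic; the helper names `total_threads` and `threads_per_worker_for_budget` are hypothetical and are not part of `polars-mas`.

```python
import os


def total_threads(num_workers: int, threads_per_worker: int) -> int:
    """Total threads used: workers (-n) multiplied by threads per worker (-th)."""
    return num_workers * threads_per_worker


def threads_per_worker_for_budget(cpu_budget: int, num_workers: int) -> int:
    """Largest per-worker thread count that keeps the total within cpu_budget.

    Hypothetical helper; always returns at least 1.
    """
    return max(1, cpu_budget // num_workers)


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to 1 in that case.
    cpus = os.cpu_count() or 1
    requested = total_threads(num_workers=4, threads_per_worker=8)
    if requested > cpus:
        print(f"Requested {requested} threads, but only {cpus} CPUs were detected.")
```

With 4 workers and 8 threads each, `total_threads` returns 32, matching the example in the note above.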