Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/biona001/knockoffspy


https://github.com/biona001/knockoffspy

Last synced: 4 days ago
JSON representation

Awesome Lists containing this project

README

        

# knockoffspy

An interface to [Knockoffs.jl](https://github.com/biona001/Knockoffs.jl) from the Python programming language. `knockoffspy` provides unique high performance methods for sampling various model-X knockoffs and ships with built-in routines for variable selection. Much of the functionality are unique and allow for orders of magnitude speedup over conventional methods. 'knockoffspy' attaches a Python interface onto the package, allowing seamless use of this tooling by Python users.

## Installation

To install `knockoffspy`, use pip
```
pip install knockoffspy
```
Then in the python interpreter,
```python
>>> import knockoffspy
>>> knockoffspy.install()
```
This will install the `Knockoffs.jl` package and all the Julia dependencies that it needs.

## Usage

Import the package as
```python
from knockoffspy import ko
```
The general flow for using the package is to follow exactly as would be done in Julia, except add `ko.` in front of function calls. Most of the commands will work without any modification. Thus the [Knockoffs.jl documentation](https://biona001.github.io/Knockoffs.jl/dev/) is the main in-depth documentation for this package. Below we will show how to translate these docs to Python code.

## Documentation

Most of the commands of `Knockoffs.jl` will work in python without any modification, just add `ko.` in front of function calls. Thus the [Knockoffs.jl documentation](https://biona001.github.io/Knockoffs.jl/dev/) is the main in-depth documentation for this package. Below we will show how to translate these docs to Python code.

## Example: Exact model-X group knockoffs

Lets simulate `X ~ N(0, Sigma)` where `Sigma` is a symmetric Toesplitz matrix. Here we assume every 5 variables form a group

```python
from knockoffspy import ko
from scipy import linalg
import numpy as np

# generate data
n = 1000 # number of samples
p = 1000 # number of covariates
m = 1 # number of knockoffs to generate per feature
groups = np.repeat(np.arange(0,200,1), 5)
Sigma = linalg.toeplitz([0.7**i for i in range(1, p+1)])
mu = np.zeros(p)
X = np.random.multivariate_normal(mean=mu, cov=Sigma, size=(n,))
```
We generate model-X group knockoffs as follows
```python
solver = "maxent" # Maximum entropy solver, other choices include "mvr", "sdp", "equi"
result = ko.modelX_gaussian_group_knockoffs(X, solver, groups, mu, Sigma, verbose=True)

Maxent initial obj = -2087.6929364666807
Iter 1 (PCA): obj = -1607.4371641119483, δ = 0.4806840533479427, t1 = 0.28, t2 = 0.46
Iter 2 (CCD): obj = -1589.951838589172, δ = 0.046537146748581976, t1 = 0.42, t2 = 1.28, t3 = 0.0
Iter 3 (PCA): obj = -1570.44910152802, δ = 0.32338244109034703, t1 = 0.67, t2 = 1.74
Iter 4 (CCD): obj = -1562.5471454507458, δ = 0.028462155141072386, t1 = 0.81, t2 = 2.56, t3 = 0.0
Iter 5 (PCA): obj = -1557.1393033537286, δ = 0.124560844581473, t1 = 1.04, t2 = 2.99
Iter 6 (CCD): obj = -1552.4489159484508, δ = 0.020754156607442897, t1 = 1.18, t2 = 3.81, t3 = 0.01
Iter 7 (PCA): obj = -1549.810615656943, δ = 0.07012194156368799, t1 = 1.43, t2 = 4.27
Iter 8 (CCD): obj = -1547.020766696055, δ = 0.015614065719368509, t1 = 1.56, t2 = 5.09, t3 = 0.01
Iter 9 (PCA): obj = -1545.531088575216, δ = 0.0511701859313534, t1 = 1.82, t2 = 5.58
Iter 10 (CCD): obj = -1543.8817717502163, δ = 0.013019011861975537, t1 = 1.95, t2 = 6.4, t3 = 0.01
Iter 11 (PCA): obj = -1542.9960966431295, δ = 0.04029486159842148, t1 = 2.23, t2 = 6.87
Iter 12 (CCD): obj = -1542.0192637205867, δ = 0.011386766045749418, t1 = 2.36, t2 = 7.69, t3 = 0.01
Iter 13 (PCA): obj = -1541.4898664708514, δ = 0.03310222247410438, t1 = 2.61, t2 = 8.17
Iter 14 (CCD): obj = -1540.9005704168408, δ = 0.010234115029284592, t1 = 2.74, t2 = 8.99, t3 = 0.01
Iter 15 (PCA): obj = -1540.5789961008802, δ = 0.027573434751233035, t1 = 3.5, t2 = 9.5
Iter 16 (CCD): obj = -1540.221696231743, δ = 0.009352305423352147, t1 = 3.62, t2 = 10.33, t3 = 0.01
Iter 17 (PCA): obj = -1540.0257796341718, δ = 0.02345771973192861, t1 = 4.15, t2 = 10.83
Iter 18 (CCD): obj = -1539.806801955503, δ = 0.008629474061316174, t1 = 4.28, t2 = 11.68, t3 = 0.02
Iter 19 (PCA): obj = -1539.689234490581, δ = 0.020162878516809316, t1 = 4.81, t2 = 12.21
Iter 20 (CCD): obj = -1539.5543495056934, δ = 0.007951906306609491, t1 = 4.93, t2 = 13.03, t3 = 0.02
```
To extract the knockoffs, S matrix, and the final objective as
```python
Xko = result.Xko
S = result.S
obj = result.obj
```

## Citation and reproducibility

If you use this software in a research paper, please cite [our paper](https://arxiv.org/abs/2310.15069). Scripts to reproduce the results featured in our paper can be found [here](https://github.com/biona001/group-knockoff-reproducibility).