Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/biona001/groupknockoff

Building a simple executable for solving group knockoff optimization
https://github.com/biona001/groupknockoff

Last synced: about 1 month ago
JSON representation

Building a simple executable for solving group knockoff optimization

Host: GitHub
URL: https://github.com/biona001/groupknockoff
Owner: biona001
License: mit
Created: 2023-10-25T22:13:53.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-05-11T17:32:46.000Z (9 months ago)
Last Synced: 2024-11-13T03:07:51.902Z (3 months ago)
Language: Jupyter Notebook
Size: 1.63 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # Simple app to solve group knockoff optimization

This is an experimental repository that provides a relocatable "app" for code written in Julia. We use [PackageCompiler.jl](https://github.com/JuliaLang/PackageCompiler.jl) to wrap a simple routine that solves the [group knockoff optimization problem](https://arxiv.org/abs/2310.15069).

## Download

First, git clone the repository 

```

git clone https://github.com/biona001/groupknockoff

```

Then, unzip the `app_XXX.zip` file where `XXX` matches your operating system (currently only apple silicon and linux x86 is supported)

```

cd groupknockoff

unzip app_apple_silicon.zip

```

The executable will be located in `app_XXX/bin/groupknockoff`. 

## Usage & example

```

groupknockoff sigma.txt m

```

where 

+ `sigma.txt` is a plain text file containing the correlation matrix stored in tab-separated format. There should be no header

+ `m`: Number of knockoffs the analyst wishes to generate, which defines the constraits in the optimization problem.

**Example (using data under `src`):** Expected output should be

```

$ pwd

/Users/biona001/.julia/dev/groupknockoff/data

$ ../app_apple_silicon/bin/groupknockoff sigma.txt 5

203 representatives for 430 variables, 241 optimization variables

Maxent initial obj = -3976.639913431866

Iter 1 (PCA): obj = -3272.9160443188894, δ = 0.5307152346716336, t1 = 0.01, t2 = 0.0

Iter 2 (CCD): obj = -2668.3862564376786, δ = 0.5932619592923948, t1 = 0.01, t2 = 0.01, t3 = 0.0

Iter 3 (PCA): obj = -2398.5378619681787, δ = 0.3873740427258285, t1 = 0.03, t2 = 0.01

Iter 4 (CCD): obj = -2239.4242488789246, δ = 0.3058039465734939, t1 = 0.03, t2 = 0.01, t3 = 0.0

Iter 5 (PCA): obj = -2137.345534062957, δ = 0.22679100360241075, t1 = 0.04, t2 = 0.02

Iter 6 (CCD): obj = -2107.4623405785032, δ = 0.2593014348969042, t1 = 0.04, t2 = 0.02, t3 = 0.0

Iter 7 (PCA): obj = -2069.4601001286233, δ = 0.12449914101196095, t1 = 0.05, t2 = 0.02

Iter 8 (CCD): obj = -2068.2089269896646, δ = 0.023272211430398024, t1 = 0.05, t2 = 0.03, t3 = 0.0

Iter 9 (PCA): obj = -2050.1590796498435, δ = 0.06221588512243029, t1 = 0.06, t2 = 0.03

Iter 10 (CCD): obj = -2049.1320452314144, δ = 0.021322296710926156, t1 = 0.06, t2 = 0.03, t3 = 0.0

Iter 11 (PCA): obj = -2038.3564136916027, δ = 0.06800904635159992, t1 = 0.08, t2 = 0.04

Iter 12 (CCD): obj = -2037.5314788114642, δ = 0.018412342893657115, t1 = 0.08, t2 = 0.04, t3 = 0.0

Iter 13 (PCA): obj = -2030.4149756595025, δ = 0.0531219671577248, t1 = 0.09, t2 = 0.05

Iter 14 (CCD): obj = -2029.7407140011135, δ = 0.015709132089124465, t1 = 0.09, t2 = 0.05, t3 = 0.0

Iter 15 (PCA): obj = -2024.5077557515956, δ = 0.06285340531750039, t1 = 0.1, t2 = 0.05

Iter 16 (CCD): obj = -2023.9459179850119, δ = 0.013101848749280543, t1 = 0.1, t2 = 0.05, t3 = 0.0

Iter 17 (PCA): obj = -2020.2536464440443, δ = 0.05970081849925719, t1 = 0.11, t2 = 0.06

Iter 18 (CCD): obj = -2019.8086631464043, δ = 0.00969890773472233, t1 = 0.11, t2 = 0.06, t3 = 0.0

Iter 19 (PCA): obj = -2017.1739529896302, δ = 0.034077311847352285, t1 = 0.12, t2 = 0.07

Iter 20 (CCD): obj = -2016.8627869251097, δ = 0.00811031513217606, t1 = 0.12, t2 = 0.07, t3 = 0.0

Iter 21 (PCA): obj = -2014.952457640997, δ = 0.02124631817294009, t1 = 0.13, t2 = 0.07

Iter 22 (CCD): obj = -2014.7711068922192, δ = 0.006852024230133178, t1 = 0.13, t2 = 0.07, t3 = 0.0

Iter 23 (PCA): obj = -2013.364499694923, δ = 0.01751278984407493, t1 = 0.14, t2 = 0.08

Iter 24 (CCD): obj = -2013.272116827203, δ = 0.00519505106419929, t1 = 0.14, t2 = 0.08, t3 = 0.0

Iter 25 (PCA): obj = -2012.2203135135917, δ = 0.01437710510110799, t1 = 0.16, t2 = 0.08

Iter 26 (CCD): obj = -2012.1665535491047, δ = 0.004311814579175299, t1 = 0.16, t2 = 0.09, t3 = 0.0

Iter 27 (PCA): obj = -2011.36065251226, δ = 0.013030483825913913, t1 = 0.17, t2 = 0.09

Iter 28 (CCD): obj = -2011.316241839796, δ = 0.004758224953293359, t1 = 0.17, t2 = 0.09, t3 = 0.0

Iter 29 (PCA): obj = -2010.6755812994675, δ = 0.012214869142719695, t1 = 0.18, t2 = 0.1

Iter 30 (CCD): obj = -2010.6304555381325, δ = 0.004699471885352681, t1 = 0.18, t2 = 0.1, t3 = 0.0

Iter 31 (PCA): obj = -2010.109003953247, δ = 0.011148466459874582, t1 = 0.19, t2 = 0.1

Iter 32 (CCD): obj = -2010.0630546152177, δ = 0.00434898367479869, t1 = 0.19, t2 = 0.11, t3 = 0.0

Iter 33 (PCA): obj = -2009.6396046854584, δ = 0.01176846059487951, t1 = 0.2, t2 = 0.11

Iter 34 (CCD): obj = -2009.5962123610066, δ = 0.0038403412033047846, t1 = 0.2, t2 = 0.11, t3 = 0.0

Iter 35 (PCA): obj = -2009.2580842715777, δ = 0.012349439636019931, t1 = 0.21, t2 = 0.12

Iter 36 (CCD): obj = -2009.220633510039, δ = 0.0032346781662313568, t1 = 0.21, t2 = 0.12, t3 = 0.0

Iter 37 (PCA): obj = -2008.9541714574673, δ = 0.011727588403854147, t1 = 0.22, t2 = 0.12

Iter 38 (CCD): obj = -2008.924647098155, δ = 0.002627646781621286, t1 = 0.22, t2 = 0.13, t3 = 0.0

Iter 39 (PCA): obj = -2008.715928586504, δ = 0.010373748213955905, t1 = 0.23, t2 = 0.13

Iter 40 (CCD): obj = -2008.6945814066446, δ = 0.0020566202195789894, t1 = 0.23, t2 = 0.13, t3 = 0.0

Iter 41 (PCA): obj = -2008.5311682808392, δ = 0.00866690367411388, t1 = 0.24, t2 = 0.14

Iter 42 (CCD): obj = -2008.5170453209819, δ = 0.0015646879963359072, t1 = 0.24, t2 = 0.14, t3 = 0.0

```

**Output**

+ `S.txt`: The full $p \times p$ matrix `S` satisfying $\frac{m+1}{m}\Sigma - S \succeq 0$ (but it is no longer strictly block-diagonal due to utilization of conditional independence)

+ `S11.txt`: The output of group knockoff optimization applied on the group-key variables. 

+ `groups.txt`: Vector of group membership

+ `group_reps.txt`: Selected representatives (i.e. group-key variables)

## What problem is the app solving

We solve the following problem for `S` (a block-diagonal matrix where the blocks corresponds to groups)

```math

\max_{\mathbf{S}} \log\det\left(\frac{m+1}{m}\mathbf{\Sigma} - \mathbf{S}\right) + m \log\det(\mathbf{S})     \quad \text{ subject to } 

    \begin{cases}

        \frac{m + 1}{m}\mathbf{\Sigma} - \mathbf{S} \succeq 0\\

        \mathbf{S} \succeq 0

    \end{cases}

```

where $m$ and $\Sigma$ are given. In this app, we define groups via average-linkage hierarchical clustering, choose group-key variables with threshold $c=0.5$, and use the maximum entropy (ME) solver. These parameters give a pretty good balance in terms of speed and power/FDR control. See [our paper](https://arxiv.org/abs/2310.15069) for more details.

## Compiling the app from scratch

In case you would like to compile the app yourself, first install [PackageCompiler.jl](https://github.com/JuliaLang/PackageCompiler.jl), and within Julia,

```julia

using PackageCompiler

src = "/Users/biona001/.julia/dev/groupknockoff"

des = "/Users/biona001/.julia/dev/groupknockoff/app_apple_silicon"

@time create_app(src, des, include_lazy_artifacts=true)

```