Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/borgwardtlab/sinimin
Significant Network Interval Mining
https://github.com/borgwardtlab/sinimin
interactions networks statistical-testing
Last synced: about 1 month ago
JSON representation
Significant Network Interval Mining
- Host: GitHub
- URL: https://github.com/borgwardtlab/sinimin
- Owner: BorgwardtLab
- License: gpl-3.0
- Created: 2019-07-04T14:52:57.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-08-21T09:29:47.000Z (over 4 years ago)
- Last Synced: 2023-10-20T18:23:27.248Z (about 1 year ago)
- Topics: interactions, networks, statistical-testing
- Language: C++
- Homepage:
- Size: 2.94 MB
- Stars: 5
- Watchers: 2
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# SiNIMin
This repository contains the code for the _Significant Network Interval Mining_ approach, short SiNIMin, and its permutation-testing based counterpart SiNIMin-WY. The methods are described in _Network-guided detection of candidate intervals that exhibit genetic heterogeneity_ (under review).
## Data formatting
Assuming we are given a data set of n samples with d binary features. An example of all files can be found in the folder _examples_.
The method requires the following input:
1. __data file__ with d rows corresponding to features (important: the features are assumed to follow a natural ordering, such as genetic variants by their position on the DNA ) and n columns, corresponding to n samples. The values are supposed to be binary.
2. __label file__ with n rows, that contains the binary phenotype of the n samples. Samples are assumed to be in same ordering as in data file.
3. __feature file__ d rows, that contains the name of the d features. Samples are assumed to be in same ordering as in data file.
4. __edge file__, where each row contains the names of the nodes adjacent to the edge in tab-separated format.
5. __mapping file__, linking the features to the nodes in the network. Each row contains the name of the node followed by a white-space separated list of feature names.
6. __target FWER__, the target family wise error rate, default: 0.05.
7. __covariate file__ (optional) with n rows. Each row contains the index of the class of the corresponding sample. Samples are assumed to be in same ordering as in data file.
## Usage information
### Compilation (manual)
Note that the package relies on the Eigen-library. This library has to
be linked upon re-compilation of the method. OpenMP is used for
parallelization of permutation testing.We provide a `Makefile` that may have to be adjusted for the compilation
to work. You can compile the program using the following steps:$ cd SiNIMin/C
$ makeIf the compile step does not work, please try adjusting the compiler
settings in the `Makefile` or use another compilation method.### Compilation (CMake)
Another way to compile the package involves compiling it using `cmake`.
For Mac OS X, we recommend installing the following packages using
[Homebrew](https://brew.sh):$ brew install cmake gcc eigen
After cloning this repository, the following steps are required to
compile the package:$ cd SiNIMin/C
$ mkdir build
$ cd build
$ cmake -DCMAKE_CXX_COMPILER=g++-9 ../
$ makeOptionally, the compiler version can also be changed if a more recent
compiler is present. Compiling the package with the Apple version of the
`clang` compiler (which is sometimes confusingly also present as `g++`
in the system) currently does *not* work.Having compiled the package, it can optionally be installed by issuing
$ make install
from the `build` directory created above.
### Installation using Homebrew (Mac OS X )
For Mac OS X, we recommend installing the package using the [Homebrew](https://brew.sh)
package manager:$ brew install --cc=gcc BorgwardtLab/mlcb/sinimin
Afterwards, the package can be automatically used on the command-line.
### Example usage
Examples on how to execute the methods SiNIMin and SiNIMin-WY can be found in _examples/runs_ with corresponding data in _examples/data_.
The executable for both methods is called `sinimin` and can be found in _SiNIMin/compiled_.```bash
./sinimin \
-i "${data_file}" \
-l "${labels_file}" \
-c "${covariate_file}" \
-m "${mapping_file}" \
-e "${edge_file}" \
-s "${feature_file}" \
-f 0.05 \
-o "${output_prefix}" \
```
There exist additional flags that can be set, namely:
```bash
-d ${maxlen} \
-n ${number_threads} \
-p ${number_permutations}
```The `-d` flag toggles the maximum length of intervals to be tested. For example, if `d` is set to 1, only interactions between single features are tested.
The `-p` flag toggles the number of permutations. If this flag is set, SiNIMin-WY is executed, i.e. Westfall-Young permutations are used to estimate family-wise error rates.
The `-n` flag sets the number of processes. This parameter only results in a speed-up for permutation testing. `sinimin` uses OMP to parallelize.# Help
If you have questions concerning SiNIMin or you encounter problems when
trying to build the tool under your own system, please open an issue in
[the issue tracker](https://github.com/BorgwardtLab/SiNIMin/issues). Try to
describe the issue in sufficient detail in order to make it possible for
us to help you.# Contact
[email protected]