https://github.com/emfomy/pass

Particle Swarm Stepwise Algorithm (PaSS)
Host: GitHub
URL: https://github.com/emfomy/pass
Owner: emfomy
Created: 2016-10-21T08:11:51.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2019-11-30T08:31:31.000Z (almost 6 years ago)
Last Synced: 2025-01-29T03:35:51.911Z (8 months ago)
Language: C++
Size: 7.57 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
# Particle Swarm Stepwise (PaSS) Algorithm

### Git

* https://github.com/emfomy/pass

### Documentation

* http://emfomy.github.io/pass

### Author

* Mu Yang 

## Programming

### Cluster

* [IBM® Cognitive Computing Cluster (CCC)](http://ccc.pok.ibm.com/index.html)

### Compiler

* [GCC 5.1.0](https://gcc.gnu.org/gcc-5/)

### Library

* [Intel® Math Kernel Library 11.2 Update 3](https://software.intel.com/en-us/intel-mkl)

* [Open MPI v1.8.4](http://www.open-mpi.org/)

## Directory Structure

| Name         | Detail                                             |
|--------------|----------------------------------------------------|
| `src`        | the source files                                   |
| `src/genlin` | the PaSS algorithm for general linear regression   |
| `src/genlog` | the PaSS algorithm for general logistic regression |
| `src/model`  | the model generators                               |
| `src/data`   | the data loaders                                   |
| `bin`        | the binary files                                   |
| `obj`        | the object files                                   |
| `dep`        | the dependency files                               |
| `mk`         | the Makefiles                                      |
| `sh`         | the shell scripts                                  |
| `dat`        | the data files                                     |
| `run`        | the working directory                              |
| `log`        | the log files                                      |
| `doc`        | the documentation settings                         |
| `html`       | the html documentation                             |

## Compiling

* Modify `Makefile.inc` to change the main program and the model.

* Modify `sh/pass.sh` for job submission.

### Environment Variables

The following environment variables should be set before compiling.

| Name      | Detail                         | Default Value            |
|-----------|--------------------------------|--------------------------|
| `MKLROOT` | the root of Intel MKL          |                          |
| `MKLINC`  | the include directories of MKL | `-I$MKLROOT/include`     |
| `MKLLIB`  | the library directories of MKL | `-L$MKLROOT/lib/intel64` |
| `MPIROOT` | the root of Open MPI           |                          |
| `MPIINC`  | the include directories of MPI | `-I$MPIROOT/include`     |
| `MPILIB`  | the library directories of MPI | `-L$MPIROOT/lib`         |

### Makefile

| Command      | Detail                |
|--------------|-----------------------|
| `make all`   | compile all binaries  |
| `make doc`   | compile documentation |
| `make run`   | run demo code         |
| `make clean` | clean the directory   |
| `make kill`  | kill all jobs         |
| `make killf` | force kill all jobs   |
| `make del`   | delete all jobs       |

## Usage

### The PaSS Algorithm for General Linear Regression

`./bin/genlin [options] ...`

| Option                                 | Detail                                              | Default Value |
|----------------------------------------|-----------------------------------------------------|---------------|
| `-f <file>, --file <file>`             | load data from `<file>`                             | `genlin.dat`  |
| `-i ###, --iteration ###`              | the number of iterations                            | `1024`        |
| `-p ###, --particle ###`               | the number of particles per thread                  | `16`          |
| `-t ###, --test ###`                   | the number of tests                                 | `100`         |
| `--brief` (default)                    | switch to brief mode                                |               |
| `--verbose`                            | switch to verbose mode                              |               |
| `-h, --help`                           | display help messages                               |               |
|                                        |                                                     |               |
| `--prob` followed by five values       | the step probabilities, in the order listed below   |               |
| 1st value                              | the probability of forward step: best               | `0.1`         |
| 2nd value                              | the probability of forward step: improve            | `0.8`         |
| 3rd value                              | the probability of forward step: random             | `0.1`         |
| 4th value                              | the probability of backward step: improve           | `0.9`         |
| 5th value                              | the probability of backward step: random            | `0.1`         |
|                                        |                                                     |               |
| `--AIC`                                | Akaike information criterion                        |               |
| `--BIC`                                | Bayesian information criterion                      |               |
| `--EBIC=` followed by a value          | Extended Bayesian information criterion             |               |
| value after `--EBIC=` (optional)       | the parameter of EBIC                               | `1.0`         |
| `--HDBIC` (default)                    | High-dimensional Bayesian information criterion     |               |
| `--HQC`                                | Hannan-Quinn information criterion                  |               |
| `--HDHQC`                              | High-dimensional Hannan-Quinn information criterion |               |
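
As a rough illustration of what the criterion options select between, the sketch below evaluates the textbook forms of AIC, BIC, HQC, and EBIC for a Gaussian linear model with `k` selected regressors out of `p`, given the residual sum of squares. The function names are hypothetical and this is not the code in `src/genlin`; the implementation there may differ by constants or scaling. `--HDBIC` and `--HDHQC` are left out because their exact forms are not stated in this README.

```cpp
#include <cmath>
#include <cstddef>

// Log of the binomial coefficient C(p, k), via lgamma to avoid overflow.
// Requires k <= p.
inline double log_binomial(std::size_t p, std::size_t k) {
  return std::lgamma(p + 1.0) - std::lgamma(k + 1.0) - std::lgamma(p - k + 1.0);
}

// -2 log-likelihood of a Gaussian linear model, up to an additive constant.
inline double gaussian_deviance(double rss, std::size_t n) {
  return static_cast<double>(n) * std::log(rss / static_cast<double>(n));
}

// Akaike information criterion: deviance + 2k.
inline double aic(double rss, std::size_t n, std::size_t k) {
  return gaussian_deviance(rss, n) + 2.0 * static_cast<double>(k);
}

// Bayesian information criterion: deviance + k log n.
inline double bic(double rss, std::size_t n, std::size_t k) {
  return gaussian_deviance(rss, n) +
         static_cast<double>(k) * std::log(static_cast<double>(n));
}

// Hannan-Quinn information criterion: deviance + 2k log log n.
inline double hqc(double rss, std::size_t n, std::size_t k) {
  return gaussian_deviance(rss, n) +
         2.0 * static_cast<double>(k) * std::log(std::log(static_cast<double>(n)));
}

// Extended BIC (Chen and Chen, 2008): BIC + 2 * gamma * log C(p, k).
// gamma defaults to 1.0, matching the default EBIC parameter listed above.
inline double ebic(double rss, std::size_t n, std::size_t k, std::size_t p,
                   double gamma = 1.0) {
  return bic(rss, n, k) + 2.0 * gamma * log_binomial(p, k);
}
```

With `gamma = 0` the EBIC value reduces to BIC; larger values penalize model size more heavily when `p` is large.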

### The PaSS Algorithm for General Logistic Regression

`./bin/genlog [options] ...`

| Option                                 | Detail                                              | Default Value |
|----------------------------------------|-----------------------------------------------------|---------------|
| `-f <file>, --file <file>`             | load data from `<file>`                             | `genlog.dat`  |
| `-i ###, --iteration ###`              | the number of iterations                            | `1024`        |
| `-p ###, --particle ###`               | the number of particles per thread                  | `16`          |
| `-t ###, --test ###`                   | the number of tests                                 | `100`         |
| `--brief` (default)                    | switch to brief mode                                |               |
| `--verbose`                            | switch to verbose mode                              |               |
| `-h, --help`                           | display help messages                               |               |
|                                        |                                                     |               |
| `--prob` followed by five values       | the step probabilities, in the order listed below   |               |
| 1st value                              | the probability of forward step: best               | `0.1`         |
| 2nd value                              | the probability of forward step: local              | `0.8`         |
| 3rd value                              | the probability of forward step: random             | `0.1`         |
| 4th value                              | the probability of backward step: local             | `0.9`         |
| 5th value                              | the probability of backward step: random            | `0.1`         |
|                                        |                                                     |               |
| `--AIC`                                | Akaike information criterion                        |               |
| `--BIC`                                | Bayesian information criterion                      |               |
| `--EBIC=` followed by a value          | Extended Bayesian information criterion             |               |
| value after `--EBIC=` (optional)       | the parameter of EBIC                               | `1.0`         |
| `--HDBIC` (default)                    | High-dimensional Bayesian information criterion     |               |
| `--HQC`                                | Hannan-Quinn information criterion                  |               |
| `--HDHQC`                              | High-dimensional Hannan-Quinn information criterion |               |
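
One way to read the `--prob` values is as weights of a categorical distribution over step types. The sketch below draws a forward step type (best, local, or random) from three such weights using the standard library. The enum and function names are hypothetical, and how the PaSS binaries actually consume these values is an assumption, not something this README states.

```cpp
#include <array>
#include <random>

// Hypothetical labels for the three forward step types listed above.
enum class ForwardStep { Best = 0, Local = 1, Random = 2 };

// Draw one forward step type; prob holds the weights for best, local, random.
// std::discrete_distribution normalizes the weights, so they need not sum to 1.
template <class Rng>
ForwardStep draw_forward_step(const std::array<double, 3>& prob, Rng& rng) {
  std::discrete_distribution<int> dist(prob.begin(), prob.end());
  return static_cast<ForwardStep>(dist(rng));
}
```

With the default weights `{0.1, 0.8, 0.1}` and a `std::mt19937` generator, roughly 80% of the draws would be `ForwardStep::Local`.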

### Create General Linear Regression Data Using Ing and Lai's Method

`./bin/genlin_inglai [options] ...`

| Option                     | Detail                                              | Default Value           |
|----------------------------|-----------------------------------------------------|-------------------------|
| `-f <file>, --file <file>` | save data into `<file>`                             | `genlin.dat`            |
| `-m <name>, --name <name>` | set the data name as `<name>`                       | `General_Linear_IngLai` |
| `-b <beta>, --beta <beta>` | set the effects as `<beta>`s                        |                         |
| `-n ###`                   | the number of statistical units                     | `400`                   |
| `-p ###`                   | the number of total effects                         | `4000`                  |
| `-r ###`                   | the number of given effects, ignored if `-b` is set | `10`                    |
| `-h, --help`               | display help messages                               |                         |

### Create General Linear Regression Data Using Chen and Chen's Method

`./bin/genlin_chenchen [options] ...`

| Option                     | Detail                                               | Default Value             |
|----------------------------|------------------------------------------------------|---------------------------|
| `-f <file>, --file <file>` | save data into `<file>`                              | `genlin.dat`              |
| `-m <name>, --name <name>` | set the data name as `<name>`                        | `General_Linear_ChenChen` |
| `-b <beta>, --beta <beta>` | set the effects as `<beta>`s                         |                           |
| `-n ###`                   | the number of statistical units                      | `200`                     |
| `-p ###`                   | the number of total effects                          | `50`                      |
| `-r ###`                   | the number of given effects, ignored if `-b` is set  | `8`                       |
| `-t ###, --type ###`       | the type of covariance structure (1~3)               | `3`                       |
| `-c ###, --cov ###`        | the covariance parameter                             | `0.2`                     |
| `-h, --help`               | display help messages                                |                           |

### Create General Logistic Regression Data Using Ing and Lai's Method

`./bin/genlog_inglai [options] ...`

| Option                     | Detail                                              | Default Value             |
|----------------------------|-----------------------------------------------------|---------------------------|
| `-f <file>, --file <file>` | save data into `<file>`                             | `genlog.dat`              |
| `-m <name>, --name <name>` | set the data name as `<name>`                       | `General_Logistic_IngLai` |
| `-b <beta>, --beta <beta>` | set the effects as `<beta>`s                        |                           |
| `-n ###`                   | the number of statistical units                     | `400`                     |
| `-p ###`                   | the number of total effects                         | `4000`                    |
| `-r ###`                   | the number of given effects, ignored if `-b` is set | `10`                      |
| `-h, --help`               | display help messages                               |                           |

## Data Structure

### .dat files

```
# 1st  line:  data name
# 2nd  line:  n p
# 3rd  line:  * J
# rest lines: Y X
#
# X: float matrix, n by p, the regressors
# Y: float vector, n by 1, the regressand
# J: bool  vector, 1 by p, the chosen indices
#
<data name>
<n> <p>
*     J[0]     J[1]     J[2]     ...
Y[0]  X[0][0]  X[0][1]  X[0][2]  ...
Y[1]  X[1][0]  X[1][1]  X[1][2]  ...
Y[2]  X[2][0]  X[2][1]  X[2][2]  ...
...
```

Note that each comment line should have fewer than 4096 characters.
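
The layout above can be parsed with a few lines of standard C++. The following is a minimal sketch, not the loader in `src/data`: the `PassData` struct and both function names are hypothetical, and it assumes that `#` comment lines may appear anywhere and that the entries of `J` are written as `0` or `1`.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for one .dat file; not a type from the PaSS sources.
struct PassData {
  std::string name;                    // 1st line: data name
  std::size_t n = 0, p = 0;            // 2nd line: n p
  std::vector<bool> J;                 // 3rd line: chosen indices, length p
  std::vector<double> Y;               // regressand, length n
  std::vector<std::vector<double>> X;  // regressors, n by p
};

// Fetch the next line that is neither empty nor a '#' comment.
inline bool next_data_line(std::istream& in, std::string& line) {
  while (std::getline(in, line)) {
    if (!line.empty() && line[0] != '#') return true;
  }
  return false;
}

inline PassData read_pass_data(std::istream& in) {
  PassData d;
  std::string line;

  if (!next_data_line(in, d.name)) throw std::runtime_error("missing data name");

  if (!next_data_line(in, line)) throw std::runtime_error("missing size line");
  std::istringstream size_line(line);
  if (!(size_line >> d.n >> d.p)) throw std::runtime_error("malformed size line");

  // The index line starts with '*', followed by p flags (assumed to be 0 or 1).
  if (!next_data_line(in, line)) throw std::runtime_error("missing index line");
  std::istringstream index_line(line);
  std::string marker;
  if (!(index_line >> marker) || marker != "*")
    throw std::runtime_error("index line must start with '*'");
  d.J.resize(d.p);
  for (std::size_t j = 0; j < d.p; ++j) {
    int flag = 0;
    if (!(index_line >> flag)) throw std::runtime_error("too few indices");
    d.J[j] = (flag != 0);
  }

  // Each remaining row holds Y[i] followed by the p entries of X[i].
  d.Y.resize(d.n);
  d.X.assign(d.n, std::vector<double>(d.p));
  for (std::size_t i = 0; i < d.n; ++i) {
    if (!next_data_line(in, line)) throw std::runtime_error("too few data rows");
    std::istringstream row(line);
    if (!(row >> d.Y[i])) throw std::runtime_error("malformed data row");
    for (std::size_t j = 0; j < d.p; ++j) {
      if (!(row >> d.X[i][j])) throw std::runtime_error("malformed data row");
    }
  }
  return d;
}
```

Passing an open `std::ifstream` on a file such as `genlin.dat` to `read_pass_data` would fill the struct, provided the file follows the layout and the assumptions stated above.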

## Reference

* [Mu Yang, Ray-Bing Chen, I-Hsin Chung, Weichung Wang (2016). Particle Swarm Stepwise Algorithm (PaSS) on Multicore Hybrid CPU-GPU Clusters.](https://doi.org/10.1109/CIT.2016.101)

* [Jiahua Chen, Zehua Chen (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3), 759–771.](http://www.stat.ubc.ca/~jhchen/paper/Bio08.pdf)

* [Zhen Liu, Meng Liu (2011). Logistic Regression Parameter Estimation Based on Parallel Matrix Computation. In Q. Zhou (Ed.), Communications in Computer and Information Science (Vol. 164, pp. 268–275). Berlin, Heidelberg: Springer Berlin Heidelberg.](http://doi.org/10.1007/978-3-642-24999-0_38)

* [Sameer Singh, Jeremy Kubica, Scott Larsen, Daria Sorokina (2013). Parallel Large Scale Feature Selection for Logistic Regression (pp. 1172–1183). Philadelphia, PA: Society for Industrial and Applied Mathematics.](http://doi.org/10.1137/1.9781611972795.100)

* [Adrian Barbu, Yiyuan She, Liangjing Ding, Gary Gramajo (2014). Feature Selection with Annealing for Big Data Learning.](https://arxiv.org/pdf/1310.2880)

* [Ching-Kang Ing, Tze Leung Lai (2011). A stepwise regression method and consistent model selection for high-dimensional sparse linear models.](http://doi.org/10.5705/ss.2010.081)

* [Hung Hung, Yu-Tin Lin, Pengwen Chen, Chen-Chien Wang, Su-Yun Huang, Jung-Ying Tzeng (2013). Detection of Gene-Gene Interactions using Multistage Sparse and Low-Rank Regression.](http://arxiv.org/abs/1304.3769)