https://github.com/emfomy/pass
Particle Swarm Stepwise Algorithm (PaSS)
- Host: GitHub
- URL: https://github.com/emfomy/pass
- Owner: emfomy
- Created: 2016-10-21T08:11:51.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2019-11-30T08:31:31.000Z (almost 6 years ago)
- Last Synced: 2025-01-29T03:35:51.911Z (8 months ago)
- Language: C++
- Size: 7.57 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
# Particle Swarm Stepwise (PaSS) Algorithm
### Git
* https://github.com/emfomy/pass

### Documentation
* http://emfomy.github.io/pass

### Author
* Mu Yang

## Programming
### Cluster
* [IBM® Cognitive Computing Cluster (CCC)](http://ccc.pok.ibm.com/index.html)

### Compiler
* [GCC 5.1.0](https://gcc.gnu.org/gcc-5/)

### Library
* [Intel® Math Kernel Library 11.2 Update 3](https://software.intel.com/en-us/intel-mkl)
* [Open MPI v1.8.4](http://www.open-mpi.org/)

## Directory Structure
| Name | Detail |
|--------------|----------------------------------------------------|
| `src` | the source files |
| `src/genlin` | the PaSS algorithm for general linear regression |
| `src/genlog` | the PaSS algorithm for general logistic regression |
| `src/model` | the model generators |
| `src/data` | the data loaders |
| `bin` | the binary files |
| `obj` | the object files |
| `dep` | the dependency files |
| `mk` | the Makefiles |
| `sh` | the shell scripts |
| `dat` | the data files |
| `run` | the working directory |
| `log` | the log files |
| `doc` | the documentation settings |
| `html` | the html documentation |

## Compiling
* Modify `Makefile.inc` to change the main program and model.
* Modify `sh/pass.sh` for job submission.

### Environment Variables
The following environment variables should be set before compiling.

| Name | Detail | Default Value |
|-----------|--------------------------------|--------------------------|
| `MKLROOT` | the root of Intel MKL | |
| `MKLINC` | the include directories of MKL | `-I$MKLROOT/include` |
| `MKLLIB` | the library directories of MKL | `-L$MKLROOT/lib/intel64` |
| `MPIROOT` | the root of Open MPI | |
| `MPIINC` | the include directories of MPI | `-I$MPIROOT/include` |
| `MPILIB` | the library directories of MPI | `-L$MPIROOT/lib` |

### Makefile
| Command | Detail |
|--------------|-----------------------|
| `make all` | compile all binaries |
| `make doc` | compile documentation |
| `make run` | run demo code |
| `make clean` | clean the directory |
| `make kill` | kill all jobs |
| `make killf` | force kill all jobs |
| `make del` | delete all jobs |

## Usage
### The PaSS Algorithm for General Linear Regression
`./bin/genlin [options] ...`

| Option | Detail | Default Value |
|----------------------------------------|-----------------------------------------------------|---------------|
| `-f , --file ` | load data from `` | `genlin.dat` |
| `-i ###, --iteration ###` | the number of iterations | `1024` |
| `-p ###, --particle ###` | the number of particles per thread | `16` |
| `-t ###, --test ###` | the number of tests | `100` |
| `--brief` (default) | switch to brief mode | |
| `--verbose` | switch to verbose mode | |
| `-h, --help` | display help messages | |
| | | |
| `--prob ` | the probabilities | |
| `` | the probabilities of forward step: best | `0.1` |
| `` | the probabilities of forward step: improve | `0.8` |
| `` | the probabilities of forward step: random | `0.1` |
| `` | the probabilities of backward step: improve | `0.9` |
| `` | the probabilities of backward step: random | `0.1` |
| | | |
| `--AIC` | Akaike information criterion | |
| `--BIC` | Bayesian information criterion | |
| `--EBIC=` | Extended Bayesian information criterion | |
| `` (optional) | the parameter of EBIC | `1.0` |
| `--HDBIC` (default) | High-dimensional Bayesian information criterion | |
| `--HQC` | Hannan-Quinn information criterion | |
| `--HDHQC` | High-dimensional Hannan-Quinn information criterion | |

### The PaSS Algorithm for General Logistic Regression
`./bin/genlog [options] ...`

| Option | Detail | Default Value |
|----------------------------------------|-----------------------------------------------------|---------------|
| `-f , --file ` | load data from `` | `genlog.dat` |
| `-i ###, --iteration ###` | the number of iterations | `1024` |
| `-p ###, --particle ###` | the number of particles per thread | `16` |
| `-t ###, --test ###` | the number of tests | `100` |
| `--brief` (default) | switch to brief mode | |
| `--verbose` | switch to verbose mode | |
| `-h, --help` | display help messages | |
| | | |
| `--prob ` | the probabilities | |
| `` | the probabilities of forward step: best | `0.1` |
| `` | the probabilities of forward step: local | `0.8` |
| `` | the probabilities of forward step: random | `0.1` |
| `` | the probabilities of backward step: local | `0.9` |
| `` | the probabilities of backward step: random | `0.1` |
| | | |
| `--AIC` | Akaike information criterion | |
| `--BIC` | Bayesian information criterion | |
| `--EBIC=` | Extended Bayesian information criterion | |
| `` (optional) | the parameter of EBIC | `1.0` |
| `--HDBIC` (default) | High-dimensional Bayesian information criterion | |
| `--HQC` | Hannan-Quinn information criterion | |
| `--HDHQC` | High-dimensional Hannan-Quinn information criterion | |

### Create a General Linear Regression Data Using Ing and Lai's Method
`./bin/genlin_inglai [options] ...`

| Option | Detail | Default Value |
|----------------------------|-----------------------------------------------------|-------------------------|
| `-f , --file ` | save data into `` | `genlin.dat` |
| `-m , --name ` | set the data name as `` | `General_Linear_IngLai` |
| `-b , --beta ` | set the effects as ``s | |
| `-n ###` | the number of statistical units | `400` |
| `-p ###` | the number of total effects | `4000` |
| `-r ###` | the number of given effects, ignored if `-b` is set | `10` |
| `-h, --help` | display help messages | |

### Create a General Linear Regression Data Using Chen and Chen's Method
`./bin/genlin_chenchen [options] ...`

| Option | Detail | Default Value |
|----------------------------|------------------------------------------------------|---------------------------|
| `-f , --file ` | save data into `` | `genlin.dat` |
| `-m , --name ` | set the data name as `` | `General_Linear_ChenChen` |
| `-b , --beta ` | set the effects as ``s | |
| `-n ###` | the number of statistical units | `200` |
| `-p ###` | the number of total effects | `50` |
| `-r ###` | the number of given effects, ignored if `-b` is set | `8` |
| `-t ###, --type ###` | the type of covariance structure (1~3) | `3` |
| `-c ###, --cov ###` | the covariance parameter | `0.2` |
| `-h, --help` | display help messages | |

### Create a General Logistic Regression Data Using Ing and Lai's Method
`./bin/genlog_inglai [options] ...`

| Option | Detail | Default Value |
|----------------------------|-----------------------------------------------------|---------------------------|
| `-f , --file ` | save data into `` | `genlog.dat` |
| `-m , --name ` | set the data name as `` | `General_Logistic_IngLai` |
| `-b , --beta ` | set the effects as ``s | |
| `-n ###` | the number of statistical units | `400` |
| `-p ###` | the number of total effects | `4000` |
| `-r ###` | the number of given effects, ignored if `-b` is set | `10` |
| `-h, --help` | display help messages | |

## Data Structure
### .dat files
```
# 1st line: data name
# 2nd line: n p
# 3rd line: * J
# rest lines: Y X
#
# X: float matrix, n by p, the regressors
# Y: float vector, n by 1, the regressand
# J: bool vector, 1 by p, the chosen indices
#
* J[0] J[1] J[2] ...
Y[0] X[0][0] X[0][1] X[0][2] ...
Y[1] X[1][0] X[1][1] X[1][2] ...
Y[2] X[2][0] X[2][1] X[2][2] ...
...
```

Note that each comment line should have fewer than 4096 characters.
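The layout above can be read with a few lines of standard C++. The following is a minimal sketch, not the loader from `src/data`: the `DatFile` struct and the `read_dat` and `next_line` functions are hypothetical names introduced for illustration. It assumes that comment lines start with `#`, that blank lines can be skipped, and that the entries of `J` are written as `0` or `1`.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for one .dat file; not taken from the PaSS sources.
struct DatFile {
  std::string name;                    // 1st line: data name
  std::size_t n = 0, p = 0;            // 2nd line: n p
  std::vector<bool> J;                 // 3rd line: chosen indices, length p
  std::vector<double> Y;               // regressand, length n
  std::vector<std::vector<double>> X;  // regressors, n rows of p values
};

// Fetches the next line that is neither empty nor a comment.
// Assumption: a comment line is one whose first character is '#'.
static bool next_line(std::istream& in, std::string& line) {
  while (std::getline(in, line)) {
    if (!line.empty() && line[0] != '#') return true;
  }
  return false;
}

DatFile read_dat(std::istream& in) {
  DatFile d;
  std::string line;

  if (!next_line(in, d.name)) throw std::runtime_error("missing data name");

  if (!next_line(in, line)) throw std::runtime_error("missing n p line");
  std::istringstream dims(line);
  if (!(dims >> d.n >> d.p)) throw std::runtime_error("bad n p line");

  // The J row starts with a literal '*' followed by p flags.
  if (!next_line(in, line)) throw std::runtime_error("missing J line");
  std::istringstream jrow(line);
  std::string star;
  if (!(jrow >> star) || star != "*") throw std::runtime_error("J line must start with *");
  d.J.resize(d.p);
  for (std::size_t j = 0; j < d.p; ++j) {
    int flag = 0;  // assumption: flags are stored as 0 or 1
    if (!(jrow >> flag)) throw std::runtime_error("too few J entries");
    d.J[j] = (flag != 0);
  }

  // Each remaining row holds Y[i] followed by X[i][0..p-1].
  d.Y.resize(d.n);
  d.X.assign(d.n, std::vector<double>(d.p));
  for (std::size_t i = 0; i < d.n; ++i) {
    if (!next_line(in, line)) throw std::runtime_error("too few data rows");
    std::istringstream row(line);
    if (!(row >> d.Y[i])) throw std::runtime_error("missing Y value");
    for (std::size_t j = 0; j < d.p; ++j) {
      if (!(row >> d.X[i][j])) throw std::runtime_error("too few X values");
    }
  }
  assert(d.Y.size() == d.n && d.X.size() == d.n);
  return d;
}
```

Passing a `std::ifstream` opened on a file such as `genlin.dat` to `read_dat` would fill `X`, `Y`, and `J` with the shapes described in the comment block. Parsing one line at a time through `std::istringstream` keeps a short row from silently consuming values from the next one.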
## Reference
* [Mu Yang, Ray-Bing Chen, I-Hsin Chung, Weichung Wang (2016). Particle Swarm Stepwise Algorithm (PaSS) on Multicore Hybrid CPU-GPU Clusters.](https://doi.org/10.1109/CIT.2016.101)
* [Jiahua Chen, Zehua Chen (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3), 759–771.](http://www.stat.ubc.ca/~jhchen/paper/Bio08.pdf)
* [Zhen Liu, Meng Liu (2011). Logistic Regression Parameter Estimation Based on Parallel Matrix Computation. In Q. Zhou (Ed.), Communications in Computer and Information Science (Vol. 164, pp. 268–275). Berlin, Heidelberg: Springer Berlin Heidelberg.](http://doi.org/10.1007/978-3-642-24999-0_38)
* [Sameer Singh, Jeremy Kubica, Scott Larsen, Daria Sorokina (2013). Parallel Large Scale Feature Selection for Logistic Regression (pp. 1172–1183). Philadelphia, PA: Society for Industrial and Applied Mathematics.](http://doi.org/10.1137/1.9781611972795.100)
* [Adrian Barbu, Yiyuan She, Liangjing Ding, Gary Gramajo (2014). Feature Selection with Annealing for Big Data Learning.](https://arxiv.org/pdf/1310.2880)
* [Ching-Kang Ing, Tze Leung Lai (2011). A stepwise regression method and consistent model selection for high-dimensional sparse linear models.](http://doi.org/10.5705/ss.2010.081)
* [Hung Hung, Yu-Tin Lin, Pengwen Chen, Chen-Chien Wang, Su-Yun Huang, Jung-Ying Tzeng (2013). Detection of Gene-Gene Interactions using Multistage Sparse and Low-Rank Regression.](http://arxiv.org/abs/1304.3769)