https://github.com/gwgundersen/dpcca
Code for the paper "End-to-end training of deep probabilistic CCA on paired biomedical observations".
- Host: GitHub
- URL: https://github.com/gwgundersen/dpcca
- Owner: gwgundersen
- License: other
- Created: 2019-06-05T20:24:14.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2021-05-18T14:27:07.000Z (over 3 years ago)
- Last Synced: 2024-08-02T20:43:40.752Z (6 months ago)
- Language: Python
- Size: 40 KB
- Stars: 24
- Watchers: 2
- Forks: 5
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-multi-omics - DPCCA - Gundersen - Deep Probabilistic CCA - [paper](http://proceedings.mlr.press/v115/gundersen20a.html) (Software packages and methods / Multi-omics correlation or factor analysis)
README
# Deep probabilistic CCA
Code for [End-to-end training of deep probabilistic CCA on paired biomedical observations](http://auai.org/uai2019/proceedings/papers/340.pdf).
### Abstract
Medical pathology images are visually evaluated by experts for disease diagnosis, but the connection between image features and the state of the cells in an image is typically unknown. To understand this relationship, we develop a multimodal modeling and inference framework that estimates shared latent structure of joint gene expression levels and medical image features. Our method is built around probabilistic canonical correlation analysis (PCCA), which is fit to image embeddings that are learned using convolutional neural networks and linear embeddings of paired gene expression data. Using a differentiable take on the EM algorithm, we train the model end-to-end so that the PCCA and neural network parameters are estimated simultaneously. We demonstrate the utility of this method in constructing image features that are predictive of gene expression levels on simulated data and the Genotype-Tissue Expression data. We demonstrate that the latent variables are interpretable by disentangling the latent subspace through shared and modality-specific views.
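To make the PCCA building block concrete, here is a minimal sketch of sampling paired views from a linear-Gaussian PCCA model with a single shared latent variable. It is not taken from this repository: the function name `sample_pcca`, the random loading matrices `W_a` and `W_b`, and the isotropic noise are all assumptions made for illustration, and the modality-specific latents and neural network embeddings described above are left out.

```python
import numpy as np

def sample_pcca(n, dim_a, dim_b, k, noise_std=0.1, seed=0):
    """Draw n paired samples from a toy linear-Gaussian PCCA model."""
    # RandomState is used so the sketch also runs on older NumPy releases.
    rng = np.random.RandomState(seed)
    # Hypothetical loading matrices mapping the shared latent to each view.
    W_a = rng.randn(dim_a, k)
    W_b = rng.randn(dim_b, k)
    # Shared latent variable z ~ N(0, I_k), one row per sample.
    z = rng.randn(n, k)
    # Each view is a linear map of z plus independent isotropic Gaussian
    # noise (a simplification; PCCA allows a full noise covariance per view).
    x_a = z @ W_a.T + noise_std * rng.randn(n, dim_a)
    x_b = z @ W_b.T + noise_std * rng.randn(n, dim_b)
    return x_a, x_b, z

x_a, x_b, z = sample_pcca(n=500, dim_a=20, dim_b=10, k=2)
print(x_a.shape, x_b.shape, z.shape)  # (500, 20) (500, 10) (500, 2)
```

Because both views are generated from the same `z`, they are correlated only through the shared latent, which is the structure PCCA tries to recover.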
### Installation
While all the dependencies used for the paper are listed in [environment.yml](https://github.com/gwgundersen/dpcca/blob/master/environment.yml), that file is operating system-specific, and some library versions (e.g. `libcxx=4.0.1`) will not be available across systems. However, you can build everything you need with the following:
```text
python 3.7
pytorch 1.0.1
torchvision 0.2.2
numpy 1.16.2
scikit-learn 0.20.2
scipy 1.2.1
matplotlib 3.0.2
```
You'll need `nose2` to run the unit tests. Create and activate a conda environment,
```bash
conda create -n dpcca python=3.7
conda activate dpcca
```
and then install these dependencies, e.g. `conda install pytorch=1.0.1 -c pytorch`.
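Once the packages are installed, a quick check of what actually ended up in the environment can save debugging time later. The script below is a hedged sketch, not part of the repository: the helper names `installed_version` and `check_versions` are hypothetical, and it assumes the usual import names (`torch` for pytorch, `sklearn` for scikit-learn) and that each package exposes `__version__`.

```python
import importlib

# Versions listed in this README, keyed by import name (assumed mapping:
# pytorch -> torch, scikit-learn -> sklearn).
EXPECTED = {
    "torch": "1.0.1",
    "torchvision": "0.2.2",
    "numpy": "1.16.2",
    "sklearn": "0.20.2",
    "scipy": "1.2.1",
    "matplotlib": "3.0.2",
}

def installed_version(module_name):
    """Return the module's __version__ string, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)

def check_versions(expected=EXPECTED):
    """Map each module name to a tuple (expected, found, matches)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        # Loose prefix match, so build suffixes such as ".post2" still pass.
        matches = found is not None and found.startswith(wanted)
        report[name] = (wanted, found, matches)
    return report

if __name__ == "__main__":
    for name, (wanted, found, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print("{:<12} expected {:<8} found {:<14} {}".format(
            name, wanted, str(found), status))
```

A mismatch is not necessarily fatal, but it is worth knowing about before investigating test failures.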
[Optional] Run the unit tests. Note that these occasionally fail due to numerical tolerances:
```bash
bash run_tests.sh
```
### Reproducing multimodal MNIST results
Generate the multimodal MNIST data set.
```bash
python -m data.mnist.generate
```
Create directories for experiments:
```bash
mkdir experiments experiments/example
```
Run the code:
```bash
python traindpcca.py
```