https://github.com/weinstockj/llcb

Contains the LLCB method from Weinstock* and Arce* et al., 2023

Host: GitHub
URL: https://github.com/weinstockj/llcb
Owner: weinstockj
License: gpl-3.0
Created: 2023-09-17T13:17:14.000Z (over 1 year ago)
Default Branch: public
Last Pushed: 2024-07-24T16:36:54.000Z (5 months ago)
Last Synced: 2024-10-13T22:25:21.867Z (2 months ago)
Topics: causal-inference, julia-language
Language: Julia
Homepage:
Size: 438 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# InferCausalGraph

[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://weinstockj.github.io/LLCB/dev)

[![CI](https://github.com/weinstockj/LLCB/actions/workflows/ci.yml/badge.svg?branch=dev)](https://github.com/weinstockj/LLCB/actions/workflows/ci.yml)

This repo contains the implementation of the 'LLCB' method, which is described
in Weinstock & Arce et al.

## Quick-start

Assuming that you have expression data properly formatted into a dataframe called `expression`:

```julia
using InferCausalGraph

const graph = interventionGraph(expression)
const model_pars = get_model_params(false, 1.0, 0.01)
const sampling_pars = get_sampling_params(false)

model = fit_cyclic_model(graph, false, model_pars, sampling_pars)
edges = get_cyclic_matrices(graph, false)[3]
parsed_chain = parse_cyclic_chain(model[1], model[2], edges)
```

## Installation

This package currently uses Julia 1.9.4. Please see `Manifest.toml` for more details on
which Julia packages are required. Later versions of Julia are likely to work
but have not been tested.

To install, clone this repo, activate the project, modify `LOAD_PATH` (as needed),
and run:

```julia
using Pkg
Pkg.instantiate() # only need to run this once to set up the package
using InferCausalGraph # on my machine, takes 13.68 seconds in Julia 1.9.4
```
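
The activate and `LOAD_PATH` steps can also be done from within Julia. The following is only a sketch: `setup_llcb` is a hypothetical helper that is not part of this package, `repo_dir` stands for wherever you cloned the repo, and the check for a `Project.toml` at the repo root is an assumption about the repo layout.

```julia
using Pkg

# Hypothetical helper, not part of InferCausalGraph.
# `repo_dir` is the path to your local clone of this repository.
function setup_llcb(repo_dir::AbstractString)
    # Assumption: the repo root holds the project file.
    isfile(joinpath(repo_dir, "Project.toml")) ||
        error("no Project.toml in $repo_dir; is this the cloned repo?")
    Pkg.activate(repo_dir)   # use the environment shipped with the repo
    Pkg.instantiate()        # install the dependencies recorded in the manifest
    # Only needed if `using InferCausalGraph` cannot find the package afterwards.
    repo_dir in LOAD_PATH || push!(LOAD_PATH, repo_dir)
    return nothing
end

# setup_llcb("/path/to/llcb")   # hypothetical clone location
```

Failing early when the project file is missing avoids silently activating an empty environment in the wrong directory.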

## Input data formatting

The input variable `expression` should be a dataframe where each row
is a sample. Columns should include the identity of the sample's donor
(`donor`), the intervention performed (`intervention`), and one additional
column for each readout gene, where the column names are the gene symbols.
Overall, the dataframe should have one row per sample and P + 2 columns, where
P is the number of readout genes.

For an example of what this data might look like, run the following example
from the test script `test/extended_test.jl`:

```julia
include("test/extended_test.jl")
sim_cyclic_expression(cyclic_chain_graph(), 3, 50, true)
```

This will simulate expression data for a 5-gene graph.
Each KO has been performed in 3 'donors' with 50
replicates (yes, a somewhat optimistic setting). Control data (no KOs) are also included.
The first ten rows of the data look like:

```
 Row │ gene_1   gene_2   gene_3   gene_4   gene_5   intervention  donor
     │ Float64  Float64  Float64  Float64  Float64  String        String
─────┼───────────────────────────────────────────────────────────────────
   1 │ 0.0      3.36514  4.67425  4.92861  5.44028  gene_1        1
   2 │ 0.0      2.63952  3.93181  4.63513  4.89527  gene_1        1
   3 │ 0.0      2.9394   4.14463  4.9676   5.48139  gene_1        1
   4 │ 0.0      3.23764  4.36584  4.67192  5.13055  gene_1        1
   5 │ 0.0      2.86171  4.43841  4.50357  4.76188  gene_1        1
   6 │ 0.0      2.66913  3.957    4.81993  5.05754  gene_1        1
   7 │ 0.0      2.58999  3.68238  4.23696  5.25393  gene_1        1
   8 │ 0.0      2.2197   3.82162  4.67875  4.75546  gene_1        1
   9 │ 0.0      2.89671  4.33153  4.52459  4.74094  gene_1        1
  10 │ 0.0      2.72936  4.18565  4.70949  4.98802  gene_1        1
```
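
If you are building `expression` by hand, the layout described in this section (one row per sample, a `donor` column, an `intervention` column, and one numeric column per readout gene) can be checked before fitting. The sketch below uses DataFrames.jl; the toy values and the helper `check_expression_layout` are made up for illustration and are not part of InferCausalGraph. How control samples should be labeled is not covered here.

```julia
using DataFrames

# Hypothetical toy data: P = 2 readout genes, 4 samples.
# The numbers are invented purely to show the expected shape.
expression = DataFrame(
    gene_1 = [0.0, 0.0, 3.1, 2.9],
    gene_2 = [3.4, 2.6, 0.0, 0.0],
    intervention = ["gene_1", "gene_1", "gene_2", "gene_2"],
    donor = ["1", "2", "1", "2"],
)

# Hypothetical helper (not part of InferCausalGraph): returns the gene
# column names if the layout matches the description, otherwise throws.
function check_expression_layout(df::AbstractDataFrame)
    required = ["donor", "intervention"]
    absent = setdiff(required, names(df))
    isempty(absent) || error("missing columns: $(join(absent, ", "))")

    # Every remaining column is treated as a readout gene and must be numeric.
    genes = setdiff(names(df), required)
    for g in genes
        eltype(df[!, g]) <: Real || error("column $g is not numeric")
    end
    return genes
end

genes = check_expression_layout(expression)
println("P = ", length(genes), " readout genes, ", nrow(expression), " samples")
```

Because the two required columns are set aside first, the dataframe has exactly `length(genes) + 2` columns whenever the check passes, which matches the P + 2 rule above.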

## Input data notes

In our manuscript, we used bulk RNA-seq data as the read-out. Our processing
pipeline is available [here](https://github.com/weinstockj/RNAseq-perturbation-CD4-pipeline).
Please note that we used the DESeq2 variance stabilizing transformation (`vst`) function to
normalize the data. We also recommend estimating expression PCs and regressing out those
that correspond to unwanted global sources of variation.
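
As a rough illustration of the PC-regression step, the sketch below removes chosen principal components from a samples × genes matrix using only Julia's standard library. It is not the code from the linked pipeline: `regress_out_pcs` is a hypothetical helper, the example matrix is made up, and deciding which PCs reflect unwanted variation is left to you.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper, not part of InferCausalGraph or the linked pipeline.
# X:   samples × genes matrix of normalized expression (e.g. after `vst`).
# pcs: indices of the principal components judged to be unwanted.
function regress_out_pcs(X::AbstractMatrix{<:Real}, pcs::AbstractVector{<:Integer})
    gene_means = mean(X, dims = 1)
    Xc = X .- gene_means              # center each gene
    F = svd(Xc)                       # Xc = U * Diagonal(S) * Vt
    U = F.U[:, pcs]                   # orthonormal basis spanning the chosen PC scores
    residual = Xc .- U * (U' * Xc)    # least-squares residual of each gene on those PCs
    return residual .+ gene_means     # restore the per-gene means
end

# Made-up example: 6 samples, 3 genes, with a shared shift in the last 3 samples.
X = [1.0 2.0 3.0;
     1.2 2.1 3.3;
     0.9 1.8 2.9;
     3.0 4.1 5.2;
     3.1 3.9 5.0;
     2.8 4.0 5.1]

X_adj = regress_out_pcs(X, [1])       # remove the first PC only
```

Projecting onto the left singular vectors is equivalent to regressing each gene on the corresponding PC scores, since the scores are just those vectors scaled by the singular values. Adding the gene means back is a design choice that keeps the adjusted values on the original scale.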

## Contact

Please contact Josh Weinstock with questions.

## Citation

If you use this method, please cite the following manuscript:

https://www.biorxiv.org/content/10.1101/2023.09.17.557749v2