https://github.com/robaina/pathwayenrichment
Tools to perform a permutation-based cell pathway analysis
- Host: GitHub
- URL: https://github.com/robaina/pathwayenrichment
- Owner: Robaina
- License: cc-by-4.0
- Created: 2021-08-14T12:56:40.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2023-01-16T12:23:06.000Z (over 3 years ago)
- Last Synced: 2025-04-06T10:38:02.968Z (about 1 year ago)
- Language: Python
- Size: 90.8 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# Permutation-based pathway enrichment analysis
Python tools to perform a permutation-based pathway enrichment analysis. Currently supporting KEGG pathways.
# Usage
```python
from pathwayenrichment.representation import ClusterPermutator
from pathwayenrichment.databaseparser import KEGGPathwayParser
from pathwayenrichment.utils import randomPartition
```
First, let's download the KEGG database for Dokdonia, a marine bacterium, using its KEGG organism code (`dok`). We then parse the database to obtain a list of genes and their associated cellular pathways and systems.
```python
KEGGparser = KEGGPathwayParser.fromKEGGidentifier('dok', only_curated_pathways=True)
gene_pathways, gene_systems = KEGGparser.getGenePathways()
system_pathways = KEGGparser.getSystemPathways()
gene_info = KEGGparser.getGeneInfoFromKEGGorthology()
gene_list = list(gene_pathways.keys())
print(f'There are a total of {len(gene_list)} genes')
```
    There are a total of 786 genes
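To make the shape of the parsed data more concrete, here is a minimal sketch with toy data. It assumes that `gene_pathways` maps each gene identifier to a list of pathway names; only the dictionary keys are used above, so the value layout is an assumption. The gene identifiers and the helper `invert_mapping` are hypothetical and not part of the package.

```python
from collections import defaultdict

# Toy stand-in for gene_pathways. The gene IDs are made up and the
# list-valued layout is an assumption, not confirmed by the library.
toy_gene_pathways = {
    "gene_1": ["TCA cycle", "Glycolysis"],
    "gene_2": ["TCA cycle"],
    "gene_3": ["ABC transporters"],
}


def invert_mapping(gene_to_pathways):
    """Return a dictionary mapping each pathway to a sorted list of its genes."""
    pathway_to_genes = defaultdict(list)
    for gene, pathways in gene_to_pathways.items():
        for pathway in pathways:
            pathway_to_genes[pathway].append(gene)
    return {pathway: sorted(genes) for pathway, genes in pathway_to_genes.items()}


toy_pathway_genes = invert_mapping(toy_gene_pathways)
toy_gene_list = list(toy_gene_pathways.keys())
print(f"There are a total of {len(toy_gene_list)} genes")
```

The inverted view (pathway to genes) is the one an enrichment test typically needs, since it counts how many genes of a pathway fall inside a cluster.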
Now, we simulate a set of gene clusters to perform a pathway enrichment analysis on them. To this end, we will randomly partition the set of genes into clusters.
```python
genes_under_study = gene_list[:300]
# The bin sizes sum to 300, the number of genes under study
clusters = dict(zip(
    ['A', 'B', 'C', 'D'],
    randomPartition(genes_under_study, bin_sizes=[75, 25, 150, 50])
))
```
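For intuition, a random partition of this kind could be sketched as follows. This is not the package's `randomPartition`; the function `random_partition_sketch`, its `seed` parameter, and the toy gene names are hypothetical, and the sketch assumes the bins should be disjoint and drawn without replacement.

```python
import random


def random_partition_sketch(items, bin_sizes, seed=None):
    """Split a random sample of `items` into disjoint bins of the given sizes."""
    total = sum(bin_sizes)
    if total > len(items):
        raise ValueError("bin sizes exceed the number of items")
    rng = random.Random(seed)
    # Draw `total` distinct items in random order, then slice them into bins
    shuffled = rng.sample(list(items), total)
    bins, start = [], 0
    for size in bin_sizes:
        bins.append(shuffled[start:start + size])
        start += size
    return bins


toy_genes = [f"gene_{i}" for i in range(300)]
toy_clusters = dict(zip(
    ["A", "B", "C", "D"],
    random_partition_sketch(toy_genes, bin_sizes=[75, 25, 150, 50], seed=0),
))
```

Fixing the seed makes the simulated clusters reproducible, which is convenient when comparing runs.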
Now we are ready to instantiate a `ClusterPermutator` and run the enrichment analysis. The total set of genes is permuted 10000 times to form new random clusters; this is the sample size used to compute the sample p-value.
```python
permutator = ClusterPermutator(clusters, gene_pathways, system_pathways)
res = permutator.sampleClusterPermutationSpace(sample_size=10000, n_processes=4)
```
    Finished permutation sampling
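The idea behind a permutation-based sample p-value can be illustrated with a small, self-contained sketch. It is not the `ClusterPermutator` implementation. It assumes the statistic is the fraction of a pathway's genes found in the cluster, and that the p-value is the fraction of random clusters of the same size whose statistic is at least as large as the observed one. The function names and toy data are hypothetical.

```python
import random


def representation(cluster, pathway_genes):
    """Fraction of the pathway's genes that fall in the cluster."""
    return len(set(cluster) & pathway_genes) / len(pathway_genes)


def permutation_p_value(cluster, all_genes, pathway_genes, sample_size=1000, seed=0):
    """Return (observed representation, sample p-value) for one cluster and pathway."""
    rng = random.Random(seed)
    observed = representation(cluster, pathway_genes)
    hits = 0
    for _ in range(sample_size):
        # Draw a random cluster of the same size from the full gene set
        random_cluster = rng.sample(all_genes, len(cluster))
        if representation(random_cluster, pathway_genes) >= observed:
            hits += 1
    return observed, hits / sample_size


all_genes = [f"gene_{i}" for i in range(100)]
pathway = set(all_genes[:10])
# A cluster of 20 genes containing 8 of the 10 pathway genes
enriched_cluster = all_genes[:8] + all_genes[50:62]
observed, p_value = permutation_p_value(enriched_cluster, all_genes, pathway)
```

A larger `sample_size` gives a finer p-value resolution: with 10000 samples, the smallest nonzero p-value is 0.0001.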
```python
# Here are the first 10 pathways with lowest sample p-value
dict(list(res['pathway']['A'].items())[:10])
```
    {'03018 RNA degradation [PATH:dok03018]': (0.2777777777777778, 0.0484),
     '00020 Citrate cycle (TCA cycle) [PATH:dok00020]': (0.18181818181818182, 0.0691),
     '02020 Two-component system [PATH:dok02020]': (0.2, 0.1527),
     '00541 O-Antigen nucleotide sugar biosynthesis [PATH:dok00541]': (0.19047619047619047, 0.1641),
     '03060 Protein export [PATH:dok03060]': (0.2, 0.1683),
     '02024 Quorum sensing [PATH:dok02024]': (0.14814814814814814, 0.218),
     '00520 Amino sugar and nucleotide sugar metabolism [PATH:dok00520]': (0.14285714285714285, 0.2211),
     '02010 ABC transporters [PATH:dok02010]': (0.15, 0.2422),
     '00040 Pentose and glucuronate interconversions [PATH:dok00040]': (0.3333333333333333, 0.25),
     '00053 Ascorbate and aldarate metabolism [PATH:dok00053]': (0.2, 0.25)}
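Here, we see the 10 pathways with the lowest sample p-values within cluster _A_. Each pathway maps to a tuple whose second element is the sample p-value, and the entries are sorted by it in ascending order.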
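To keep only the pathways below a chosen threshold, the result dictionary can be filtered on the second tuple element. The sketch below uses the first three entries of the output above and assumes, based on their ordering, that the second element is the sample p-value. The helper `filter_by_p_value` and its `alpha` parameter are hypothetical and not part of the package.

```python
# First three entries of res['pathway']['A'], copied from the output above
cluster_a = {
    '03018 RNA degradation [PATH:dok03018]': (0.2777777777777778, 0.0484),
    '00020 Citrate cycle (TCA cycle) [PATH:dok00020]': (0.18181818181818182, 0.0691),
    '02020 Two-component system [PATH:dok02020]': (0.2, 0.1527),
}


def filter_by_p_value(pathway_results, alpha=0.05):
    """Keep pathways whose sample p-value (assumed second element) is at most alpha."""
    return {
        name: values
        for name, values in pathway_results.items()
        if values[1] <= alpha
    }


significant = filter_by_p_value(cluster_a, alpha=0.05)
for name, (value, p_value) in significant.items():
    print(f"{name}: p = {p_value}")
```

Since the clusters in this example were generated at random, few or no pathways are expected to pass such a threshold.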