https://github.com/antonkulaga/genotations
A small library to work with ensembl and other annotations in python
- Host: GitHub
- URL: https://github.com/antonkulaga/genotations
- Owner: antonkulaga
- Created: 2022-10-27T12:39:01.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-09-06T00:03:50.000Z (about 2 years ago)
- Last Synced: 2025-08-27T13:52:25.365Z (about 1 month ago)
- Topics: annotations, bioinformatics, ensembl, genes, genomics, gtf, polars, transcriptomics
- Language: Python
- Homepage:
- Size: 3.65 MB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Genotations
===========

Python library to work with genomes and annotations, mostly Ensembl genomes. Also supports visualization of transcripts/gene features and primer selection.
Since pandas and polars are everyday tools for many Python developers, this library focuses on representing annotations as dataframes.

The library allows:
* downloading Ensembl annotations and genomes (uses genomepy under the hood)
* working with genomic annotations as polars dataframes
* getting sequences for selected genes
* visualizing gene features
* designing primers for selected transcripts with the Primer3 Python wrapper
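
To make the dataframe view of annotations concrete, here is a minimal plain-Python sketch of how one GTF line can be flattened into a single row. It is not the library's implementation: the helper `parse_gtf_line`, the column names (taken from the standard GTF field names) and the example record are all assumptions made for illustration, and the columns genotations actually exposes may be named differently.

```python
from typing import Dict

# Standard names of the eight fixed GTF columns; the ninth holds attributes.
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end", "score", "strand", "frame"]


def parse_gtf_line(line: str) -> Dict[str, str]:
    """Flatten one tab-separated GTF line into a dict (one dataframe row)."""
    fields = line.rstrip("\n").split("\t")
    row = dict(zip(GTF_COLUMNS, fields[:8]))
    # The attribute column contains `key "value";` pairs such as gene_id or gene_name.
    for pair in fields[8].split(";"):
        pair = pair.strip()
        if not pair:
            continue
        key, _, value = pair.partition(" ")
        row[key] = value.strip('"')
    return row


# Made-up record for illustration only, not a real Ensembl entry.
example = "\t".join([
    "1", "ensembl", "exon", "100", "200", ".", "+", ".",
    'gene_id "GENE0001"; gene_name "ExampleGene"; gene_biotype "protein_coding";',
])
row = parse_gtf_line(example)
print(row["feature"], row["gene_name"], row["gene_biotype"])
```

Once every record is a flat row like this, selecting exons or protein-coding genes becomes an ordinary column filter, which is the idea behind the dataframe-style API.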
Usage
=====

Install with pip:
```bash
pip install genotations
```
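
If you want to confirm that the package landed in the environment you are actually running, a small standard-library check can help. The helper name `installed_version` is hypothetical; only `importlib.metadata` from the Python standard library is used.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("genotations")
if found:
    print(f"genotations {found} is installed")
else:
    print("genotations is not installed in this environment")
```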
In some cases you may also need the UCSC annotation tools. They are installed from the bioconda channel, so you can add them to your micromamba/conda environment.
Here is how it may look in your environment file:
```yaml
name: genotations
channels:
  - conda-forge
  - BjornFJohansson
  - bioconda
  - defaults
dependencies:
  - python=3.10
  - ucsc-bedtogenepred
  - ucsc-genepredtobed
  - ucsc-genepredtogtf
  - ucsc-gff3togenepred
  - ucsc-gtftogenepred
  - pip
  - pip:
    - genotations
```

Now you can start using it, for example:
```python
from genotations import ensembl
human = ensembl.human # getting human genome
mouse = ensembl.mouse # getting mouse genome
mouse.annotations.exons().annotations_df # getting exons as DataFrame
mouse.annotations.protein_coding().exons().annotations_df # getting exons of protein coding genes
mouse.annotations.transcript_gene_names_df # getting transcript gene names
mouse.annotations.with_gene_name_contains("Foxo1").protein_coding().transcripts() # getting only coding Foxo1 transcripts
mouse.annotations.with_gene_name_contains("Foxo1").genes_visual(mouse.genome)[0].plot() # plotting features of the Foxo1 gene
cow_assemblies = ensembl.search_assemblies("Bos taurus") # you can also search genomes by species name if it exists in Ensembl
cow1 = ensembl.SpeciesInfo("Cow", cow_assemblies[-1][0]) # selecting one of several cow assemblies
cow1.annotations.annotations_df # getting annotations as dataframe
```

You can also use the library to annotate existing gene expression data with gene and transcript symbols and features.
For example:
```python
from pathlib import Path

import polars as pl
from genotations.quantification import *
from genotations import ensembl

base = Path(".") # a Path (not a str) so that the "/" joins below work
examples = base / "examples"
data = examples / "data"
expressions = pl.read_parquet(str(data / "PRJNA543661_transcripts.parquet"))
with_expressions_summaries(expressions, min_avg_value = 1)
expressions_ext = ensembl.mouse.annotations.extend_with_annotations_and_sequences(expressions, ensembl.mouse.genome) # extend expression data with annotations and sequences
```

For more examples, check the [example notebook](https://github.com/antonkulaga/genotations/blob/main/examples/explore_mouse.ipynb) to see the usage and API.
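
Conceptually, annotating expression data comes down to two steps: dropping weakly expressed transcripts and joining the rest to gene names by transcript id. The plain-Python sketch below illustrates that idea without using genotations at all. The function `annotate_expressions`, the identifiers `TX1`/`GeneA` and the reading of `min_avg_value` as a minimum mean across samples are assumptions for illustration, not the library's documented behavior.

```python
from statistics import mean
from typing import Dict, List


def annotate_expressions(
    expressions: Dict[str, List[float]],
    transcript_to_gene: Dict[str, str],
    min_avg_value: float = 1.0,
) -> List[dict]:
    """Keep transcripts whose mean expression reaches the threshold and attach gene names."""
    rows = []
    for transcript, values in expressions.items():
        avg = mean(values)
        if avg < min_avg_value:
            continue  # too weakly expressed, skip
        rows.append({
            "transcript": transcript,
            "gene_name": transcript_to_gene.get(transcript),  # None if unannotated
            "avg": avg,
        })
    return rows


# Made-up identifiers and values, for illustration only.
expressions = {"TX1": [4.0, 6.0], "TX2": [0.0, 0.5], "TX3": [2.0, 2.0]}
transcript_to_gene = {"TX1": "GeneA", "TX2": "GeneB"}

annotated = annotate_expressions(expressions, transcript_to_gene, min_avg_value=1)
for row in annotated:
    print(row)
```

With real data the same filter-then-join pattern would normally be expressed as dataframe operations rather than Python loops.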
Working with the library code
=====

Use micromamba (or conda) and environment.yaml to install the dependencies:
```
micromamba create -f environment.yaml
micromamba activate genotations
```
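
After activating the environment, you may want to verify that the UCSC tools are reachable on your `PATH`. The sketch below uses only the standard library. The executable names are assumed from the bioconda package names in the example environment file (for instance `ucsc-gtftogenepred` is assumed to provide `gtfToGenePred`), and the repository's own environment.yaml may list a different set, so adjust the list if yours differs.

```python
import shutil
from typing import Dict, Iterable, Optional

# Executable names assumed to be provided by the ucsc-* bioconda packages.
UCSC_TOOLS = [
    "bedToGenePred",
    "genePredToBed",
    "genePredToGtf",
    "gff3ToGenePred",
    "gtfToGenePred",
]


def find_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools(UCSC_TOOLS).items():
    print(f"{name}: {path or 'NOT FOUND'}")
```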