https://github.com/broadinstitute/g2papi

Python Client Library for the G2P Portal API

Host: GitHub
URL: https://github.com/broadinstitute/g2papi
Owner: broadinstitute
License: mit
Created: 2024-03-26T22:10:11.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-08-14T20:06:41.000Z (about 2 months ago)
Last Synced: 2025-08-14T22:10:39.184Z (about 2 months ago)
Topics: bioinformatics, bioinformatics-tool, bioinformatics-visualization, isoforms, protein-structure, proteins, sequence-alignment, variant-analysis
Language: Python
Homepage: https://g2p.broadinstitute.org
Size: 4.35 MB
Stars: 12
Watchers: 7
Forks: 2
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

# g2papi

`g2papi` is a Python library and command-line tool designed to interact with the G2P API provided by the Broad Institute. It allows users to retrieve mappings between protein isoforms, transcripts, and PDB structures for a gene, as well as protein feature tables for a gene.

The Swagger definition and the list of endpoints are available at: https://g2p.broadinstitute.org/api-docs/

## Citation

If you use g2papi in your research, please cite:

Kwon S, et al. Genomics 2 Proteins portal: A resource and discovery tool for linking genetic screening outputs to protein sequences and structures. doi: https://doi.org/10.1101/2024.01.02.573913.

## Installation

First, ensure that you have Python and pip installed on your system.

### Installing from PyPI

```
pip install g2papi
```

### Installing from source

Clone this repository to your local machine and navigate into the cloned directory:

```
git clone https://github.com/broadinstitute/g2papi.git
cd g2papi
```

To install `g2papi`, run:

```
pip install .
```

This will install the `g2papi` package and its dependencies.
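To confirm that the installation succeeded, one option is to ask Python's packaging metadata for the installed version. This is a minimal sketch using only the standard library; it assumes the distribution is registered under the name `g2papi` (the name used in the `pip install` command above), and `installed_version` is a hypothetical helper, not part of the library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "g2papi" is assumed to be the distribution name, as in `pip install g2papi`.
    found = installed_version("g2papi")
    print(f"g2papi {found}" if found else "g2papi is not installed")
```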

## Usage

### As a Python Library

You can import `g2papi` in your Python scripts to retrieve data from the G2P API directly. Here are some examples:

Calling the G2P3D API to get the Gene-Transcript-Protein Isoform-Structure mapping

```python
import g2papi

# Get gene-transcript-protein isoform-protein structure map as a pandas dataframe
gene_transcript_protein_isoform_struct = g2papi.get_gene_transcript_protein_isoform_structure('BRCA1', 'P38398')

print(gene_transcript_protein_isoform_struct[['UniProt Isoform','Ensembl Transcript Id', 'RefSeq mRNA Id']].head())
```

Output:

```
  UniProt Isoform Ensembl Transcript Id RefSeq mRNA Id
0     P38398-1(*)    ENST00000357654(*)   NM_001407611
1     P38398-1(*)    ENST00000357654(*)   NM_001407616
2     P38398-1(*)    ENST00000357654(*)   NM_001407624
3     P38398-1(*)    ENST00000357654(*)   NM_001407637
4     P38398-1(*)    ENST00000357654(*)   NM_001407641
```
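Some identifiers in the output above carry a `(*)` suffix. Its meaning is not stated in this README, so the sketch below treats it only as an opaque flag and separates it from the bare identifier, which can help when joining these IDs against other tables. `split_marker` is a hypothetical helper written for illustration, not part of `g2papi`.

```python
def split_marker(value, marker="(*)"):
    """Split 'P38398-1(*)' into ('P38398-1', True); unmarked values give (value, False)."""
    text = str(value).strip()
    if text.endswith(marker):
        return text[: -len(marker)].strip(), True
    return text, False


# Values copied from the example output above.
isoform, isoform_marked = split_marker("P38398-1(*)")
transcript, transcript_marked = split_marker("ENST00000357654(*)")
refseq, refseq_marked = split_marker("NM_001407611")

print(isoform, isoform_marked)
print(transcript, transcript_marked)
print(refseq, refseq_marked)
```

On the returned dataframe, the same helper could be applied to a column with pandas' `Series.map`, for example `df['UniProt Isoform'].map(lambda v: split_marker(v)[0])`.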

Getting Protein Features

```python
import g2papi

# Get protein features as a pandas dataframe
protein_features = g2papi.get_protein_features('BRCA1', 'P38398')

protein_features.fillna('-', inplace=True)

print(protein_features[[
    'residueId', 'AA',
    'AlphaFold confidence (pLDDT)',
    'Active site (UniProt)'
]].head())
```

Output:

```
   residueId AA  AlphaFold confidence (pLDDT) Active site (UniProt)
0          1  M                            41.59                     -
1          2  D                            45.81                     -
2          3  L                            48.11                     -
3          4  S                            63.99                     -
4          5  A                            61.73                     -
```

### As a Command-Line Tool

`g2papi` can also be used as a command-line tool that prints results to your terminal, from where they can be written to output files.

Getting Gene-Transcript-Protein Isoform-Structure Map with the G2P3D API

```
g2papi get-gene-transcript-protein-isoform-structure-map --geneName BRCA1 --uniprotId P38398
```

Getting Protein Features

```
g2papi get-protein-features --geneName BRCA1 --uniprotId P38398
```

The above commands will print the results to your terminal. If you wish to save the output to a file, you can redirect the output:

```
g2papi get-gene-transcript-protein-isoform-structure-map --geneName BRCA1 --uniprotId P38398 > transcript_map.tsv
g2papi get-protein-features --geneName BRCA1 --uniprotId P38398 > protein_features.tsv
```
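A saved file can then be post-processed without pandas. The sketch below is illustrative only: it assumes the redirected output is tab-separated with a header row (inferred from the `.tsv` extension used above) and that the column names match those shown in the Python library output. `read_tsv_rows` and `low_confidence_residues` are hypothetical helpers, and the default threshold of 50 follows the common AlphaFold convention of treating pLDDT below 50 as very low confidence. An in-memory sample stands in for `protein_features.tsv` so the example runs without network access.

```python
import csv
import io

PLDDT_COLUMN = "AlphaFold confidence (pLDDT)"  # column name as shown in the library output


def read_tsv_rows(handle):
    """Parse a tab-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def low_confidence_residues(rows, threshold=50.0):
    """Return residue ids (as int) whose pLDDT is below the threshold."""
    flagged = []
    for row in rows:
        try:
            residue_id = int(row["residueId"])
            score = float(row[PLDDT_COLUMN])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric id or score
        if score < threshold:
            flagged.append(residue_id)
    return flagged


# Stand-in for the contents of protein_features.tsv (assumed layout),
# using the first five BRCA1 residues from the example output.
SAMPLE_TSV = (
    "residueId\tAA\tAlphaFold confidence (pLDDT)\n"
    "1\tM\t41.59\n"
    "2\tD\t45.81\n"
    "3\tL\t48.11\n"
    "4\tS\t63.99\n"
    "5\tA\t61.73\n"
)

rows = read_tsv_rows(io.StringIO(SAMPLE_TSV))
print(low_confidence_residues(rows))
```

To run it against a real file, replace the `io.StringIO(SAMPLE_TSV)` argument with a file opened via `open("protein_features.tsv", newline="")`.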

## System Requirements

The package was developed and tested on Python 3.9.12. It is designed to run on any computer that can run Python 3 and has a working internet connection. The library was installed and tested on Ubuntu Linux 20.04 and macOS Ventura 13.5.1.

## Setup time (total time to set up and run: approximately 5 minutes)

Installation and execution complete almost immediately. Each of the 3 installation steps runs in less than 5 seconds, and a query takes less than 5 seconds for genes whose proteins have fewer than 2,000 residues.
