https://github.com/dillondaudert/proteindatasets
Creating and manipulating various protein sequence-structure datasets using Python, Julia, and other tools.
- Host: GitHub
- URL: https://github.com/dillondaudert/proteindatasets
- Owner: dillondaudert
- License: MIT
- Created: 2018-03-20T17:23:31.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2018-09-01T22:54:17.000Z (about 6 years ago)
- Last Synced: 2024-04-17T14:14:11.621Z (7 months ago)
- Topics: bioinformatics, biopython, blast, dataset, dssp, fasta, julia, jupyter, jupyter-notebook, pandas, protein, psiblast, python3, secondary, structure, tensorflow, uniref50
- Language: Jupyter Notebook
- Homepage:
- Size: 138 KB
- Stars: 6
- Watchers: 3
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Protein sequence and structure datasets
This repo contains scripts for creating various protein sequence and structure
datasets, as well as some guides for how to use them.

## Contents
### proteinfeatures
Protein amino acid features.

### cpdb
Working with the cullPDB dataset created in [Zhou & Troyanskaya,
2014](https://arxiv.org/abs/1403.1347).

### cpdb2
Creating a new protein sequence-structure dataset following the methods used for
the cullPDB dataset, referred to as cpdb2.

### psiblast
Scripts for calling NCBI BLAST+ psiblast on large FASTA files from BioPython and
handling the results using multiprocessing.
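
As a rough illustration of the idea (not the repo's actual scripts), the sketch below splits a large FASTA file into one query file per sequence and runs `psiblast` on each query in a worker pool. It uses only the standard library, so the FASTA reader here stands in for BioPython's parser. The function names (`read_fasta`, `split_fasta`, `psiblast_command`, `run_one`, `run_all`), the output file layout, and the default of 3 iterations are all assumptions made for this example. It also assumes the `psiblast` binary from BLAST+ is on `PATH` and that `db` names an already formatted protein database.

```python
import subprocess
from multiprocessing import Pool
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, out_dir):
    """Write each record to its own file (hypothetical naming scheme)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, (header, seq) in enumerate(read_fasta(path)):
        query = out_dir / f"seq_{i:06d}.fasta"
        query.write_text(f">{header}\n{seq}\n")
        paths.append(query)
    return paths


def psiblast_command(query, db, out_dir, iterations=3):
    """Build the psiblast argument list for a single query file."""
    stem = Path(query).stem
    return [
        "psiblast",
        "-query", str(query),
        "-db", str(db),
        "-num_iterations", str(iterations),
        "-out", str(Path(out_dir) / f"{stem}.out"),
        "-out_ascii_pssm", str(Path(out_dir) / f"{stem}.pssm"),
    ]


def run_one(cmd):
    """Run one psiblast command; return (query path, exit code)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return cmd[2], result.returncode


def run_all(fasta, db, work_dir, processes=4):
    """Split the input, then run the per-sequence searches in parallel."""
    queries = split_fasta(fasta, Path(work_dir) / "queries")
    results_dir = Path(work_dir) / "results"
    results_dir.mkdir(parents=True, exist_ok=True)
    cmds = [psiblast_command(q, db, results_dir) for q in queries]
    with Pool(processes) as pool:
        return list(pool.imap_unordered(run_one, cmds))
```

A call such as `run_all("sequences.fasta", "uniref50", "work")` (placed under an `if __name__ == "__main__":` guard) would return one `(query, exit code)` pair per sequence. Splitting per sequence keeps each PSSM in its own file and lets failed queries be retried individually, at the cost of many small files.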