https://github.com/peterhil/serpent
Serpent is an exploration into DNA sequences, codons, amino acids and genome data
- Host: GitHub
- URL: https://github.com/peterhil/serpent
- Owner: peterhil
- License: mit
- Created: 2023-02-25T09:37:47.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2026-01-26T20:13:52.000Z (4 months ago)
- Last Synced: 2026-01-27T07:37:58.652Z (4 months ago)
- Topics: bioinformatics, fasta, sequencing
- Language: Python
- Homepage:
- Size: 560 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Serpent
## Explore DNA data with Serpent
Serpent is an exploration into DNA and RNA sequences, nucleotide
bases, codons, amino acids and genome data.
My motivation for starting this project is that, for about two decades, I have wanted to explore
DNA data in order to learn about, and maybe invent, some compression algorithms for it.
## Install
Install serpent with `pip install serpent`, or develop with [pdm](https://pdm.fming.dev/).
## Tools provided
### Work with FASTA files and sequences
* `serpent cat`: concatenate and print FASTA files
* `serpent find`: find FASTA files in directories
* `serpent find -s`: find and print FASTA sequences in files and directories
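
For readers new to the format, here is a minimal sketch of what reading FASTA records involves. It is not Serpent's own code: the function name `read_fasta` and the example records are hypothetical, and it assumes the common layout of a `>` description line followed by one or more sequence lines.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper for illustration, not part of Serpent's API.
    """
    description: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)


# Made-up example data, not a real genome.
example = """>seq1 first example record
ACGTAC
GTTGCA
>seq2 second example record
GGGCCC
"""

records = list(read_fasta(example.splitlines()))
for description, sequence in records:
    print(description, len(sequence))
```

Joining the wrapped lines of each record gives one continuous sequence string, which is the form the later sketches work on.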
### Convert data
* `serpent encode`: convert data into different encoded representations
* `serpent decode`: map codons into numbers 0...64
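
To illustrate the idea of mapping codons to numbers, here is a minimal sketch that reads each codon as a three-digit base-4 number. The base order `ACGT` and the function names are assumptions made for this example; the numbering Serpent actually uses may differ. With this scheme the 64 possible codons map to 0 through 63.

```python
from typing import List

BASES = "ACGT"  # assumed base order; Serpent's actual ordering may differ


def codon_to_number(codon: str) -> int:
    """Interpret a three-base codon as a base-4 number (hypothetical scheme)."""
    if len(codon) != 3:
        raise ValueError(f"expected three bases, got {codon!r}")
    number = 0
    # Treat RNA uracil (U) as thymine (T) so DNA and RNA codons agree.
    for base in codon.upper().replace("U", "T"):
        # str.index raises ValueError for ambiguous bases such as N.
        number = number * 4 + BASES.index(base)
    return number


def encode_codons(sequence: str) -> List[int]:
    """Split a sequence into codons from position 0 and number each one."""
    usable = len(sequence) - len(sequence) % 3  # drop a trailing partial codon
    return [codon_to_number(sequence[i:i + 3]) for i in range(0, usable, 3)]


print(encode_codons("ATGGCCTAA"))  # → [14, 37, 48]
```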
### Analyse and plot FASTA data visually
* `serpent ac`: print and plot autocorrelation on DNA and RNA sequences
* `serpent fft`: plot FFTs on DNA and RNA sequences
* `serpent hist`: plot histogram statistics
* `serpent image`: visualise DNA and RNA data as images
* `serpent seq`: plot sequence count statistics
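
As a rough illustration of what autocorrelation can reveal in a sequence, the sketch below measures how often a base equals the base a fixed number of positions later. This is one simple definition chosen for the example, not necessarily the one `serpent ac` uses, and the function name is hypothetical.

```python
from typing import List


def match_autocorrelation(sequence: str, max_lag: int) -> List[float]:
    """For each lag 1..max_lag, return the fraction of positions whose
    symbol equals the symbol `lag` positions later."""
    result = []
    for lag in range(1, max_lag + 1):
        pairs = len(sequence) - lag
        if pairs <= 0:
            break  # the sequence is too short for this lag
        matches = sum(
            1 for i in range(pairs) if sequence[i] == sequence[i + lag]
        )
        result.append(matches / pairs)
    return result


# A made-up sequence with period 3: peaks appear at lags 3 and 6.
repeat = "ACG" * 10
print(match_autocorrelation(repeat, 6))  # → [0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
```

Peaks at multiples of a lag point to periodic structure, which is the kind of pattern a plot makes easy to spot.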
### Statistics
* `serpent codons`: print codon statistics
* `serpent pep`: print peptide statistics
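
The simplest codon statistic is a frequency count. The sketch below counts codons in a single reading frame starting at position 0; the function name and input are made up for illustration, and Serpent's own statistics may be computed differently.

```python
from collections import Counter


def codon_counts(sequence: str) -> Counter:
    """Count non-overlapping codons, reading from position 0."""
    sequence = sequence.upper()
    usable = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    return Counter(sequence[i:i + 3] for i in range(0, usable, 3))


counts = codon_counts("ATGGCCGCCTAAGC")
for codon, count in counts.most_common():
    print(codon, count)
```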
See `serpent -h` for all subcommands and `serpent <subcommand> -h` for options!
## Sample data
Get some sample data from NCBI datasets. I recommend starting with viral, bacterial or
archaeal genomic data, as these genomes are smaller than those of plants or animals.
* [National Center for Biotechnology Information](https://www.ncbi.nlm.nih.gov/)
* [Datasets - NCBI - NLM](https://www.ncbi.nlm.nih.gov/datasets/)
* [RefSeq: NCBI Reference Sequence Database](https://www.ncbi.nlm.nih.gov/refseq/)
* [Home - Nucleotide - NCBI](https://www.ncbi.nlm.nih.gov/nuccore/)
* [Home - Protein - NCBI](https://www.ncbi.nlm.nih.gov/protein)
* [Genome - NCBI - NLM](https://www.ncbi.nlm.nih.gov/datasets/genome/)
A [SARS-CoV-2 genome](https://www.ncbi.nlm.nih.gov/nuccore/MN988713.1?report=fasta), for example, is only about 29 kb!