https://github.com/peterhil/serpent

Serpent is an exploration into DNA sequences, codons, amino acids and genome data


Host: GitHub
URL: https://github.com/peterhil/serpent
Owner: peterhil
License: mit
Created: 2023-02-25T09:37:47.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2026-01-26T20:13:52.000Z (4 months ago)
Last Synced: 2026-01-27T07:37:58.652Z (4 months ago)
Topics: bioinformatics, fasta, sequencing
Language: Python
Homepage:
Size: 560 KB
Stars: 3
Watchers: 1
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# Serpent

## Explore DNA data with Serpent

Serpent is an exploration into DNA and RNA sequences, nucleotide
bases, codons, amino acids and genome data.

My motivation for starting this project is that, for about two decades, I have wanted to explore DNA data in order
to learn about, and maybe invent, some compression algorithms for it.

## Install

Install serpent with `pip install serpent`, or develop with [pdm](https://pdm.fming.dev/).

## Tools provided

### Work with FASTA files and sequences

* `serpent cat`: concatenate and print FASTA files
* `serpent find`: find FASTA files in directories
* `serpent find -s`: find and print FASTA sequences in files and directories
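
For readers new to the FASTA format these commands work with, here is a minimal sketch of reading FASTA records in plain Python. It is not Serpent's own code: the function name `read_fasta` and the example data are made up for illustration, and it assumes the common layout where each record starts with a `>` description line followed by one or more sequence lines.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted lines."""
    description = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header line closes the previous record, if there was one.
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            # Sequence data may be wrapped over several lines; collect them all.
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)


# Hypothetical example data, not taken from a real genome.
example = """>seq1 first example
ATGGCC
TTTAAA
>seq2 second example
GGGCCC
"""
records = list(read_fasta(example.splitlines()))
```

Passing an open file object instead of `example.splitlines()` should work the same way, since the function only iterates over lines.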

### Convert data

* `serpent encode`: convert data into different encoded representations
* `serpent decode`: map codons into numbers 0...64
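
To give a feel for what mapping codons to numbers can look like, the sketch below reads each three-letter codon as a base-4 number. This is only an illustration: the base ordering `ACGT`, the function names, and the resulting range 0 to 63 are assumptions made here, and Serpent's actual numbering may differ.

```python
BASES = "ACGT"  # assumed ordering, chosen for illustration only


def codon_to_number(codon: str) -> int:
    """Read a three-letter codon as a base-4 number in the range 0..63."""
    if len(codon) != 3:
        raise ValueError(f"expected three bases, got {codon!r}")
    number = 0
    # Treat RNA (U) like DNA (T) so both alphabets map to the same numbers.
    for base in codon.upper().replace("U", "T"):
        number = number * 4 + BASES.index(base)
    return number


def number_to_codon(number: int) -> str:
    """Invert codon_to_number for numbers in the range 0..63."""
    if not 0 <= number < 64:
        raise ValueError(f"expected a number in 0..63, got {number}")
    first, rest = divmod(number, 16)
    second, third = divmod(rest, 4)
    return BASES[first] + BASES[second] + BASES[third]


def encode_sequence(sequence: str) -> list:
    """Split a sequence into codons and map each one to a number.

    Trailing bases that do not fill a whole codon are ignored.
    """
    usable = len(sequence) - len(sequence) % 3
    return [codon_to_number(sequence[i:i + 3]) for i in range(0, usable, 3)]


numbers = encode_sequence("ATGGCCTAA")
```

With this particular ordering, 64 codons fit in 6 bits each, which is one reason such mappings are a natural starting point for experiments with compact encodings.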

### Analyse and plot FASTA data visually

* `serpent ac`: print and plot autocorrelation on DNA and RNA sequences
* `serpent fft`: plot FFTs on DNA and RNA sequences
* `serpent hist`: plot histogram statistics
* `serpent image`: visualise DNA and RNA data as images
* `serpent seq`: plot sequence count statistics
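
As a rough idea of what autocorrelation means for a sequence of letters, the sketch below uses a simple match-based measure: for each lag, the fraction of positions whose base equals the base that many positions later. This is an assumption chosen for illustration, not necessarily what `serpent ac` computes, and the function name is hypothetical.

```python
def match_autocorrelation(sequence: str, max_lag: int) -> list:
    """Return, for lags 1..max_lag, the fraction of positions i
    where sequence[i] == sequence[i + lag]."""
    result = []
    for lag in range(1, max_lag + 1):
        pairs = len(sequence) - lag
        if pairs <= 0:
            break  # the sequence is too short for this lag
        matches = sum(
            1 for i in range(pairs) if sequence[i] == sequence[i + lag]
        )
        result.append(matches / pairs)
    return result


# A made-up sequence that repeats every three bases.
profile = match_autocorrelation("ACGACGACGACG", 4)
```

For this artificial example the profile peaks at lag 3, the repeat length. Real sequences give much noisier profiles, which is where plotting helps.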

### Statistics

* `serpent codons`: print codon statistics
* `serpent pep`: print peptide statistics
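
The kind of counting behind codon and peptide statistics can be sketched in a few lines. The example below is not Serpent's implementation: the function names are hypothetical, it reads only the first reading frame, and it translates with the standard genetic code (NCBI translation table 1), using `*` for stop codons and `X` for anything it cannot translate.

```python
from collections import Counter
from itertools import product

# Standard genetic code, listed in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(sequence: str) -> list:
    """Split a sequence into codons, ignoring an incomplete trailing codon."""
    sequence = sequence.upper().replace("U", "T")  # accept RNA as well as DNA
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def codon_counts(sequence: str) -> Counter:
    """Count how often each codon occurs in the first reading frame."""
    return Counter(split_codons(sequence))


def amino_acid_counts(sequence: str) -> Counter:
    """Translate each codon and count the resulting amino acids."""
    return Counter(
        CODON_TABLE.get(codon, "X") for codon in split_codons(sequence)
    )


codons = codon_counts("ATGGCCGCCTAA")
amino_acids = amino_acid_counts("ATGGCCGCCTAA")
```

`Counter.most_common()` then gives the counts sorted by frequency, which is often the first thing worth looking at.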

See `serpent -h` for all subcommands and `serpent <subcommand> -h` for options!

## Sample data

Get some sample data from NCBI datasets – I recommend starting with viral, bacterial or
archaeal genomes, as they are smaller than those of plants or animals.

* [National Center for Biotechnology Information](https://www.ncbi.nlm.nih.gov/)
* [Datasets - NCBI - NLM](https://www.ncbi.nlm.nih.gov/datasets/)
* [RefSeq: NCBI Reference Sequence Database](https://www.ncbi.nlm.nih.gov/refseq/)
* [Home - Nucleotide - NCBI](https://www.ncbi.nlm.nih.gov/nuccore/)
* [Home - Protein - NCBI](https://www.ncbi.nlm.nih.gov/protein)
* [Genome - NCBI - NLM](https://www.ncbi.nlm.nih.gov/datasets/genome/)

For example, a [SARS-CoV-2 genome](https://www.ncbi.nlm.nih.gov/nuccore/MN988713.1?report=fasta) is only 29 kb!
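
If you prefer to script the download, one option is NCBI's E-utilities `efetch` endpoint, which can return a nucleotide record as FASTA text. The sketch below is a suggestion rather than part of Serpent: the function names and the output filename are hypothetical, and NCBI's usage guidelines and rate limits apply to any real requests.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_url(accession: str) -> str:
    """Build an efetch URL that asks for a nucleotide record as FASTA text."""
    query = urlencode({
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    })
    return f"{EFETCH_URL}?{query}"


def download_fasta(accession: str, path: str) -> None:
    """Download one record and save it to a local file (needs network access)."""
    with urlopen(fasta_url(accession)) as response:
        data = response.read()
    with open(path, "wb") as output:
        output.write(data)


# Example call, left commented out so that nothing is downloaded by accident:
# download_fasta("MN988713.1", "MN988713.1.fasta")
```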
