https://github.com/babilonczyk/bioai-seq
Command-line tool for protein sequence analysis that gives you instant, human-readable insights - without logging in, uploading sensitive data, or running complex pipelines
- Host: GitHub
- URL: https://github.com/babilonczyk/bioai-seq
- Owner: babilonczyk
- License: apache-2.0
- Created: 2025-08-02T13:42:25.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-08-07T08:40:00.000Z (5 months ago)
- Last Synced: 2025-08-07T10:22:28.761Z (5 months ago)
- Language: Python
- Homepage:
- Size: 18.6 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# bioai-seq
`bioai-seq` is a lightweight command-line tool for basic biological sequence analysis. It's part of my journey toward becoming a **Bio AI Software Engineer** - combining software engineering, biology, and AI.
It's designed to provide information about protein sequences.
---
## How to install
### 1. Create and activate a virtual environment
```bash
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
### 2. Install bioai-seq
```bash
# Install (or upgrade to) the latest release from PyPI
pip install --upgrade bioai-seq
# Start the tool
bioseq
```
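To confirm from Python that the package landed in the active virtual environment, a minimal sketch using only the standard library's `importlib.metadata` could look like the following. The helper name `installed_version` is hypothetical and not part of `bioai-seq`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it is not shipped with bioai-seq.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("bioai-seq")
    if found:
        print(f"bioai-seq version: {found}")
    else:
        print("bioai-seq is not installed in this environment")
```

If this reports that the package is missing, the virtual environment from step 1 is probably not activated.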
---
## Deploying to PyPI (Production)
### 1. Clean previous builds
```bash
rm -rf dist build *.egg-info
```
### 2. Build the package
```bash
pip install --upgrade build
python3 -m build
```
### 3. Upload to PyPI
```bash
pip install --upgrade twine
twine upload dist/*
```
If `twine` prompts for credentials, use:
- Username: `__token__`
- Password: your API token from [https://pypi.org/manage/account/token/](https://pypi.org/manage/account/token/)
---
## 🧪 Planned Example Output
```txt
✅ Sequence loaded: 1273 amino acids
🧬 Detected: SARS-CoV-2 spike glycoprotein (likely variant: Omicron)
Running ESM-2 embeddings...
📦 Comparing against 1000 proteins in vector database...
Top similar sequences:
- UniProt P0DTC2 (99.8%) - SARS-CoV-2 spike glycoprotein
- UniProt A0A6H2L9T9 (98.9%) - Bat coronavirus spike protein
- UniProt A0A2X1VPJ6 (97.5%) - Pangolin coronavirus S protein
------------------------------------------------------------
🔬 Matched Protein Metadata: P0DTC2
Organism: SARS-CoV-2
🧬 Gene names: S, spike
🧫 Host organisms: Human, Bat
Description: Spike glycoprotein mediates viral entry via ACE2
🏷️ Keywords: Receptor-binding, Glycoprotein, Fusion protein
Protein evidence: Evidence at protein level
🧩 Features:
- Signal peptide: 1-13
- Transmembrane region: 1213-1237
- RBD domain: 319-541
External references:
- [PDB: 6VSB](https://www.rcsb.org/structure/6VSB)
- [RefSeq: YP_009724390.1](https://www.ncbi.nlm.nih.gov/protein/YP_009724390.1)
- [Pfam: PF01601](https://www.ebi.ac.uk/interpro/entry/pfam/PF01601)
- [AlphaFold model](https://alphafold.ebi.ac.uk/entry/P0DTC2)
- [UniProt entry](https://www.uniprot.org/uniprotkb/P0DTC2)
------------------------------------------------------------
🧠 Summary:
"This sequence matches the SARS-CoV-2 spike glycoprotein. It binds to the ACE2 receptor to mediate viral entry. The receptor binding domain (RBD) spans residues 319-541 and contains key mutations in Omicron variants. The protein is expressed in humans and bats."
```
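The planned output above implies a pipeline: load a sequence, embed it, and rank database entries by similarity. The following is only a rough sketch of that flow, not the tool's actual implementation. It assumes FASTA input, swaps the ESM-2 embedding for a toy amino-acid composition vector so the example runs without a model, and uses a plain dictionary in place of a vector database. All function names and the sample entries are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def read_fasta_sequence(text: str) -> str:
    """Return the first sequence in FASTA-formatted text as one uppercase string."""
    residues = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if residues:
                break  # a new header after residues starts the next record
            continue
        residues.append(line.upper())
    return "".join(residues)


def composition_vector(sequence: str) -> list[float]:
    """Toy stand-in for an embedding (NOT ESM-2): fraction of each residue."""
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [count / total for count in counts]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query: list[float], database: dict[str, list[float]], k: int = 3):
    """Rank database entries by cosine similarity to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Made-up query and database entries, for illustration only.
    fasta = ">query hypothetical example\nMKTAYIAKQR\nQISFVKSHFS\n"
    seq = read_fasta_sequence(fasta)
    print(f"Sequence loaded: {len(seq)} amino acids")
    database = {
        "entry_a": composition_vector("MKTAYIAKQRQISFVKSHFS"),
        "entry_b": composition_vector("GGGGSGGGGS"),
    }
    for name, score in top_matches(composition_vector(seq), database):
        print(f"- {name} ({score:.1%})")
```

Composition vectors ignore residue order, so they are far weaker than learned embeddings. They are used here only to keep the ranking step self-contained.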
---
## Follow the Journey
- Blog: [https://bioaisoftware.engineer](https://bioaisoftware.engineer)
- 🧑‍💻 GitHub: [https://github.com/babilonczyk](https://github.com/babilonczyk)
- 💼 LinkedIn: [https://www.linkedin.com/in/jan-piotrzkowski/](https://www.linkedin.com/in/jan-piotrzkowski/)
---
## License
Apache 2.0 - free to use and improve.