{"id":22788466,"url":"https://github.com/astrabert/supaseqs","last_synced_at":"2026-04-12T04:33:39.739Z","repository":{"id":264506107,"uuid":"850832227","full_name":"AstraBert/SupaSeqs","owner":"AstraBert","description":"Basically BLAST, but written in PostgreSQL😉","archived":false,"fork":false,"pushed_at":"2024-09-01T22:45:19.000Z","size":35,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-25T20:06:56.839Z","etag":null,"topics":["bioinformatics","blast","data-management","database","dna-sequences","fastapi","nucleotide-sequence","postgresql","sqalchemy","supabase","vector-database","vector-search"],"latest_commit_sha":null,"homepage":"https://astrabert.github.io/SupaSeqs/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AstraBert.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-01T22:37:20.000Z","updated_at":"2024-09-01T22:45:50.000Z","dependencies_parsed_at":"2024-11-24T23:49:22.365Z","dependency_job_id":null,"html_url":"https://github.com/AstraBert/SupaSeqs","commit_stats":null,"previous_names":["astrabert/supaseqs"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AstraBert%2FSupaSeqs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AstraBert%2FSupaSeqs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AstraBert%2FSupaSeqs/releases","manifests_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/AstraBert%2FSupaSeqs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AstraBert","download_url":"https://codeload.github.com/AstraBert/SupaSeqs/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246343796,"owners_count":20762096,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","blast","data-management","database","dna-sequences","fastapi","nucleotide-sequence","postgresql","sqalchemy","supabase","vector-database","vector-search"],"created_at":"2024-12-12T01:31:31.487Z","updated_at":"2025-12-30T23:16:37.941Z","avatar_url":"https://github.com/AstraBert.png","language":"Python","funding_links":["https://github.com/sponsors/AstraBert"],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eSupaSeqs\u003c/h1\u003e\r\n\u003ch2 align=\"center\"\u003eBasically BLAST written in PostgreSQL😉\u003c/h2\u003e\r\n\r\n\r\n\u003cdiv align=\"center\"\u003e\r\n    \u003cimg src=\"https://img.shields.io/github/languages/top/AstraBert/SupaSeqs\" alt=\"GitHub top language\"\u003e\r\n   \u003cimg src=\"https://img.shields.io/github/commit-activity/t/AstraBert/SupaSeqs\" alt=\"GitHub commit activity\"\u003e\r\n   \u003cimg src=\"https://img.shields.io/badge/Status-stable_beta-green\" alt=\"Static Badge\"\u003e\r\n   \u003cimg src=\"https://img.shields.io/badge/Release-v0.0_beta.0-purple\" alt=\"Static Badge\"\u003e\r\n   \u003cimg src=\"https://img.shields.io/badge/Supported_platforms-Windows/POSIX-brown\" alt=\"Static 
Badge\"\u003e\r\n   \u003cbr\u003e\r\n   \u003cbr\u003e\r\n   \u003cdiv\u003e\r\n        \u003cimg src=\"./scripts/static/favicon.png\" alt=\"Logo\" align=\"center\"\u003e\r\n   \u003c/div\u003e\r\n   \u003cbr\u003e\r\n   \u003cbr\u003e\r\n\u003c/div\u003e\r\n\r\n**SupaSeqs** is a tool for managing DNA sequence databases locally, built on the PostgreSQL implementation offered by [*Supabase*](https://supabase.com/).\r\n\r\nIt uses PostgreSQL as the backend database manager, together with k-mer-based vectorization and vector search, to mimic the functionality of BLAST.\r\n\r\n### 1. Installation\r\n\r\nIf you are working in a Linux environment, you can simply download [setup.sh](./setup.sh) and run it:\r\n\r\n```bash\r\n# Linux\r\nwget https://raw.githubusercontent.com/AstraBert/SupaSeqs/main/scripts/setup.sh\r\nbash setup.sh\r\n```\r\n\r\n#### 1a. Prerequisites\r\n\r\nMake sure that your environment has:\r\n- `git`\r\n- `Node v18` or later\r\n- `npm` and `npx`\r\n- `python 3.10` or later\r\n\r\nThe installation process should work on both Windows and Linux.\r\n\r\n#### 1b. 
Environment setup\r\n\r\nFirst, clone this repository:\r\n\r\n```bash\r\n# BOTH Windows and Linux\r\ngit clone https://github.com/AstraBert/SupaSeqs\r\ncd SupaSeqs\r\n```\r\n\r\nInstall the `supabase` command line interface:\r\n\r\n```bash\r\n# BOTH Windows and Linux\r\nnpm install supabase\r\n```\r\n\r\nCreate and start a local Supabase instance:\r\n\r\n```bash\r\n# BOTH Windows and Linux\r\nnpx supabase init\r\nnpx supabase start\r\n```\r\n\r\nRetrieve the connection string, shown as `DB URL` in the output of this command:\r\n\r\n```bash\r\n# BOTH Windows and Linux\r\nnpx supabase status\r\n```\r\n\r\nCreate a virtual environment, activate it and install the necessary dependencies:\r\n\r\n```bash\r\n# Linux\r\npython3 -m venv apienv\r\nsource apienv/bin/activate\r\npython3 -m pip install -r requirements.txt\r\n```\r\n\r\nOr:\r\n\r\n```powershell\r\n# Windows\r\npython3 -m venv .\\apienv\r\n.\\apienv\\Scripts\\activate  # For Command Prompt\r\n# or\r\n.\\apienv\\Scripts\\Activate.ps1  # For PowerShell\r\npython3 -m pip install -r .\\requirements.txt\r\n```\r\n\r\n#### 1c. Application start\r\n\r\nWithin the virtual environment, run:\r\n\r\n```bash\r\n# BOTH Windows and Linux\r\ncd scripts\r\npython3 -m fastapi dev\r\n```\r\n\r\nIf the application cannot connect to Supabase, replace the connection string on [line 16 of `main.py`](./scripts/main.py#L16) with the `DB URL` returned by `npx supabase status`.\r\n\r\n### 2. How it works\r\n\r\nThe application runs as an API service built with [FastAPI](https://fastapi.tiangolo.com/). 
The connection to Supabase is handled by a client built with [`sqlalchemy`](https://docs.sqlalchemy.org/en/20/), similar to the one implemented in the [`vecs`](https://github.com/supabase/vecs) library.\r\n\r\nThe application accepts two request types:\r\n\r\n1- **POST** - *Upload a sequence or a FASTA file*:\r\n```bash\r\n# Single sequence\r\ncurl -X POST \"http://127.0.0.1:8000/seqs/\" -H \"accept: application/json\" -H \"Content-Type: application/json\" -d \"{\\\"sequence\\\":\\\"GGCAGAACCCAGGGCACCAGCACGCCGAAGGACCACCGCAGGCTGGCCAGCGCTCCACCCTCCCTGCACCACACCCTGCGAGCAAAAGGCAGCAGAAATGAAGAGCATTTACTTTGTGGCTGGATTGTTTGTAATGCTGGTACAAGGCAGCTGGCAACACCCACTTCAAGACACAGAGGAAAAACCCAGGTCTTTCTCAACTTCTCAAACAGACTTGCTTGATGATCCGGATCAGATGAATGAAGACAAGCGTCATTCACAGGGTACATTCACCAGTGACTACAGCAAGTTCCTCGACACCAGGCGTGCTCAAGACTTCTTGGATTGGCTGAAGAACACCAAGAGGAACAGGAATGAAAT\\\", \\\"description\\\": \\\"M57688.1 Octodon degus glucagon mRNA, complete cds\\\"}\"\r\n# FASTA file\r\ncurl -X POST \"http://127.0.0.1:8000/seqs/\" -H \"accept: application/json\" -H \"Content-Type: application/json\" -d \"{\\\"sequence\\\": \\\"sequence.fasta\\\"}\"\r\n```\r\n\r\nEach sequence is vectorized with a 5-mer-based representation (a 1024-dimensional array, matching the 4^5 = 1024 possible 5-mers). The vector is uploaded to the `sequences` table on Supabase together with the original sequence and a description. For a single sequence, the description is the one you provide, if any. For a FASTA file, it is the header of each sequence.\r\n\r\n2- **GET** - *Search through the sequence database*:\r\n```bash\r\ncurl -X 'GET' 'http://localhost:8000/seqs/AACTTCTCAAACAGACTTGCTTGATGATCCGGATCAGATGAATGAAGACAAGCGTCATTCACAGGGTACATTCACCAGTGACTACAGCAAGTTCCTCGACACCAGGCGTGCTCAAGACTTCTTGGATTGGCTGAAGAACACCAAGAGGAACAGGAATGAAAT?limit=100\u0026threshold=75' -H 'accept: application/json'\r\n```\r\n\r\nThe query sequence is vectorized and used to search the database. At most _limit_ sequences (set with the _limit_ key, capped at 1000) are returned, provided they satisfy the similarity threshold 
(specified as a percentage value with the _threshold_ key). A typical response looks like this:\r\n\r\n```json\r\n{\"1\":{\"sequence\":\"GGCAGAACCCAGGGCACCAGCACGCCGAAGGACCACCGCAGGCTGGCCAGCGCTCCACCCTCCCTGCACCACACCCTGCGAGCAAAAGGCAGCAGAAATGAAGAGCATTTACTTTGTGGCTGGATTGTTTGTAATGCTGGTACAAGGCAGCTGGCAACACCCACTTCAAGACACAGAGGAAAAACCCAGGTCTTTCTCAACTTCTCAAACAGACTTGCTTGATGATCCGGATCAGATGAATGAAGACAAGCGTCATTCACAGGGTACATTCACCAGTGACTACAGCAAGTTCCTCGACACCAGGCGTGCTCAAGACTTCTTGGATTGGCTGAAGAACACCAAGAGGAACAGGAATGAAAT\",\"description\":\"M57688.1 Octodon degus glucagon mRNA, complete cds\",\"cos_dist\":0.23987939711631145}}\r\n```\r\n\r\nThe search relies on a PostgreSQL function called `match_page_sections`, defined as follows (`\u003c=\u003e` is the pgvector cosine distance operator):\r\n\r\n```sql\r\ncreate or replace function public.match_page_sections (\r\n  embedding vector(1024),\r\n  match_threshold float,\r\n  match_count int\r\n)\r\nreturns setof public.sequences\r\nlanguage sql\r\nas $$\r\n  -- In SQL functions a column name takes precedence over an argument with the same name,\r\n  -- so the query embedding is referenced as match_page_sections.embedding\r\n  select *\r\n  from public.sequences\r\n  -- keep rows whose cosine distance is below 1 - match_threshold (cosine similarity above match_threshold)\r\n  where public.sequences.embedding \u003c=\u003e match_page_sections.embedding \u003c 1 - match_threshold\r\n  -- closest sequences first, at most 1000 rows\r\n  order by public.sequences.embedding \u003c=\u003e match_page_sections.embedding asc\r\n  limit least(match_count, 1000);\r\n$$;\r\n```\r\n\r\n### 3. Contributions\r\n\r\nContributions are more than welcome! See the [contribution guidelines](./CONTRIBUTING.md) for more information :)\r\n\r\n### 4. Funding\r\n\r\nIf you found this project useful, please consider [funding it](https://github.com/sponsors/AstraBert) to help it grow: let's support open source together!😊\r\n\r\n### 5. 
License and usage rights\r\n\r\nThis project is provided under the [MIT license](./LICENSE): it will always be open-source and free to use.\r\n\r\nIf you use this project, please cite the author: [Astra Clelia Bertelli](https://astrabert.vercel.app)\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fastrabert%2Fsupaseqs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fastrabert%2Fsupaseqs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fastrabert%2Fsupaseqs/lists"}