{"id":32176970,"url":"https://github.com/bacpop/mandrake","last_synced_at":"2025-10-30T18:47:11.707Z","repository":{"id":40423089,"uuid":"238434584","full_name":"bacpop/mandrake","owner":"bacpop","description":"Mandrake 🌿/👨‍🔬🦆  – Fast visualisation of the population structure of pathogens using Stochastic Cluster Embedding","archived":false,"fork":false,"pushed_at":"2025-03-07T11:36:21.000Z","size":24752,"stargazers_count":38,"open_issues_count":4,"forks_count":2,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-10-21T20:07:17.290Z","etag":null,"topics":["cuda","embedding","genomics","gpu","pathogens"],"latest_commit_sha":null,"homepage":"https://mandrake.readthedocs.io/en/latest/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bacpop.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-02-05T11:29:35.000Z","updated_at":"2025-05-29T15:24:30.000Z","dependencies_parsed_at":"2023-12-20T16:51:15.172Z","dependency_job_id":null,"html_url":"https://github.com/bacpop/mandrake","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/bacpop/mandrake","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bacpop%2Fmandrake","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bacpop%2Fmandrake/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bacpop%2Fmandrake/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bacpop%2Fmandrake/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bacpop","download_url":"https://codeload.github.com/bacpop/mandrake/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bacpop%2Fmandrake/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281862815,"owners_count":26574703,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-30T02:00:06.501Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","embedding","genomics","gpu","pathogens"],"created_at":"2025-10-21T20:01:19.250Z","updated_at":"2025-10-30T18:47:11.702Z","avatar_url":"https://github.com/bacpop.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# mandrake \u003cimg src='docs/images/mandrake_logo_v2.1.png' align=\"right\" height=\"140\" /\u003e\n\n\u003c!-- badges: start --\u003e\n[![Build and run tests](https://github.com/bacpop/mandrake/actions/workflows/python-package-conda.yml/badge.svg)](https://github.com/bacpop/mandrake/actions/workflows/python-package-conda.yml)\n[![Anaconda package](https://anaconda.org/conda-forge/mandrake/badges/version.svg\n)](https://anaconda.org/conda-forge/mandrake)\n[![Documentation Status](https://readthedocs.org/projects/mandrake/badge/?version=latest)](https://mandrake.readthedocs.io/)\n\u003c!-- badges: end --\u003e\n\nFast visualisation of the population structure of pathogens using Stochastic Cluster Embedding.\n\nPaper:\n\nLees JA, Tonkin-Hill G, Yang Z, Corander J.\nMandrake: visualizing microbial population structure by embedding millions of\ngenomes into a low-dimensional representation. *Philosophical Transactions of\nThe Royal Society B*. 2022;377: 20210237.\n\nhttps://doi.org/10.1098/rstb.2021.0237\n\nDocumentation available at: https://mandrake.readthedocs.io/en/latest/\n\n## Installation (briefly)\n\nSee https://mandrake.readthedocs.io/en/latest/installation.html for more details.\n\n1. Install [miniconda](https://docs.conda.io/en/latest/miniconda.html).\n2. Run `conda create -n mandrake_env mandrake` to install into a clean environment.\n3. Run `conda activate mandrake_env` to use the environment.\n\nRefer to the [conda-forge](https://conda-forge.org/docs/user/tipsandtricks.html#installing-cuda-enabled-packages-like-tensorflow-and-pytorch) documentation if\nyou want to install a CUDA (GPU) enabled version.\n\n### Semi-manual\n\nYou will need some dependencies, which you can install through `conda`:\n```\nconda create -n mandrake_env python\nconda env update -n mandrake_env --file environment.yml\nconda activate mandrake_env\n```\n\nYou can then clone this repository, and run:\n```\npython setup.py install\n```\n\n### GPU acceleration\nYou will need the CUDA toolkit installed.\n\nIf you have the ability to compile CUDA (e.g. `nvcc`) you should see a message:\n```\nCUDA found, compiling both GPU and CPU code\n```\notherwise only the CPU version will be compiled:\n```\nCUDA not found, compiling CPU code only\n```\n\n## Usage\nAfter installing, an example command would look like this:\n```\nmandrake --sketches sketchlib.h5 --kNN 500 --cpus 4 --maxIter 1000000\n```\nThis would use a file `sketchlib.h5` created by [pp-sketchlib](https://github.com/johnlees/pp-sketchlib)\nto calculate accessory distances using 500 nearest neighbours.\n\nOutput can be found in numerous files prefixed `mandrake.embedding*`.\n\nOther useful arguments include:\n\n- `--alignment` use a fasta alignment to calculate distances\n- `--accessory` use a presence/absence file (Rtab or similar) to calculate distances\n- `--distances` use a `.npz` file from a previous run and skip straight to the embedding step\n- `--labels` give labels to colour the output by\n- `--perplexity` change the perplexity of the preprocessing (similar to t-SNE)\n- `--animate` produce a video of the optimisation\n- `--use-gpu` use a GPU for the run. Make sure to increase `--n-workers`.\n\nSee the [documentation](https://mandrake.readthedocs.io/en/latest/) for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbacpop%2Fmandrake","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbacpop%2Fmandrake","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbacpop%2Fmandrake/lists"}