{"id":26572661,"url":"https://github.com/medvir/VirMet","last_synced_at":"2025-03-23T00:35:28.305Z","repository":{"id":10179742,"uuid":"12266267","full_name":"medvir/VirMet","owner":"medvir","description":"Set of tools for viral metagenomics.","archived":false,"fork":false,"pushed_at":"2025-03-18T09:40:36.000Z","size":1191,"stargazers_count":14,"open_issues_count":6,"forks_count":5,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-18T10:36:19.618Z","etag":null,"topics":["bacterial-database","bioinformatics","bioinformatics-pipeline","database","genome","miseq","ncbi","python","viral-database","viral-metagenomics"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"JorgeAguirreLeon/react-daterange-picker","license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/medvir.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2013-08-21T09:24:53.000Z","updated_at":"2024-11-10T23:06:47.000Z","dependencies_parsed_at":"2022-08-31T02:00:46.096Z","dependency_job_id":"5263b6c5-8264-4e6e-bba5-f9acadf20e6a","html_url":"https://github.com/medvir/VirMet","commit_stats":{"total_commits":258,"total_committers":5,"mean_commits":51.6,"dds":0.03875968992248058,"last_synced_commit":"edc8f7d1a63c5a11e0837009d3674391adbb5b3e"},"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medvir%2FVirMet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medvir%2FVirMet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/medvir%2FVirMet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medvir%2FVirMet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/medvir","download_url":"https://codeload.github.com/medvir/VirMet/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245040215,"owners_count":20551297,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bacterial-database","bioinformatics","bioinformatics-pipeline","database","genome","miseq","ncbi","python","viral-database","viral-metagenomics"],"created_at":"2025-03-23T00:34:36.928Z","updated_at":"2025-03-23T00:35:28.271Z","avatar_url":"https://github.com/medvir.png","language":"Python","funding_links":[],"categories":["Databases"],"sub_categories":["Gene Finding"],"readme":"VirMet\n------\n\n[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat-square)](http://bioconda.github.io/recipes/virmet/README.html)\n[![Build Status](https://travis-ci.org/ozagordi/VirMet.svg?branch=master)](https://travis-ci.org/ozagordi/VirMet)\n[![codecov.io](https://codecov.io/github/ozagordi/VirMet/coverage.svg?branch=master)](https://codecov.io/github/ozagordi/VirMet?branch=master)\n[![codebeat badge](https://codebeat.co/badges/bf360427-6915-4432-b43e-054716e8139f)](https://codebeat.co/projects/github-com-ozagordi-virmet-master)\n\nWatch out: only a few files are counted in coverage statistics.\n\nFull documentation on [Read the 
Docs](http://virmet.rtfd.org/en/latest/).\n\nA set of tools for viral metagenomics.\n\nvirmet is called with a command subcommand\nsyntax: `virmet fetch --viral n`, for example, downloads the viral\ndatabase. The subcommands available so far are\n\n- `fetch`               download genomes\n- `update`              update viral/bacterial database\n- `index`               index genomes\n- `wolfpack`            analyze a Miseq run\n- `covplot`             plot coverage for a specific organism\n\n\nA short help is obtained with `virmet subcommand -h`.\n\n#### The simplest example\n\n    [user@host ~]$ virmet wolfpack --run path_to_run_directory\n    ... some time later ...\n    [user@host ~]$ wc virmet_output_name_of_the_run/sample_name/orgs_list.csv\n       9     128     963 orgs_list2.tsv\n\nReads are filtered, decontaminated, and finally blasted against a (large)\nset of viral sequences. Results for each database sequence to which similar reads\nwere found are summarised in a tsv file with columns\n\n- `species`: scientific name of the species corresponding to the database sequence;\n- `reads`: number of reads assigned to this specific sequence;\n- `stitle`: title of the sequence in the database (fasta header);\n- `ssciname`: scientific name of the sequence;\n- `covered_region`: number of nucleotides covered by at least one read;\n- `seq_len`: length of the sequence.\n\nAn example of such a file is reported here.\n\n![Figure 1](output.png \" Figure 1\")\n\n### Installation\n\n#### Bioconda\n\nVirMet is available through [Bioconda](https://bioconda.github.io), a channel\nfor the [conda](http://conda.pydata.org/docs/intro.html) package manager.
Once\nconda is [installed](https://bioconda.github.io/#install-conda) and the\n[channels](https://bioconda.github.io/#set-up-channels) are set up,\n`conda install virmet` installs the package with all its dependencies.\n\n#### `setuptools` or Docker\n\nThe classic `python setup.py install` will work, but see the relevant\n[page](http://virmet.readthedocs.io/en/latest/installation/) of the\ndocumentation to install third-party tools, or follow the instructions\nto run the [docker version](http://virmet.readthedocs.io/en/latest/dockerised/).\n\n### Preparation\n\nVirMet contains programs to download and index the genome sequences,\ninstructions [here](http://virmet.readthedocs.io/en/latest/preparation/).\n\n### Running a virus scan\n\nThis can be run on a single file or on a directory. It will try to guess from\nthe naming scheme if it is a Miseq output directory (_i.e._ with\n`Data/Intensities/BaseCalls/` structure) and analyze all fastq files in there.\nThe extension must be `.fastq` or `.fastq.gz`. It will then run a filtering\nstep based on quality, length and entropy (in short: reads with a lot of\nrepeats will be discarded), followed by a decontamination step where reads\nof human/bacterial/bovine/fungal origin will be discarded. Finally, remaining\nreads are _blasted_ against the viral database. The list of organisms with the\ncount of reads is in files `orgs_list.csv` in the output directory\n(naming is `virmet_output_...`). For example, if we have a directory named\n`exp_01` with files\n\n    exp_01/AR-1_S1_L001_R1_001.fastq.gz\n    exp_01/AR-2_S2_L001_R1_001.fastq.gz\n    exp_01/AR-3_S3_L001_R1_001.fastq.gz\n    exp_01/AR-4_S4_L001_R1_001.fastq.gz\n\nwe could run\n\n    [user@host test_virmet]$ virmet wolfpack --dir exp_01\n\nand, after some time, find the results in `virmet_output_exp01`. Many files are\npresent, the most important ones being `orgs_list.csv` and `stats.tsv`. 
The\nfirst lists the viral organisms found with a count of reads that could be\nmatched to them.\n\n    [user@host test_virmet]$ cat virmet_output_test_dir_150123/3-1-65_S5/orgs_list.tsv\n    organism\treads\n    Human adenovirus 7\t126\n    Human poliovirus 1 strain Sabin\t45\n    Human poliovirus 1 Mahoney\t29\n    Human adenovirus 3+11p\t19\n    Human adenovirus 16\t1\n\nThe second file is a summary of all reads analyzed for this sample and how many\npassed each step of the pipeline or matched each database.\n\n    [user@host test_virmet]$ cat virmet_output_exp01/AR-1_S1/stats.tsv\n    raw_reads       6250\n    trimmed_too_short       462\n    low_entropy     1905\n    low_quality     0\n    passing_filter  3883\n    matching_humanGRCh38    3463\n    matching_bact1  0\n    matching_bact2  0\n    matching_bact3  0\n    matching_fungi1 0\n    matching_bt_ref 0\n    reads_to_blast  420\n    viral_reads     257\n    undetermined_reads      163\n\n\n### Updating the database\n\nMore and more sequences are uploaded to the NCBI database every month. The figure\nshows the number of viral sequences with _complete genome_ in the title\nthat are submitted every month to NCBI ([code](https://gist.github.com/ozagordi/c1e1c4158ab4e94e4683)).\n\n![Code used to create the figure is [here](https://gist.github.com/ozagordi/c1e1c4158ab4e94e4683)](./docs/viral_genomes.png \"NCBI complete viral genomes per month\")\n\nVirMet provides a simple way to [update the viral database](http://virmet.readthedocs.io/en/latest/updating/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmedvir%2FVirMet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmedvir%2FVirMet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmedvir%2FVirMet/lists"}