{"id":13408295,"url":"https://github.com/dcdanko/minerva_barcode_deconvolution","last_synced_at":"2025-04-12T14:47:38.952Z","repository":{"id":57441736,"uuid":"109084569","full_name":"dcdanko/minerva_barcode_deconvolution","owner":"dcdanko","description":"Sort Linked Read DNA Into Fragment Specific Clusters","archived":false,"fork":false,"pushed_at":"2021-10-05T19:39:32.000Z","size":11088,"stargazers_count":11,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-08-10T07:51:16.197Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dcdanko.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-11-01T03:54:09.000Z","updated_at":"2021-10-05T19:39:35.000Z","dependencies_parsed_at":"2022-09-26T17:20:52.625Z","dependency_job_id":null,"html_url":"https://github.com/dcdanko/minerva_barcode_deconvolution","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcdanko%2Fminerva_barcode_deconvolution","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcdanko%2Fminerva_barcode_deconvolution/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcdanko%2Fminerva_barcode_deconvolution/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcdanko%2Fminerva_barcode_deconvolution/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dcdanko","download_url":"https://codeload.github.com/dc
danko/minerva_barcode_deconvolution/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248585290,"owners_count":21128974,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T20:00:51.942Z","updated_at":"2025-04-12T14:47:38.933Z","avatar_url":"https://github.com/dcdanko.png","language":"Python","funding_links":[],"categories":["Tools"],"sub_categories":[],"readme":"# Minerva Barcoded Read Deconvolution\n\n[![CircleCI](https://circleci.com/gh/dcdanko/minerva_barcode_deconvolution.svg?style=svg)](https://circleci.com/gh/dcdanko/minerva_barcode_deconvolution)\n\n[![CodeFactor](https://www.codefactor.io/repository/github/dcdanko/minerva_barcode_deconvolution/badge)](https://www.codefactor.io/repository/github/dcdanko/minerva_barcode_deconvolution)\n\n[![](https://img.shields.io/pypi/v/minerva_deconvolve.svg)](https://pypi.org/project/minerva_deconvolve/)\n\n## Deprecation Notice\n\nMinerva was an initial proof of concept algorithm. It has since been superseded by [Ariadne](https://github.com/lauren-mak/Ariadne). You should use Ariadne over Minerva to deconvolve linked reads.\n\n## Summary\n\nEmerging linked-read technologies (aka Read-Cloud or barcoded short-reads) have revived interest in short-read technology as a viable way to understand large-scale structure in genomes and metagenomes. Linked-read technologies, such as the 10x Chromium system, use a microfluidic system and a specialized set of barcodes to tag short DNA reads sourced from the same long fragment of DNA. 
Subsequently, the tagged reads are sequenced on standard short read platforms.\n\nThis approach results in interesting compromises. Each long fragment of DNA is only sparsely covered by reads, no information about the ordering of reads from the same fragment is preserved, and barcodes match reads from roughly 2-20 long fragments of DNA. However, compared to long read technologies, the cost per base to sequence is far lower, far less input DNA is required, and the per base error rate is that of Illumina short-reads.\n\nIn the accompanying paper, we formally describe a particular algorithmic issue for linked-read technology: the deconvolution of reads with a single barcode into clusters that represent single long fragments of DNA. We also present Minerva, an algorithm which approximately solves the barcode deconvolution problem for metagenomic data. This codebase implements Minerva.\n\n[Minerva: An Alignment and Reference Free Approach to Deconvolve Linked-Reads for Metagenomics](https://genome.cshlp.org/content/early/2018/12/06/gr.235499.118.full.pdf+html)\n\n## Installation\n\nFrom PyPI\n```\npip install minerva_deconvolve\n```\n\nFrom source\n```\ngit clone \u003curl\u003e\ncd minerva_barcode_deconvolution\npython setup.py install\n```\n\n## Deconvolving Reads\n\nUse the following command to run barcode deconvolution. `\u003cfastq\u003e` should be an interleaved fastq file where reads have a `BX` tag designating the barcode (this is the default output of [longranger basic](https://support.10xgenomics.com/genome-exome/software/pipelines/latest/advanced/other-pipelines))\n```\ncat \u003cfastq\u003e | minerva_deconvolve -k 20 -w 40 -d 8 -a 20 --remove-stopwords --eps 0.51 \u003e ebc_assignments.tsv\n```\n\nFor more options\n```\nminerva_deconvolve --help\n```\n\n### Output\n\nWithin each barcode, Minerva assigns reads to clusters, called deconvolved barcodes. The `minerva_deconvolve` command outputs a tsv file with three columns: barcode, read id, and cluster id. 
The deconvolved barcode for a read is a tuple of (barcode, cluster id).\n\n```\n$ head \u003cminerva_output_file\u003e\nBX:Z:GTGCCTTAGTCCGTAT-1 D00547:847:HYHNTBCXX:1:1207:20627:25951 0\nBX:Z:GTGCCTTAGTCCGTAT-1 D00547:847:HYHNTBCXX:1:1113:11082:83578 0\nBX:Z:GTGCCTTAGTCCGTAT-1 D00547:847:HYHNTBCXX:1:2206:4393:100450 0\nBX:Z:GTGCCTTAGTCCGTAT-1 D00547:847:HYHNTBCXX:2:1111:2014:28730  1\nBX:Z:GTGCCTTAGTCCGTAT-1 D00547:847:HYHNTBCXX:1:2216:17277:16384 2\nBX:Z:GTGCCTTAGTCCGTAT-1 D00547:847:HYHNTBCXX:1:1201:19163:82220 2\nBX:Z:GTGCCTTAGTCCGTAT-1 D00547:847:HYHNTBCXX:1:1202:16780:78102 0\nBX:Z:GTGCCTTAGTCCGTAT-1 D00547:847:HYHNTBCXX:1:1210:7460:13722  2\n```\n\nTo add enhanced barcodes to your fastq file, run the following\n```\nminerva_deconvolve_fastq \u003cbc_assignment_file\u003e \u003cfastq_file\u003e - \u003e output.fq\n```\n\n### Performance\n\nThis is a demonstration program and is not intended to be performant. Runtimes over 10 hours are common even on small datasets.\nRAM usage is typically 50-100 GB.\n\n## Datasets\n\nThe datasets used in the paper may be downloaded from AWS.\n - [Dataset 1](https://s3.us-east-2.amazonaws.com/minerva-datasets/10M.data1_atgctgaaq.fq.gz)\n - [Dataset 2](https://s3.us-east-2.amazonaws.com/minerva-datasets/10M.data2_accctcct.fq.gz)\n\n\n## Credits\n\nThis algorithm was developed and tested with help from Dmitrii Meleshko, Daniela Bezdan, Chris Mason, and Iman Hajirasouliha.\n\nThis package is written and maintained by [David C. Danko](mailto:dcdanko@gmail.com)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdcdanko%2Fminerva_barcode_deconvolution","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdcdanko%2Fminerva_barcode_deconvolution","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdcdanko%2Fminerva_barcode_deconvolution/lists"}