{"id":22621084,"url":"https://github.com/pdimens/bio-bin","last_synced_at":"2025-04-11T15:35:08.733Z","repository":{"id":50086269,"uuid":"101410292","full_name":"pdimens/bio-bin","owner":"pdimens","description":"Handy reusable bioinformatic scripts","archived":false,"fork":false,"pushed_at":"2022-07-27T17:53:38.000Z","size":456,"stargazers_count":8,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-25T11:49:16.256Z","etag":null,"topics":["bioinformatics","fasta","genome-analysis","genomics","julia"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pdimens.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-08-25T14:10:37.000Z","updated_at":"2024-08-01T12:06:14.000Z","dependencies_parsed_at":"2022-09-01T12:21:04.660Z","dependency_job_id":null,"html_url":"https://github.com/pdimens/bio-bin","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pdimens%2Fbio-bin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pdimens%2Fbio-bin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pdimens%2Fbio-bin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pdimens%2Fbio-bin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pdimens","download_url":"https://codeload.github.com/pdimens/bio-bin/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count"
:248432059,"owners_count":21102308,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","fasta","genome-analysis","genomics","julia"],"created_at":"2024-12-08T22:17:39.671Z","updated_at":"2025-04-11T15:35:08.708Z","avatar_url":"https://github.com/pdimens.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Bio-bin, the genomic toolbox\n\nA place to store custom and forked scripts used for genomic analysis; the list grows slowly as things come up.\n### allmaps_split_chimera.sh ![BASH logo](https://img.shields.io/badge/bash-lightgrey.svg?logo=gnu%20bash\u0026logoColor=white)\nA reusable script that wraps [the steps provided by ALLMAPS](https://github.com/tanghaibao/jcvi/wiki/ALLMAPS:-How-to-split-chimeric-contigs) to identify and split chimeric contigs. \n\n### bampurge.sh ![BASH logo](https://img.shields.io/badge/bash-lightgrey.svg?logo=gnu%20bash\u0026logoColor=white)\nSorts and indexes a BAM file and removes unmapped reads. Provide the number of threads as the second argument to run multithreaded.\n\n### configure_blasr_install ![BASH logo](https://img.shields.io/badge/bash-lightgrey.svg?logo=gnu%20bash\u0026logoColor=white)\nIt took me forever to get blasr/sparc installed and running correctly for hybrid genome assemblies, and after finally getting it to work, I vowed to never **ever** have to deal with it again, so this script does the necessary tweaks to get sparc_split_and_run.sh working right, *and* from your `$PATH`. 
**Deprecated since adding PRs to the DBG2OLC repo**\n\n### CoverageCutoff.jl ![Julia logo](https://img.shields.io/badge/julia-blue.svg?logo=julia\u0026logoColor=white)\nIsolates contigs below a specified sequence coverage threshold. Typically used on the `genome.file` output from `dDocent`'s `FreeBayes` step when `FreeBayes` crashes due to memory load because the _de novo_ assembly has too many contigs. The output is usually fed into [faSomeRecords](https://github.com/ENCODE-DCC/kentUtils/blob/master/src/utils/faSomeRecords/faSomeRecords.c) to \"prune\" the de novo assembly of low-coverage contigs. \n\n### countbam ![BASH logo](https://img.shields.io/badge/bash-lightgrey.svg?logo=gnu%20bash\u0026logoColor=white)\nSimple wrapper for `SAMtools` that counts the total number of reads and the number of mapped reads in BAM files.\n\n### CountMatch.jl ![Julia logo](https://img.shields.io/badge/julia-blue.svg?logo=julia\u0026logoColor=white)\nTakes an input file of strings (like 6bp indices) and does an all vs. all match to count the number of mismatches between the indices. Outputs an HTML heatmap and a text file of the pairwise comparisons.\n\n### estimateGenomeSize ![BASH logo](https://img.shields.io/badge/bash-lightgrey.svg?logo=gnu%20bash\u0026logoColor=white)\nIteratively performs the first steps of the [Jellyfish Kmer counting method](https://bioinformatics.uconn.edu/genome-size-estimation-tutorial/)\n\n### exportenv | condadeps ![BASH logo](https://img.shields.io/badge/bash-lightgrey.svg?logo=gnu%20bash\u0026logoColor=white)\nFor those times you forget the command to export (and strip the prefix from) your current conda environment to a YAML file. 
Use `condadeps` to list only the manually (explicitly) installed programs.\n\n### FastStructureK.sh ![BASH logo](https://img.shields.io/badge/bash-lightgrey.svg?logo=gnu%20bash\u0026logoColor=white)\nA convenience wrapper to perform `fastStructure` analyses for a range of `1` to `k` values, then summarize all the marginal likelihoods into a single file. \n\n### punzip ![BASH logo](https://img.shields.io/badge/bash-lightgrey.svg?logo=gnu%20bash\u0026logoColor=white)\nParallelized unzipping of .gz files from one directory into another. Can do an entire directory, or only files containing something specific in their name, such as `lobster`, `_R1_`, `britneyspears`, etc.\n\n### revcomp  ![BASH logo](https://img.shields.io/badge/bash-lightgrey.svg?logo=gnu%20bash\u0026logoColor=white)\nReturns the reverse, complement, or reverse-complement of DNA bases in a text file.\n\n### unpac ![BASH logo](https://img.shields.io/badge/bash-lightgrey.svg?logo=gnu%20bash\u0026logoColor=white)\nConverts PacBio sequences from BAM to fasta/q. A wrapper for `bam2fastx`.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpdimens%2Fbio-bin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpdimens%2Fbio-bin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpdimens%2Fbio-bin/lists"}