https://github.com/pdimens/bio-bin
Handy reusable bioinformatic scripts
- Host: GitHub
- URL: https://github.com/pdimens/bio-bin
- Owner: pdimens
- Created: 2017-08-25T14:10:37.000Z (over 8 years ago)
- Default Branch: main
- Last Pushed: 2022-07-27T17:53:38.000Z (over 3 years ago)
- Last Synced: 2025-03-25T11:49:16.256Z (8 months ago)
- Topics: bioinformatics, fasta, genome-analysis, genomics, julia
- Language: R
- Homepage:
- Size: 445 KB
- Stars: 8
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Bio-bin, the genomic toolbox
A place to store custom and forked scripts used for genomic analysis. The list grows slowly as things come up.
### allmaps_split_chimera.sh 
A reusable script that wraps [the steps provided by ALLMAPS](https://github.com/tanghaibao/jcvi/wiki/ALLMAPS:-How-to-split-chimeric-contigs) to identify and split chimeric contigs.
### bampurge.sh 
Sort and index a BAM file and remove its unmapped reads. Provide the number of threads as the second argument to run multithreaded.
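
As a rough illustration of the kind of `samtools` pipeline this describes (not the script's actual contents), here is a minimal Python sketch. The function names, the output file name, and the filter-then-sort order are assumptions made for the example. The `samtools` options used are `view -b -F 4` to drop unmapped reads, `sort -o`, `index`, and `-@` for additional threads.

```python
import subprocess


def bampurge_commands(bam, out_bam, threads=1):
    """Build the three samtools commands: filter unmapped reads, sort, index.

    The function name and out_bam naming are hypothetical, for illustration only.
    """
    t = str(threads)
    # -F 4 excludes reads with the "unmapped" flag set; -b writes BAM output.
    view = ["samtools", "view", "-b", "-F", "4", "-@", t, bam]
    # "-" tells samtools sort to read from standard input.
    sort = ["samtools", "sort", "-@", t, "-o", out_bam, "-"]
    index = ["samtools", "index", "-@", t, out_bam]
    return view, sort, index


def bampurge(bam, out_bam, threads=1):
    """Run view | sort, then index the sorted file (requires samtools on PATH)."""
    view, sort, index = bampurge_commands(bam, out_bam, threads)
    p1 = subprocess.Popen(view, stdout=subprocess.PIPE)
    subprocess.run(sort, stdin=p1.stdout, check=True)
    p1.stdout.close()
    if p1.wait() != 0:
        raise subprocess.CalledProcessError(p1.returncode, view)
    subprocess.run(index, check=True)
```

Keeping command construction separate from execution makes the pipeline easy to inspect before running it on real data.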
### configure_blasr_install 
It took me forever to get blasr/sparc installed and running correctly for hybrid genome assemblies, and after finally getting it to work, I vowed to never **ever** have to deal with it again, so this script does the necessary tweaks to get sparc_split_and_run.sh working right, *and* from your `$PATH`. **Deprecated since adding PRs to the DBG2OLC repo.**
### CoverageCutoff.jl 
Simple isolation of contigs below a specified sequence coverage threshold. Typically used for the `genome.file` output from `dDocent`'s `FreeBayes` step when `FreeBayes` crashes due to memory load because the _de novo_ assembly has too many contigs. The output is usually fed into [faSomeRecords](https://github.com/ENCODE-DCC/kentUtils/blob/master/src/utils/faSomeRecords/faSomeRecords.c) to "prune" the _de novo_ assembly of low-coverage contigs.
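
The thresholding idea can be sketched in a few lines of Python. This is only an illustration, not the Julia script itself: it assumes a two-column, tab-separated input with the contig name followed by a numeric coverage value, which may not match the real file layout, and the function name is hypothetical.

```python
def contigs_below(lines, threshold):
    """Return names of contigs whose coverage is below threshold.

    Assumes each line is 'name<TAB>coverage' (an assumed layout).
    """
    selected = []
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, coverage = fields[0], float(fields[1])
        if coverage < threshold:
            selected.append(name)
    return selected


if __name__ == "__main__":
    example = ["contig_1\t12", "contig_2\t3", "contig_3\t250"]
    # One contig name per line, the form a name-list tool would consume.
    print("\n".join(contigs_below(example, 10)))
```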
### countbam 
Simple wrapper for `SAMtools` which counts the total number of reads and the number of mapped reads in BAM files.
### CountMatch.jl 
Takes an input file of strings (like 6bp indices) and does an all-vs-all match to count the number of mismatches between the indices. Outputs an HTML heatmap and a text file of the pairwise comparisons.
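
The all-vs-all comparison amounts to a pairwise mismatch (Hamming) count. A minimal Python sketch of that step follows. It is not the Julia implementation, the function names are made up for the example, and it leaves out the heatmap and file output.

```python
from itertools import combinations


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("indices must have equal length")
    return sum(x != y for x, y in zip(a, b))


def all_vs_all(indices):
    """Return {(a, b): mismatch count} for every unordered pair of indices."""
    return {(a, b): mismatches(a, b) for a, b in combinations(indices, 2)}


if __name__ == "__main__":
    barcodes = ["ATCACG", "CGATGT", "ATCACC"]
    for (a, b), n in all_vs_all(barcodes).items():
        print(a, b, n)
```

Pairs with very few mismatches are the ones most likely to be confused with each other after a sequencing error.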
### estimateGenomeSize 
Iteratively performs the first steps of the [Jellyfish k-mer counting method](https://bioinformatics.uconn.edu/genome-size-estimation-tutorial/).
### exportenv | condadeps 
For those times you forget the command to export (and strip the prefix from) your current conda environment to a yaml file. Use `condadeps` to list only the manually (explicitly) installed programs.
### FastStructureK.sh 
A convenience wrapper to perform `fastStructure` analyses for a range of `1` to `k` values, then summarize all the marginal likelihoods into a single file.
### punzip 
Parallelized unzipping of .gz files from one directory into another. Can do an entire directory, or only files containing something specific in their name, such as `lobster`, `_R1_`, `britneyspears`, etc.
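
For readers who want the same behavior outside the shell, here is a minimal Python sketch of parallel decompression with an optional name filter. It is not the script itself: the function names, the `workers` parameter, and the use of a thread pool are assumptions for illustration.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def gunzip_one(src, out_dir):
    """Decompress src into out_dir, dropping the .gz suffix; src is left in place."""
    dest = out_dir / src.name[:-3]
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


def punzip(in_dir, out_dir, pattern="", workers=4):
    """Decompress every *.gz file in in_dir whose name contains pattern.

    An empty pattern matches every file. Returns the list of written paths.
    """
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in in_dir.glob("*.gz") if pattern in p.name)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: gunzip_one(p, out_dir), files))
```

For example, `punzip("raw", "unzipped", pattern="_R1_")` would decompress only the files with `_R1_` in their name.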
### revcomp 
Returns the reverse, complement, or reverse-complement of DNA bases in a text file.
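
The three operations are simple enough to sketch in Python. This is an illustration of the idea rather than the script's code. The function names are hypothetical, and the sketch only handles unambiguous bases (`A`, `C`, `G`, `T`, either case), passing any other character through unchanged.

```python
# Translation table for unambiguous DNA bases; other characters are left as-is.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse(seq):
    """Return the sequence read backwards."""
    return seq[::-1]


def complement(seq):
    """Return the base-by-base complement, keeping the original order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq):
    """Return the complement of the sequence, read backwards."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    seq = "ATGCCG"
    print(reverse(seq))             # GCCGTA
    print(complement(seq))          # TACGGC
    print(reverse_complement(seq))  # CGGCAT
```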
### unpac 
Converts PacBio sequences from BAM to FASTA/FASTQ. A wrapper for `bam2fastx`.