https://github.com/opengene/opengene.jl
(No maintenance) OpenGene, core libraries for NGS data analysis and bioinformatics in Julia
- Host: GitHub
- URL: https://github.com/opengene/opengene.jl
- Owner: OpenGene
- License: other
- Created: 2015-12-05T14:10:53.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2022-07-18T01:42:12.000Z (almost 3 years ago)
- Last Synced: 2025-03-24T08:42:20.432Z (about 1 month ago)
- Topics: bioinformatics, julia, ngs
- Language: Julia
- Homepage:
- Size: 177 KB
- Stars: 65
- Watchers: 13
- Forks: 15
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# This project is no longer maintained
# OpenGene
The `OpenGene.jl` project aims to provide basic functions and rich utilities for analyzing sequencing data, with the beautiful language [Julia](http://julialang.org/).
If you want to be an author of OpenGene, please open an issue, or make a pull request.
If you are looking for BAM/SAM read/write, see [OpenGene/HTSLIB](https://github.com/OpenGene/HTSLIB.jl)
For bug reports and feature requests, please [file an issue](https://github.com/OpenGene/OpenGene.jl/issues/new).
## Julia
Julia is a fresh programming language with `C/C++`-like performance and `Python`-like ease of use.
On Ubuntu, you can install Julia with `sudo apt-get install julia`, then type `julia` to open the Julia interactive prompt. Details on installing Julia are in the [platform specific instructions](http://julialang.org/downloads/platform.html).
## Add OpenGene
```julia
# run on Julia REPL
Pkg.add("OpenGene")
```
If you want the latest dev version of OpenGene (not for beginners):
```julia
Pkg.checkout("OpenGene")
```
This project is under active development; remember to update it to get the newest features:
```julia
Pkg.update()
```
## Examples
***sequence operation***
```julia
julia> using OpenGene
julia> seq = dna("AAATTTCCCGGGATCGATCGATCG")
dna:AAATTTCCCGGGATCGATCGATCG
# reverse complement operator
julia> ~seq
dna:CGATCGATCGATCCCGGGAAATTT
# transcription, note that seq is treated as coding sequence, not template sequence
# so this operation only changes T to U
julia> transcribe(seq)
rna:AAAUUUCCCGGGAUCGAUCGAUCG
```
***read/write a single fastq/fasta file***
```julia
using OpenGene
istream = fastq_open("input.fastq.gz")
ostream = fastq_open("output.fastq.gz","w")
# fastq_read will return an object FastqRead {name, sequence, strand, quality}
# fastq_write can write a FastqRead into an output stream
while (fq = fastq_read(istream))!=false
    fastq_write(ostream, fq)
end
close(ostream)
```
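To make the four-field record shape mentioned above concrete, here is a minimal sketch of a FASTQ copy loop in plain Julia 1.x. It is not OpenGene's implementation: `SimpleFastqRead`, `read_fastq_record`, `write_fastq_record` and `copy_fastq` are hypothetical names made up for this example, it handles uncompressed text only, and it signals end of input with `nothing` rather than `false`.

```julia
# Hypothetical stand-in for FastqRead; field names follow the comment in the example above.
struct SimpleFastqRead
    name::String
    sequence::String
    strand::String
    quality::String
end

# Read one four-line FASTQ record from `io`; return `nothing` at end of input.
function read_fastq_record(io::IO)
    eof(io) && return nothing
    name = readline(io)
    sequence = readline(io)
    strand = readline(io)
    quality = readline(io)
    startswith(name, "@") || error("FASTQ name line must start with '@': $name")
    length(sequence) == length(quality) || error("sequence/quality length mismatch in $name")
    return SimpleFastqRead(name, sequence, strand, quality)
end

# Write a record back out in the same four-line layout.
function write_fastq_record(io::IO, r::SimpleFastqRead)
    println(io, r.name)
    println(io, r.sequence)
    println(io, r.strand)
    println(io, r.quality)
end

# Copy every record from `input` to `output` and return the number of records copied.
function copy_fastq(input::IO, output::IO)
    n = 0
    while (r = read_fastq_record(input)) !== nothing
        write_fastq_record(output, r)
        n += 1
    end
    return n
end
```

Any `IO` works here, so the sketch can be tried on an `IOBuffer` or on a file opened with `open("input.fastq")`.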
fasta is supported similarly with `fasta_open`, `fasta_read` and `fasta_write`.

***read/write a pair of fastq files***
```julia
using OpenGene
istream = fastq_open_pair("R1.fastq.gz", "R2.fastq.gz")
ostream = fastq_open_pair("Out.R1.fastq.gz","Out.R2.fastq.gz","w")
# fastq_read_pair will return a pair of FastqRead {read1, read2}
# fastq_write_pair can write this pair to two files
while (pair = fastq_read_pair(istream))!=false
    fastq_write_pair(ostream, pair)
end
close(ostream)
```
***read/write a bed file***
```julia
using OpenGene
# read all records, return an array of Intervals(chrom, chromstart, chromend)
intervals = bed_read_intervals("in.bed")
# write all records
bed_write_intervals("out.bed",intervals)
```
***read/write a VCF***
```julia
using OpenGene
# load the entire VCF data into a vcf object, which has a .header field and a .data field
vcfobj = vcf_read("in.vcf")
# write the vcf object into a file
vcf_write("out.vcf", vcfobj)
```
***VCF Operations***
```julia
using OpenGene
v1 = vcf_read("v1.vcf")
v2 = vcf_read("v2.vcf")
# merge by positions
v_merge = v1 + v2
# intersect by positions
v_intersect = v1 * v2
# remove v2 records from v1, by positions
v_minus = v1 - v2
```
***read/write a GTF***
```julia
using OpenGene
# load the gtf header and data
gtfobj = gtf_read("in.gtf")
# write the gtf object into a file
gtf_write("out.gtf", gtfobj)
# if the file is too big, use following to load header only
gtfobj, stream = gtf_read("in.gtf", loaddata = false)
while (row = gtf_read_row(stream)) != false
    # do something with row ...
end
```
***locate the gene/exon/intron***
```julia
using OpenGene, OpenGene.Reference
# load the gencode dataset, it will download a file from gencode website if it's not downloaded before
# once it's loaded, it will be cached so future loads will be fast
index = gencode_load("GRCh37")
# locate which gene chr:pos is in
gencode_locate(index, "chr5", 149526621)
# it will return
# 1-element Array{Any,1}:
# Dict{ASCIIString,Any}("gene"=>"PDGFRB","number"=>1,"transcript"=>"ENST00000261799.4","type"=>"intron")
genes = gencode_genes(index, "TP53")
# return an array with only one record
genes[1].name, genes[1].chr, genes[1].start_pos, genes[1].end_pos
# ("TP53","chr17",7565097,7590856)
```
***access assembly (hg19/hg38)***
```julia
julia> using OpenGene
julia> using OpenGene.Reference
julia> hg19 = load_assembly("hg19")
# Dict{ASCIIString,OpenGene.FastaRead} with 93 entries:
julia> hg19["chr17"]
# >chr17
# dna:AAGCTTCTCACCCTGTTCCTGCATAGATAATTGCATGACA......agggtgtgggtgtgggtgtgggtgtgggtgtggtgtgtgggtgtgggtgtgGT
julia> hg19["chr17"].sequence[1:100]
# dna:AAGCTTCTCACCCTGTTCCTGCATAGATAATTGCATGACAATTGCCTTGTCCCTGCTGAATGTGCTCTGGGGTCTCTGGGGTCTCACCCACGACCAACTC
```
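The session above treats an assembly as a dictionary from chromosome name to sequence record. As a rough illustration of that data shape only, the sketch below parses FASTA text into a plain `Dict{String,String}`. `read_fasta_dict` is a hypothetical helper written for this example in plain Julia 1.x; it is not part of OpenGene and does not reproduce `load_assembly` (no download, no caching, no `FastaRead` type).

```julia
# Parse FASTA text from `io` into a Dict mapping record name => concatenated sequence.
# Assumption: the record name is the whole header line after '>', with whitespace trimmed.
function read_fasta_dict(io::IO)
    records = Dict{String,String}()
    name = nothing          # name of the record currently being read
    buf = IOBuffer()        # accumulates the sequence lines of that record
    for line in eachline(io)
        if startswith(line, ">")
            # A new header closes the previous record, if there was one.
            name === nothing || (records[name] = String(take!(buf)))
            name = String(strip(line[2:end]))
        elseif name !== nothing
            print(buf, strip(line))
        end
    end
    # Store the last record.
    name === nothing || (records[name] = String(take!(buf)))
    return records
end
```

With such a dictionary, `asm["chr17"][1:100]` would slice the first 100 bases, much like `hg19["chr17"].sequence[1:100]` does in the session above. Letter case is preserved as found in the input.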
***merge a pair of reads from paired-end sequencing***
```julia
julia> using OpenGene, OpenGene.Algorithm
julia> r1=dna("TTTAGGCCTGTCACTGTGAACGCTATCAGCAAGCCTTTGCATGATTTTTCTCTTTCCCACTCCTACATTCTCGGTGATGACAACAACTGTAGCCTGATCCAGATATTTCGAAGTGCAACAAATCGTATTCAATATAGAGTAAGG")
dna:TTTAGGCCTGTCACTGTGAACGCTATCAGCAAGCCTTTGCATGATTTTTCTCTTTCCCACTCCTACATTCTCGGTGATGACAACAACTGTAGCCTGATCCAGATATTTCGAAGTGCAACAAATCGTATTCAATATAGAGTAAGG
julia> r2=dna("GTTAGCTATTACTGTAATCACCGCGAGACAAGTTAATGAGAGAGTTATTCATAAAACTTACTCTATATTGAATACGATTTGTAGCACATCGAAATATCTGGATCAGGCTACAGTTGTAGTCATCACCGAGAATGTAGGAGTGG")
dna:GTTAGCTATTACTGTAATCACCGCGAGACAAGTTAATGAGAGAGTTATTCATAAAACTTACTCTATATTGAATACGATTTGTAGCACATCGAAATATCTGGATCAGGCTACAGTTGTAGTCATCACCGAGAATGTAGGAGTGG
julia> offset, overlap_len, distance = overlap(r1, r2)
(56,88,4)
julia> merged = simple_merge(r1, r2, overlap_len)
dna:TTTAGGCCTGTCACTGTGAACGCTATCAGCAAGCCTTTGCATGATTTTTCTCTTTCCCACTCCTACATTCTCGGTGATGACAACAACTGTAGCCTGATCCAGATATTTCGAAGTGCAACAAATCGTATTCAATATAGAGTAAGGTTTATGAATAACTCTCTCATTAACTTGTCTCGCGGTGATTACAGTAATAGCTAAC
```
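For readers curious how such an overlap could be found, here is one simple way to do it, sketched in plain Julia 1.x on ordinary strings. It is an assumption-laden illustration, not OpenGene's algorithm: it reverse-complements read 2, slides it along read 1, and accepts the first offset whose Hamming distance is within a threshold. `revcomp_str`, `find_overlap`, `merge_pair` and their keyword arguments are hypothetical names, and the real `overlap` may use a different distance measure or search order.

```julia
# Complement table for uppercase DNA; other characters are not handled in this sketch.
const COMPLEMENT = Dict('A' => 'T', 'T' => 'A', 'C' => 'G', 'G' => 'C', 'N' => 'N')

# Reverse complement of a DNA string.
revcomp_str(s::AbstractString) = join(COMPLEMENT[c] for c in reverse(uppercase(s)))

# Find where the reverse complement of `r2` starts inside `r1`.
# Returns (offset, overlap_len, mismatches) with a 0-based offset into r1,
# or `nothing` if no offset satisfies the thresholds.
function find_overlap(r1::AbstractString, r2::AbstractString;
                      min_overlap::Int = 10, max_mismatches::Int = 0)
    a = uppercase(r1)
    rc2 = revcomp_str(r2)
    length(rc2) >= min_overlap || return nothing
    for offset in 0:(length(a) - min_overlap)
        overlap_len = min(length(a) - offset, length(rc2))
        # Hamming distance between the tail of r1 and the head of revcomp(r2).
        mismatches = count(i -> a[offset + i] != rc2[i], 1:overlap_len)
        mismatches <= max_mismatches && return (offset, overlap_len, mismatches)
    end
    return nothing
end

# Merge by appending the part of revcomp(r2) that extends past the overlap.
function merge_pair(r1::AbstractString, r2::AbstractString, overlap_len::Int)
    rc2 = revcomp_str(r2)
    return uppercase(r1) * rc2[(overlap_len + 1):end]
end
```

The sketch assumes the tail of read 1 overlaps the head of the reverse-complemented read 2, and it takes bases from read 1 inside the overlap without consulting quality scores.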