{"id":13639065,"url":"https://github.com/pbenner/gonetics","last_synced_at":"2025-03-16T19:30:28.014Z","repository":{"id":50530405,"uuid":"59055813","full_name":"pbenner/gonetics","owner":"pbenner","description":"Go / Golang Bioinformatics Library","archived":false,"fork":false,"pushed_at":"2024-09-29T14:26:18.000Z","size":6396,"stargazers_count":40,"open_issues_count":2,"forks_count":1,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-10-12T15:59:29.971Z","etag":null,"topics":["bam-files","bigwig","bigwig-files","bioinformatics","bioinformatics-containers","chip-seq","coverage","genomic-ranges","golang","golang-library","gtf"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pbenner.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-05-17T20:20:05.000Z","updated_at":"2024-09-29T14:26:21.000Z","dependencies_parsed_at":"2024-09-08T11:48:25.383Z","dependency_job_id":"0ae8b71d-0886-4f5f-85e0-d08a47478dce","html_url":"https://github.com/pbenner/gonetics","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Fgonetics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Fgonetics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Fgonetics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Fgonetics/manifests","owner_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pbenner","download_url":"https://codeload.github.com/pbenner/gonetics/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221667061,"owners_count":16860539,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bam-files","bigwig","bigwig-files","bioinformatics","bioinformatics-containers","chip-seq","coverage","genomic-ranges","golang","golang-library","gtf"],"created_at":"2024-08-02T01:00:57.302Z","updated_at":"2024-10-27T11:08:11.321Z","avatar_url":"https://github.com/pbenner.png","language":"Go","funding_links":[],"categories":["General Bioinformatics Libraries"],"sub_categories":[],"readme":"## Gonetics\n\nGonetics is a bioinformatics library for the Go programming language (golang). It provides native data structures for handling genetic data and methods for handling common file formats such as BAM, GTF, BED, BigWig, and Wig. 
The documentation is available [here](https://godoc.org/github.com/pbenner/gonetics).\n\n### Tools\n\nExecutables are available [here](https://github.com/pbenner/gonetics-tools).\n\n| Tool                     | Description                                                              |\n| ------------------------ | ------------------------------------------------------------------------ |\n| bamCheckBin              | check bin records of a bam file                                          |\n| bamGenome                | print the genome (sequence table) of a bam file                          |\n| bamToBigWig              | convert bam to bigWig (estimate fragment length if required)             |\n| bamView                  | print contents of a bam file                                             |\n| bigWigEditChromNames     | edit chromosome names of a bigWig file (e.g. replace `chr1` with `1`)    |\n| bigWigExtract            | extract regions from a bigWig file and save them as table or bigWig file |\n| bigWigExtractChroms      | extract a subset of the chromosomes from a bigWig file                   |\n| bigWigGenome             | print the genome (sequence table) of a bigWig file                       |\n| bigWigHistogram          | compute a histogram of the values in a bigWig file                       |\n| bigWigNil                | read bigWig and output it to a new file                                  |\n| bigWigMap                | apply a function to a set of bigWig files                                |\n| bigWigPositive           | simple peak finding (i.e. every region with a value above a threshold)   |\n| bigWigQuantileNormalize  | quantile normalize a bigWig file to a reference                          |\n| bigWigQuery              | retrieve data from a bigWig file                                         |\n| bigWigQuerySequence      | retrieve sequences from a bigWig file                                    |\n| bigWigStatistics         | compute summary statistics of a bigWig file                              |\n| chromHmmTablesToBigWig   | convert ChromHMM output (posteriors / binarized BAMs) to bigWig          |\n| countKmers               | count k-mers in a set of DNA sequences                                   |\n| drawGenomicRegions       | draw random genomic regions                                              |\n| fastaExtract             | extract regions from a fasta file                                        |\n| fastaUnresolvedRegions   | identify regions that are not resolved (i.e. stretches of 'NNNN...')     |\n| gtfToBed                 | convert GTF files to BED6 format                                         |\n| memeExtract              | extract PWM/PPM motifs from MEME/DREME XML files                         |\n| observedOverExpectedCpG  | compute CpG scores as defined by Gardiner-Garden and Frommer (1987)      |\n| pwmScanSequences         | scan sequences for PWM hits                                              |\n| pwmScanRegions           | scan regions for multiple PWMs                                           |\n| segmentationDifferential | extract differential regions from multiple chromatin segmentations       |\n\n### GRanges\n\nCreate a GRanges object with three ranges on chromosome `chr1`:\n\n```go\n  seqnames := []string{\"chr1\", \"chr1\", \"chr1\"}\n  from     := []int{100000266, 100000271, 100000383}\n  to       := []int{100000291, 100000296, 100000408}\n  strand   := []byte{'+', '+', '-'}\n\n  granges  := NewGRanges(seqnames, from, to, strand)\n  fmt.Println(granges)\n```\n\t  seqnames                 ranges strand\n\t1     chr1 [100000266, 100000291)      +\n\t2     chr1 [100000271, 100000296)      +\n\t3     chr1 [100000383, 100000408)      -\n\nAdd metadata to the GRanges object:\n\n```go\n  granges.AddMeta(\"data\", []float64{1.0, 2.0, 3.0})\n```\n\t  seqnames                 ranges strand |          data\n\t1     chr1 [100000266, 100000291)      + |      1.000000\n\t2     chr1 [100000271, 100000296)      + |      2.000000\n\t3     chr1 [100000383, 100000408)      - |      3.000000\n\nFind overlaps between two GRanges objects:\n\n```go\n  rSubjects := NewGRanges(\n    []string{\"chr4\", \"chr4\", \"chr4\", \"chr4\"},\n    []int{100, 200, 300, 400},\n    []int{150, 250, 350, 450},\n    []byte{})\n  rQuery := NewGRanges(\n    []string{\"chr1\", \"chr4\", \"chr4\", \"chr4\", \"chr4\", \"chr4\"},\n    []int{100, 110, 190, 340, 390, 450},\n    []int{150, 120, 220, 360, 400, 500},\n    []byte{})\n\n  queryHits, subjectHits := FindOverlaps(rQuery, rSubjects)\n```\n\t  queryHits: [1 2 3 4 5]\n\tsubjectHits: [0 1 2 3 3]\n\n### Genes\n\nDownload a gene list from UCSC and export it to a file:\n\n```go\n  genes := ImportGenesFromUCSC(\"hg19\", \"ensGene\")\n  genes.WriteTable(\"hg19.ensGene.txt\", true, false)\n  fmt.Println(genes)\n```\n\n\t                 names seqnames          transcripts                  cds strand\n\t     1 ENST00000456328     chr1 [   11868,    14409) [   14409,    14409)      +\n\t     2 ENST00000515242     chr1 [   11871,    14412) [   14412,    14412)      +\n\t     3 ENST00000518655     chr1 [   11873,    14409) [   14409,    14409)      +\n\t     4 ENST00000450305     chr1 [   12009,    13670) [   13670,    13670)      +\n\t     5 ENST00000423562     chr1 [   14362,    29370) [   29370,    29370)      -\n\t                   ...      ...                  ...                  ...       
\n\t204936 ENST00000420810     chrY [28695571, 28695890) [28695890, 28695890)      +\n\t204937 ENST00000456738     chrY [28732788, 28737748) [28737748, 28737748)      -\n\t204938 ENST00000435945     chrY [28740997, 28780799) [28780799, 28780799)      -\n\t204939 ENST00000435741     chrY [28772666, 28773306) [28773306, 28773306)      -\n\t204940 ENST00000431853     chrY [59001390, 59001635) [59001635, 59001635)      +\n\nImport expression data from a GTF file:\n\n```go\n  genes.ImportGTF(\"genesExpr_test.gtf.gz\", \"transcript_id\", \"FPKM\", false)\n```\n\n\t                 names seqnames          transcripts                  cds strand |          expr\n\t     1 ENST00000456328     chr1 [   11868,    14409) [   14409,    14409)      + |      0.073685\n\t     2 ENST00000515242     chr1 [   11871,    14412) [   14412,    14412)      + |      0.000000\n\t     3 ENST00000518655     chr1 [   11873,    14409) [   14409,    14409)      + |      0.000000\n\t     4 ENST00000450305     chr1 [   12009,    13670) [   13670,    13670)      + |      0.000000\n\t     5 ENST00000423562     chr1 [   14362,    29370) [   29370,    29370)      - |     10.413931\n\t                   ...      ...                  ...                  ...        
|           ...\n\t204936 ENST00000420810     chrY [28695571, 28695890) [28695890, 28695890)      + |      0.000000\n\t204937 ENST00000456738     chrY [28732788, 28737748) [28737748, 28737748)      - |      0.000000\n\t204938 ENST00000435945     chrY [28740997, 28780799) [28780799, 28780799)      - |      0.000000\n\t204939 ENST00000435741     chrY [28772666, 28773306) [28773306, 28773306)      - |      0.000000\n\t204940 ENST00000431853     chrY [59001390, 59001635) [59001635, 59001635)      + |      0.000000\n\n### Peaks\n\nImport peaks from a MACS xls file:\n\n```go\n  peaks := ImportXlsPeaks(\"peaks_test.xls\")\n```\n\t   seqnames             ranges strand |  abs_summit     pileup -log10(pvalue) fold_enrichment -log10(qvalue)\n\t 1       2L [   5757,    6001)      * |        5865  33.000000      19.809300        6.851880      17.722200\n\t 2       2L [  47233,   47441)      * |       47354  36.000000      19.648200        6.263150      17.566200\n\t 3       2L [  66379,   67591)      * |       66957 252.000000     350.151250       50.986050     346.525450\n\t 4       2L [  72305,   72838)      * |       72525 170.000000     208.558240       34.460930     205.734390\n\t 5       2L [  72999,   73218)      * |       73130  25.000000      12.711700        5.239670      10.700880\n\t        ...                ...        |         ...        ...            ...             ...            
...\n\t12       2R [3646319, 3646794)      * |     3646442  37.000000      23.176910        7.455710      21.063850\n\t13       2R [3666770, 3668041)      * |     3667119 215.000000     279.229060       41.551060     276.108090\n\t14       2R [3668231, 3668441)      * |     3668363  22.000000       9.943110        4.476950       7.976070\n\t15       2R [3670063, 3670393)      * |     3670180  38.000000      19.474590        5.901360      17.393440\n\t16       2R [3670470, 3670927)      * |     3670719 227.000000     305.243350       45.831180     301.974760\n\n### Track\n\nImport ChIP-seq reads from BED files and create a track with the normalized signal:\n\n```go\n  fmt.Fprintf(os.Stderr, \"Parsing reads (treatment) ...\\n\")\n  treatment1 := GRanges{}\n  treatment1.ImportBed6(\"SRR094207.bed\")\n  treatment2 := GRanges{}\n  treatment2.ImportBed6(\"SRR094208.bed\")\n  fmt.Fprintf(os.Stderr, \"Parsing reads (control)   ...\\n\")\n  control1   := GRanges{}\n  control1.ImportBed6(\"SRR094215.bed\")\n  control2   := GRanges{}\n  control2.ImportBed6(\"SRR094216.bed\")\n\n  genome  := Genome{}\n  genome.Import(\"Data/hg19.genome\")\n  d       := 200 // estimated fragment length (d=200, see *_peaks.xls)\n  binsize := 100 // bin size of the track\n  pcounts := 1   // pseudocounts\n  track := NormalizedTrack(\"H3K4me3\",\n    []GRanges{treatment1, treatment2}, []GRanges{control1, control2},\n    genome, d, binsize, pcounts, pcounts, false)\n```\n\nExport the track to Wig or bigWig:\n```go\n  track.WriteWiggle(\"track.wig\", \"track description\")\n  track.WriteBigWig(\"track.bw\",  \"track description\")\n```\n\n### BigWig Files\n\nBigWig files contain data in a binary format optimized for fast random access. In addition to the raw data, bigWig files typically contain several zoom levels for which the data has been summarized. The BigWigReader type can be used to query data; it automatically selects an appropriate zoom level for the given bin size:\n```go\n  reader, err := NewBigWigReader(\"test.bw\")\n  if err != nil {\n    log.Fatal(err)\n  }\n  // query region and bin size\n  seqname := \"chr4\" // sequence name (regular expression)\n  from    := 11774000\n  to      := 11778000\n  binsize := 20\n\n  for record := range reader.Query(seqname, from, to, binsize) {\n    if record.Error != nil {\n      log.Fatalf(\"reading bigWig failed: %v\", record.Error)\n    }\n    fmt.Println(record)\n  }\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpbenner%2Fgonetics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpbenner%2Fgonetics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpbenner%2Fgonetics/lists"}
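The `observedOverExpectedCpG` tool in the README above scores CpG content as defined by Gardiner-Garden and Frommer (1987): the number of CpG dinucleotides times the sequence length, divided by the product of the C and G counts. A minimal sketch of that score follows; `cpgObsExp` is a hypothetical helper rather than the tool's code, and returning 0 for sequences without any C or G is an assumption made to avoid dividing by zero.

```go
package main

import (
	"fmt"
	"strings"
)

// cpgObsExp returns the observed/expected CpG ratio of a DNA sequence:
//
//	o/e = count(CG) * length / (count(C) * count(G))
//
// It returns 0 when the sequence has no C or no G (an assumption of this
// sketch, used to avoid dividing by zero).
func cpgObsExp(seq string) float64 {
	seq = strings.ToUpper(seq)
	nC, nG, nCG := 0, 0, 0
	for i := 0; i < len(seq); i++ {
		switch seq[i] {
		case 'C':
			nC++
			// a C immediately followed by a G is one CpG dinucleotide
			if i+1 < len(seq) && seq[i+1] == 'G' {
				nCG++
			}
		case 'G':
			nG++
		}
	}
	if nC == 0 || nG == 0 {
		return 0
	}
	return float64(nCG) * float64(len(seq)) / (float64(nC) * float64(nG))
}

func main() {
	// 2 CpGs, 2 Cs, 2 Gs, length 8: 2*8 / (2*2) = 4
	fmt.Printf("%.2f\n", cpgObsExp("ACGTTCGA"))
}
```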
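The `fastaUnresolvedRegions` tool identifies stretches of `NNNN...` in a sequence. The core of such a scan can be sketched as below, assuming 0-based, half-open `[from, to)` coordinates like those printed in the GRanges examples and treating both `N` and `n` as unresolved; the `region` type and `unresolvedRegions` function are hypothetical and not part of gonetics.

```go
package main

import "fmt"

// region is a half-open interval [From, To) of 0-based sequence positions.
type region struct {
	From, To int
}

// unresolvedRegions returns the maximal runs of 'N' or 'n' in seq.
func unresolvedRegions(seq string) []region {
	var result []region
	start := -1 // start of the current run, or -1 when not inside a run
	for i := 0; i < len(seq); i++ {
		isN := seq[i] == 'N' || seq[i] == 'n'
		switch {
		case isN && start < 0:
			start = i // a new run begins
		case !isN && start >= 0:
			result = append(result, region{start, i}) // the run ends before i
			start = -1
		}
	}
	// close a run that extends to the end of the sequence
	if start >= 0 {
		result = append(result, region{start, len(seq)})
	}
	return result
}

func main() {
	fmt.Println(unresolvedRegions("NNACGTnnnTANN")) // [{0 2} {6 9} {11 13}]
}
```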
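Counting k-mers, as the `countKmers` tool does for a set of DNA sequences, amounts to sliding a window of length k along each sequence. The sketch below is illustrative only: `countKmers` and `isACGT` here are hypothetical helpers, and they assume that windows containing non-ACGT characters (such as `N`) are skipped and that a k-mer and its reverse complement are counted separately, which may not match the tool's options.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// isACGT reports whether s consists only of the bases A, C, G and T.
func isACGT(s string) bool {
	for i := 0; i < len(s); i++ {
		switch s[i] {
		case 'A', 'C', 'G', 'T':
		default:
			return false
		}
	}
	return true
}

// countKmers counts every length-k window in the given sequences.
// Sequences are upper-cased first; windows containing any other character
// are skipped, and reverse complements are not merged.
func countKmers(sequences []string, k int) map[string]int {
	counts := make(map[string]int)
	if k <= 0 {
		return counts
	}
	for _, seq := range sequences {
		seq = strings.ToUpper(seq)
		for i := 0; i+k <= len(seq); i++ {
			if kmer := seq[i : i+k]; isACGT(kmer) {
				counts[kmer]++
			}
		}
	}
	return counts
}

func main() {
	counts := countKmers([]string{"ACGTACGT", "acgN"}, 3)
	// print k-mers in sorted order, since map iteration order is random
	kmers := make([]string, 0, len(counts))
	for kmer := range counts {
		kmers = append(kmers, kmer)
	}
	sort.Strings(kmers)
	for _, kmer := range kmers {
		fmt.Println(kmer, counts[kmer]) // ACG 3, CGT 2, GTA 1, TAC 1
	}
}
```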
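In the `FindOverlaps` example in the README above, `queryHits` and `subjectHits` are 0-based indices into `rQuery` and `rSubjects`. The brute-force sketch below reproduces those hits using hypothetical `interval` and `findOverlaps` names, and it ignores strand. It only matches the printed output when touching endpoints count as an overlap: under a strict half-open test, the queries `[390, 400)` and `[450, 500)` would not overlap the subject `[400, 450)`. That inclusive convention is inferred from the output, not from the library source, and a production implementation would typically sort the ranges or use an interval tree instead of this O(n*m) loop.

```go
package main

import "fmt"

// interval is a range on one sequence; the overlap test below treats
// both From and To as inclusive endpoints.
type interval struct {
	Seqname  string
	From, To int
}

// findOverlaps returns index pairs (query, subject) of overlapping ranges
// on the same sequence, in query order and then subject order.
func findOverlaps(query, subjects []interval) (queryHits, subjectHits []int) {
	for i, q := range query {
		for j, s := range subjects {
			if q.Seqname == s.Seqname && q.From <= s.To && s.From <= q.To {
				queryHits = append(queryHits, i)
				subjectHits = append(subjectHits, j)
			}
		}
	}
	return queryHits, subjectHits
}

func main() {
	subjects := []interval{{"chr4", 100, 150}, {"chr4", 200, 250}, {"chr4", 300, 350}, {"chr4", 400, 450}}
	query := []interval{{"chr1", 100, 150}, {"chr4", 110, 120}, {"chr4", 190, 220},
		{"chr4", 340, 360}, {"chr4", 390, 400}, {"chr4", 450, 500}}
	queryHits, subjectHits := findOverlaps(query, subjects)
	fmt.Println("  queryHits:", queryHits)   // [1 2 3 4 5]
	fmt.Println("subjectHits:", subjectHits) // [0 1 2 3 3]
}
```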
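The README notes that `BigWigReader` selects a zoom level for the requested bin size. In the bigWig format, each zoom level stores a reduction level, the number of bases summarized per record. One plausible selection rule, sketched here as an assumption rather than gonetics' actual logic, is to take the coarsest zoom level whose reduction level does not exceed the bin size and to fall back to the raw data otherwise. `chooseZoomLevel` is a hypothetical helper, and the reduction levels in `main` are made-up values.

```go
package main

import "fmt"

// chooseZoomLevel returns the index of the coarsest zoom level whose
// reduction level (bases summarized per record) does not exceed binSize,
// or -1 if no zoom level qualifies and the raw data should be read instead.
// The input order of reductionLevels does not matter.
func chooseZoomLevel(reductionLevels []int, binSize int) int {
	best, bestLevel := -1, 0
	for i, r := range reductionLevels {
		if r <= binSize && r > bestLevel {
			best, bestLevel = i, r
		}
	}
	return best
}

func main() {
	// hypothetical reduction levels of a bigWig file, finest to coarsest
	levels := []int{40, 160, 640, 2560}
	for _, binSize := range []int{20, 200, 5000} {
		fmt.Printf("bin size %d -> zoom level %d\n", binSize, chooseZoomLevel(levels, binSize))
	}
}
```

Picking the coarsest qualifying level keeps the number of records to read small, while no single zoom record is wider than an output bin.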