{"id":16222689,"url":"https://github.com/claymcleod/genomics-primer","last_synced_at":"2026-02-09T04:03:13.121Z","repository":{"id":149182665,"uuid":"60312217","full_name":"claymcleod/genomics-primer","owner":"claymcleod","description":null,"archived":false,"fork":false,"pushed_at":"2016-12-24T19:26:52.000Z","size":3,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-07-29T01:42:34.152Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/claymcleod.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-06-03T02:28:55.000Z","updated_at":"2016-06-03T02:28:55.000Z","dependencies_parsed_at":null,"dependency_job_id":"2d3932c1-571a-4d2c-b9ab-69146015e814","html_url":"https://github.com/claymcleod/genomics-primer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/claymcleod/genomics-primer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/claymcleod%2Fgenomics-primer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/claymcleod%2Fgenomics-primer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/claymcleod%2Fgenomics-primer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/claymcleod%2Fgenomics-primer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/claymcleod","download_url":"https://c
odeload.github.com/claymcleod/genomics-primer/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/claymcleod%2Fgenomics-primer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29255951,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-09T03:07:45.136Z","status":"ssl_error","status_checked_at":"2026-02-09T03:07:24.123Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-10T12:14:58.084Z","updated_at":"2026-02-09T04:03:13.097Z","avatar_url":"https://github.com/claymcleod.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"This repository contains all of the resources I have found helpful in studying genomics from the ground up.\n\n## Introduction\n    \n1. [A Brief Guide to Genomics][1] by the National Human Genome Research Institute \n2. [Overview of genomic file formats][10]. Introduction to common file types for genomics.\n3. [FastQ file specification][11]. Illumina raw data format specification. In general, this is thought to be the industry standard.\n4. [SAM/BAM file specification][9]. This is the industry standard file-type for *aligned* sequence data. 
Know this format like the back of your hand.\n\n## Useful software\n\n* [Samtools][5]: manipulate and perform common tasks for SAM/BAM/CRAM files.\n* [BWA aligner][4]: Industry standard WGS/WXS aligner\n* [STAR aligner][6]: Industry standard Transcriptome aligner\n* [Picard][8]: Swiss-army knife of genomics\n* [GATK][12]: Genome Analysis Toolkit, industry standard variant caller.\n* [FastQC][13]: Industry standard raw data quality check software.\n* [htseq][7]: Useful for counting gene expression\n\n## Data sources\n\n### Raw data\n\n* [1000 genomes][22]: The \"thousand genome project\" is a well-known project that houses raw/variant data from 1000 people across the world.\n* [dbGaP][14]: NIH (U.S. based) genomics repository\n* [EGA][15]: EBI (European based) genotype-phenotype repository\n\n## Variation data\n\n### General purpose\n\n* [dbSNP][16]: de facto single nucleotide polymorphism database. Mostly research oriented.\n* [OMIM][18]: The \"Online Mendelian Inheritance in Man\" is a catalog that maps genotypes -\u003e phenotypes. A great resource for handcrafting articles and literature curation.\n\n### Clinical significance\n\n* [ClinVar][17]: clinical significance of variations for humans.\n* [PolyPhen-2][19]: The \"Polymorphism Phenotyping v2\" tool attempts to predict the functional/structural effects of a variant on a human protein.\n* [SIFT][20]: The SIFT tool attempts to predict whether an amino acid substitution affects protein function.\n* [FATHMM][21]: FATHMM stands for \"functional analysis through hidden Markov models\". 
This tool attempts to predict the functional effect of a variant on the resulting protein.\n\n## Data science applied to genomics \n\n* [Machine Learning in Genomic Medicine][3]\n* [Machine Learning Applications in Genetics and Genomics][2]\n    \n[1]: https://www.genome.gov/18016863/a-brief-guide-to-genomics/\n[2]: http://www.nature.com/nrg/journal/v16/n6/full/nrg3920.html\n[3]: http://www.psi.toronto.edu/publications/2015/Machine%20Learning%20in%20Genomic%20Medicine-%20A%20Review%20of%20Computational%20Problems%20and%20Data%20Sets.pdf\n[4]: http://bio-bwa.sourceforge.net/\n[5]: http://samtools.github.io/\n[6]: https://github.com/alexdobin/STAR\n[7]: http://www-huber.embl.de/HTSeq/doc/index.html#\n[8]: https://broadinstitute.github.io/picard/ \n[9]: https://samtools.github.io/hts-specs/SAMv1.pdf\n[10]: https://genome.ucsc.edu/ENCODE/fileFormats.html#FASTQ\n[11]: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2847217/\n[12]: https://software.broadinstitute.org/gatk/\n[13]: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/\n[14]: https://www.ncbi.nlm.nih.gov/gap\n[15]: https://www.ebi.ac.uk/ega/home\n[16]: https://www.ncbi.nlm.nih.gov/snp\n[17]: https://www.ncbi.nlm.nih.gov/clinvar/\n[18]: https://www.omim.org/\n[19]: http://genetics.bwh.harvard.edu/pph2/\n[20]: http://sift.jcvi.org/\n[21]: http://fathmm.biocompute.org.uk/\n[22]: http://www.internationalgenome.org/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclaymcleod%2Fgenomics-primer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fclaymcleod%2Fgenomics-primer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclaymcleod%2Fgenomics-primer/lists"}