https://github.com/bluegranite/azure-synapse-vcf-analysis
Sample code for analyzing VCF files (converted to Parquet) in Azure Databricks and Synapse.
azure azure-databricks azure-synapse bioinformatics computational-biology databricks genomics glow parquet spark synapse vcf
- Host: GitHub
- URL: https://github.com/bluegranite/azure-synapse-vcf-analysis
- Owner: BlueGranite
- License: gpl-3.0
- Created: 2021-03-16T17:53:13.000Z
- Default Branch: main
- Last Pushed: 2021-04-29T12:14:09.000Z
- Last Synced: 2024-11-18T16:56:39.057Z
- Topics: azure, azure-databricks, azure-synapse, bioinformatics, computational-biology, databricks, genomics, glow, parquet, spark, synapse, vcf
- Homepage: https://www.bluegranite.com/genomics
- Size: 14.8 MB
- Stars: 1
- Watchers: 5
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# VCF Analysis in Azure Synapse
Sample code for analyzing VCF files in Azure Synapse (once converted to Parquet using [Glow](http://projectglow.io/)).
Colby T. Ford, Ph.D.
## Pipeline
## Sample Code
1. Convert VCF files to Parquet: [ConvertVCFsToParquet.md](ConvertVCFsToParquet.md)
2. Create an external table over the VCF-derived Parquet files in Azure Synapse: [CreateVCFTable.md](CreateVCFTable.md)
3. Sample SQL Queries: [SampleQueries.md](SampleQueries.md)
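For orientation, step 1 can be sketched in a few lines of PySpark. This is a minimal illustration and not the code from `ConvertVCFsToParquet.md`. It assumes a Spark environment (such as Azure Databricks) with Glow installed, so that the `vcf` data source is available. The function name, the `main()` helper, and the paths are hypothetical placeholders.

```python
def convert_vcf_to_parquet(spark, vcf_path, parquet_path, mode="overwrite"):
    """Read VCF files via Glow's "vcf" data source and write them as Parquet.

    `spark` is an existing SparkSession on a cluster where Glow is installed.
    Both paths are placeholders chosen by the caller (hypothetical here).
    """
    # Glow exposes VCF files as a Spark data source named "vcf".
    variants = spark.read.format("vcf").load(vcf_path)
    # Persist the variants as Parquet so other engines can query them.
    variants.write.mode(mode).parquet(parquet_path)
    return variants


def main():
    # Imported lazily so the function above can be defined without Spark present.
    from pyspark.sql import SparkSession
    import glow

    spark = SparkSession.builder.appName("vcf-to-parquet").getOrCreate()
    spark = glow.register(spark)  # registers Glow's functions on the session
    # Hypothetical input and output locations; replace with your own storage paths.
    convert_vcf_to_parquet(spark, "/data/vcf/*.vcf.gz", "/data/parquet/variants")
```

Calling `main()` on a cluster with `pyspark` and Glow available would write the Parquet output that step 2 then exposes as an external table in Synapse.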
## Sample Data
The sample VCF data used in this demo is from the Phase 3 release of the [1000 Genomes Project](https://www.internationalgenome.org/data/).
This includes ~168 GB of data in VCFs, which can be downloaded from their [FTP site](ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/).
## BlueGranite Resources
- This repository accompanies the BlueGranite blog post: https://www.bluegranite.com/blog/query-millions-of-genomic-variants-in-minutes-with-azure-synapse
- Demo video on YouTube: [https://www.youtube.com/watch?v=4B-8cviFPYU](https://www.youtube.com/watch?v=4B-8cviFPYU)
- _Building a Genomics Data Lake in Azure_ eBook: https://www.bluegranite.com/genomics-data-lake-ebook
- BlueGranite Genomics Page: https://www.bluegranite.com/genomics