{"id":42060195,"url":"https://github.com/statgen/swiss","last_synced_at":"2026-01-26T07:38:51.845Z","repository":{"id":17180192,"uuid":"19947570","full_name":"statgen/swiss","owner":"statgen","description":"Software to help identify overlap between association scan results and GWAS hit catalogs. ","archived":false,"fork":false,"pushed_at":"2022-08-26T19:28:39.000Z","size":22731,"stargazers_count":15,"open_issues_count":9,"forks_count":6,"subscribers_count":2,"default_branch":"master","last_synced_at":"2026-01-14T14:47:43.932Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/statgen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-05-19T15:38:26.000Z","updated_at":"2025-01-05T19:14:11.000Z","dependencies_parsed_at":"2022-09-09T19:24:51.501Z","dependency_job_id":null,"html_url":"https://github.com/statgen/swiss","commit_stats":null,"previous_names":["welchr/swiss"],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/statgen/swiss","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statgen%2Fswiss","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statgen%2Fswiss/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statgen%2Fswiss/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statgen%2Fswiss/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/statgen","download_url":"https://codeload.github.com/statgen/swiss/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/statgen%2Fswiss/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28769853,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-26T06:37:25.426Z","status":"ssl_error","status_checked_at":"2026-01-26T06:37:23.039Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-26T07:38:50.003Z","updated_at":"2026-01-26T07:38:51.834Z","avatar_url":"https://github.com/statgen.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Swiss\n\n* [Synopsis](#synopsis)\n* [Requirements](#requirements)\n* [Download](#download)\n* [Installation](#installation)\n* [Usage](#usage)\n  * [Simple example](#simple-example)\n  * [Genome build](#genome-build)\n  * [Association result formats](#association-result-formats)\n    * [Simple format](#simple-format)\n    * [EPACTS multi-assoc format](#epacts-multi-assoc-format)\n  * [LD sources ](#ld-sources)\n  * [Filtering results](#filtering-results)\n  * [Clumping options](#clumping-options)\n    * [LD based clumping](#ld-based-clumping)\n    * [Distance based clumping](#distance-based-clumping)\n  * [GWAS catalogs](#gwas-catalogs)\n  * [GWAS catalog lookups](#gwas-catalog-lookups)\n  * [Output from Swiss](#output-from-swiss)\n  * [Common command-lines used](#common-command-lines-used)\n* [Generate GWAS 
catalog](#generate-gwas-catalog)\n* [Options](#options)\n* [Limitations](#limitations)\n* [Changes](#changes)\n* [License](#license)\n\n## Synopsis\n\nSwiss is a tool for pruning association scan results from a GWAS or sequencing study, and identifying regions near or in LD with previously reported GWAS signals.\n\nSwiss implements the following procedure:\n\n* Prune a list of variants using LD or distance, keeping the best variant by p-value (very similar to PLINK's method.)\n* Identify which of the pruned variants are near, or in LD, with previously reported GWAS signals\n\nSwiss supports two main formats:\n\n* A tab-delimited file of association results with the usual columns (CHROM, POS, SNP, PVAL)\n* An EPACTS multi-assoc file containing association p-values across a number of traits\n\n## Requirements\n\n* Python 2.7 (not 3.x)\n* Linux (tested on Ubuntu)\n* Tabix (available as a part of SAMtools, http://samtools.sourceforge.net/)\n* PLINK 1.9 or greater (https://www.cog-genomics.org/plink2)\n\nBoth tabix and plink should be somewhere on your $PATH ideally, or alternatively you must specify their locations in the config file. Use `swiss --list-files` to find the config file.\n\n## Download\n\nThe latest version is:\n\n| Version | Date       | Install                                                         |\n|---------|------------|-----------------------------------------------------------------|\n| 1.1.1 | 10/31/2019 | `pip install git+https://github.com/welchr/swiss.git@v1.1.1`  |\n\nPlease see the [changelog](#changes) for a list of recent bug fixes and new features.\n\n## Installation\n\n### 1. 
Install swiss\n\nYou can install directly from the tarball as a regular python package:\n\n```bash\n# Install globally\npip install git+https://github.com/welchr/swiss.git@v1.0b4\n\n# Install in ~/.local/ instead\npip install --user git+https://github.com/welchr/swiss.git@v1.0b4\n```\n\nIf you don't have administrator privileges on your machine, you can install into your home directory by adding `--user`. This causes pip to install packages into `~/.local/lib/python2.7/site-packages/`, and binaries/scripts into `~/.local/bin/`. In this case, you will want to make sure `~/.local/bin/` is in your $PATH (`export PATH=\"/home/\u003cuser\u003e/.local/bin:$PATH\"`).\n\nAn alternative would be to install into a virtualenv, to keep swiss encapsulated away from your main python packages:\n\n```bash\nvirtualenv swiss\nsource swiss/bin/activate\npip install git+https://github.com/welchr/swiss.git@v1.0b4\nswiss --help\n```\n\nIf you're using anaconda/miniconda, and prefer to use conda environments rather than virtualenv, you could do:\n\n```bash\nconda create -n swiss\nsource activate swiss\npip install git+https://github.com/welchr/swiss.git@v1.0b4\nswiss --help\n```\n\n### 2. Install required dependencies\n\nSwiss requires these two programs to function:\n\n* Tabix (available as a part of SAMtools, http://samtools.sourceforge.net/)\n* PLINK 1.9 or greater (https://www.cog-genomics.org/plink2)\n\nMake sure both are installed and somewhere on your $PATH.\n\nAlternatively, you can create a user config (follow instructions by `swiss --list-files`) and use this to specify the paths to the plink and tabix binaries.\n\n### 3. Download supporting data files (optional)\n\nIf you're planning to run swiss with your own GWAS catalog and LD files, you can skip this step. 
Otherwise, after installing (above), you can download all supporting data by running:\n\n```\nswiss --download-data\n```\n\nThis tries to install data into your user data directory (typically ~/.local/share/swiss on *nix systems). If you want to use a different directory, copy the config file (follow instructions from `swiss --list-files`) and change the `data_dir` parameter.\n\n## Usage\n\n### Simple example\n\n```bash\nswiss --assoc my_file.txt --ld-clump --clump-p 5e-08 --out my_results\n```\n\n### Genome build\n\nYou should always specify which genome build you're working in by using `--build`. By default, the build is hg19.\n\nAdditionally, if you specify your own GWAS catalog, or VCF files for calculating LD, you should verify that the positions for these match the genome build of your association results.\n\n### Association result formats\n\n#### Simple format\n\nThe simplest format looks like your typical association results:\n\n| CHR | POS | REF | ALT | MARKER_ID | PVALUE |\n|-----|-----|-----|-----|-----------|--------|\n| 1   | 10  | A   | G   | 1:10_A/G  | 5e-08  |\n| 3   | 400 | C   | T   | 3:400_C/T | 1e-09  |\n\nYou can specify the delimiter with `--delim` and the names of the columns with `--variant-col`, `--chrom-col`, `--pos-col`, `--pval-col`. The defaults are listed below under [Options](#options).\n\nThe \"variant\" column should ideally contain EPACTS-formatted IDs (chr:pos_ref/alt). If it does not, then you **must** have CHR, POS, REF, and ALT columns so that these types of IDs can be constructed.\n\nIf you're analyzing multiple files, one per trait, you may want to tell swiss the name of your trait using `--trait \u003ctrait\u003e`. This will include a TRAIT column in your output, which can be useful for joining results together later.\n\nThe file can be gzipped.\n\n#### EPACTS multi-assoc format\n\nAdditionally, you can tell Swiss that your file is an EPACTS multi-assoc file with the `--multi-assoc` flag. 
This type of file looks like the following:\n\n| #CHROM | BEG   | END   | MARKER_ID      | NS   | AC       | CALLRATE | GENOCNT  | MAF     | TRAIT1.P | TRAIT1.B | TRAIT2.P | TRAIT2.B |\n|--------|-------|-------|----------------|------|----------|----------|----------|---------|----------|----------|----------|----------|\n| 1      | 15903 | 15903 | 1:15903_G/GC   | 8448 | 14459.66 | 1        | 0/3/8445 | 0.1442  | 0.5      | 0.195    | 0.659    | 0.128    |\n| 1      | 19190 | 19191 | 1:19190_GC/G   | 8448 | 98.23    | 1        | 8448/0/0 | 0.00581 | 0.703    | 0.266    | 0.588    | -0.379   |\n| 1      | 20316 | 20317 | 1:20316_GA/G   | 8448 | 120.46   | 1        | 8448/0/0 | 0.00713 | 0.714    | -0.512   | 0.645    | 0.644    |\n| 1      | 30967 | 30970 | 1:30967_CCCA/C | 8448 | 47.35    | 1        | 8448/0/0 | 0.0028  | 0.322    | 3.15     | 0.296    | 3.32     |\n| 1      | 51972 | 51975 | 1:51972_GGAC/G | 8448 | 268.34   | 1        | 8448/0/0 | 0.01588 | 0.673    | 0.301    | 0.866    | -0.121   |\n| 1      | 53138 | 53140 | 1:53138_TAA/T  | 8448 | 402.05   | 1        | 8448/0/0 | 0.0238  | 0.368    | -0.768   | 0.905    | -0.103   |\n| 1      | 54421 | 54421 | 1:54421_A/G    | 8448 | 422.81   | 1        | 8448/0/0 | 0.02502 | 0.367    | -0.776   | 0.98     | -0.0215  |\n| 1      | 66221 | 66221 | 1:66221_A/AT   | 8448 | 338.19   | 1        | 8448/0/0 | 0.02002 | 0.0378   | 1.24     | 0.211    | 0.747    |\n| 1      | 66222 | 66223 | 1:66222_TA/T   | 8448 | 298.81   | 1        | 8448/0/0 | 0.01769 | 0.0653   | 1.13     | 0.314    | 0.615    |\n\nThere are a set of columns (.P, .B) for each trait that was analyzed. The file is tab-delimited, and gzipped.\n\nExample command line:\n\n```bash\nswiss --assoc results.epacts.gz --multi-assoc --out my_results\n```\n\nBy default, swiss will try to run on every single trait given in the file. 
However, if you only wish to look at a single trait, you can use `--trait` instead:\n\n```bash\nswiss --assoc results.epacts.gz --multi-assoc --out my_results --trait TRAIT1\n```\n\nIf you're running on a machine with multiple CPU cores, you can ask swiss to do multiple traits from the multi-assoc file at the same time by telling it how many to run with `-T \u003cnum of parallel jobs\u003e`. Please remember these run on the same machine, and not on the cluster - **do not overwhelm the machine!**\n\n### LD sources\n\nSwiss comes with a few built-in sources of LD information:\n\n```bash\nswiss --list-ld-sources\n\nBuild      LD Sources\n-----      ----------\nhg19       1000G_2012-03_AFR, 1000G_2012-03_AMR, 1000G_2012-03_ASN, 1000G_2012-03_EUR, GOT2D_2011-11\n```\n\nYou can select different sources to use when LD pruning results, and when looking for GWAS catalog variants in LD. For example, you may wish to use your own genotypes for pruning (since they will cover all of your markers), but when looking for GWAS catalog variants in LD, it may be better to use a reference panel such as GoT2D for better coverage of your novel variants + known GWAS variants.\n\n* For the pruning step, use: `--ld-clump-source \u003cname\u003e`.\n* For the GWAS catalog LD lookup step, use: `--ld-gwas-source \u003cname\u003e`.\n\nBoth options can be the same (and in fact, if you only specify one of them, *it assumes you meant to use that source for both.*)\n\nYou can always provide a VCF directly to use instead of selecting a built-in one:\n\n```bash\nswiss --ld-clump-source /path/to/vcf.gz\n```\n\nIf you have multiple VCF files split up across chromosomes, you can specify a .json file that maps chromosomes to VCF files:\n\n```bash\nswiss --ld-clump-source /path/to/vcfmap.json\n```\n\nWhere the `vcfmap.json` file looks like:\n\n```\n{\n  \"1\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr1.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"10\": 
\"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr10.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"11\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr11.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"12\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr12.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"13\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr13.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"14\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr14.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"15\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr15.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"16\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr16.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"17\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr17.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"18\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr18.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"19\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr19.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"2\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr2.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"20\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr20.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"21\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr21.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"22\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr22.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"3\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr3.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"4\": 
\"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr4.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"5\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr5.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"6\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr6.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"7\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr7.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"8\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr8.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"9\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chr9.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\",\n  \"X\": \"/net/got2d/cfuchsb/T2Dgo/paper/data/2657/GoT2D.chrX.final_integrated_snps_indels_sv_beagle_thunder.qc.vcf.gz\"\n}\n```\n\nJSON format is a little fussy, so be careful. Make sure to use **double quotes** like above.\n\n### Filtering results\n\nIf you provided an imputation quality column in your association results (specified with `--rsq-col`), swiss can remove variants below a certain threshold using `--rsq-filter \u003cthreshold\u003e`.\n\n### Clumping options\n\n#### LD based clumping\n\nSwiss can clump your association results using LD. The result being that only the best variants by p-value are kept first, and the remaining variants in LD with it are dropped.\n\n```bash\nswiss --ld-clump --ld-clump-source GOT2D_2011-11 --clump-ld-thresh 0.8 --clump-p 4e-09\n```\n\nIn the example above, variants in LD (r2) \u003e 0.8 with the top variant per region are removed, and only variants with a p-value \u003c 4e-09 are  considered at all.\n\n#### Distance based clumping\n\nSimilarly, you can prune based on distance. 
The best variant by p-value is retained, the remaining variants within the given distance of it are dropped, and the process repeats until no variants remain to be considered.\n\n```bash\nswiss --dist-clump --clump-dist 250000\n```\n\nIn the example, variants within 250kb of the best variant by p-value are removed, and so forth.\n\n### GWAS catalogs\n\nSwiss supports two types of GWAS catalogs: built-in ones that come with the program, and user-supplied catalogs.\n\nTo list the built-in catalogs:\n\n```bash\nswiss --list-gwas-cats\n\nBuild      Catalog\n-----      -------\nhg19       ebi\n```\n\nYou can then select the catalog with `--gwas-cat`, for example `--gwas-cat fusion`. The build is selected with `--build hg19`.\n\nThe fusion catalog is an internal one maintained by our group here.\n\nTo list the traits contained in a particular catalog:\n\n```bash\nswiss --list-gwas-traits\n\nAvailable traits for GWAS catalog 'fusion':\n\nAPOA1B\n------\n\nApoA1\nApoB\nApoB/ApoA1\n\nAmino acids clumped\n-------------------\n\n2-aminobutyrate\n2-hydroxyisobutyrate\n3-(4-hydroxyphenyl)lactate\n3-(4-hydroxyphenyl)lactate/ alpha-hydroxyisovalerate\n3-phenylpropionate (hydrocinnamate)\n4-acetamidobutanoate/ X-03056\n5-oxoproline\n```\n\nYou can also specify your own GWAS catalog by giving a filename instead of a codename for the catalog, like: `--gwas-cat /path/to/my/gwascat.tab`.\n\nThe GWAS catalog format looks like the following (tab-delimited):\n\n| VARIANT    | EPACTS           |   CHR |       POS | REF   | ALT   | GROUP              | PHENO              |   LOG_PVAL |\n|:-----------|:-----------------|------:|----------:|:------|:------|:-------------------|:-------------------|-----------:|\n| rs964184   | 11:116648917_G/C |    11 | 116648917 | G     | C     | Vitamin E levels   | Vitamin E levels   |      11.1  |\n| rs2108622  | 19:15990431_C/T  |    19 |  15990431 | C     | T     | Vitamin E levels   | Vitamin E levels   |      10    |\n| rs11057830 | 
12:125307053_G/A |    12 | 125307053 | G     | A     | Vitamin E levels   | Vitamin E levels   |       8.1  |\n| rs3130573  | 6:31106268_A/G   |     6 |  31106268 | A     | G     | Systemic sclerosis | Systemic sclerosis |       9.22 |\n| rs6457617  | 6:32663851_C/T   |     6 |  32663851 | C     | T     | Systemic sclerosis | Systemic sclerosis |      36.7  |\n\n\nIt can contain additional columns, for example you may have citations along with each hit or other supporting information:\n\n| VARIANT    | EPACTS           | CHRPOS       |   CHR |       POS | REF   | ALT   | PHENO              | GROUP              |   LOG_PVAL | CITATION                       | RISK_ALLELE   |   RISK_AL_FREQ | GENE_LABEL         |   OR_BETA |\n|:-----------|:-----------------|:-------------|------:|----------:|:------|:------|:-------------------|:-------------------|-----------:|:-------------------------------|:--------------|---------------:|:-------------------|----------:|\n| rs964184   | 11:116648917_G/C | 11:116648917 |    11 | 116648917 | G     | C     | Vitamin E levels   | Vitamin E levels   |      11.1  | Major JM et al.  Hum Mol Genet | G             |           0.15 | ZNF259,APOA5,BUD13 |      0.04 |\n| rs2108622  | 19:15990431_C/T  | 19:15990431  |    19 |  15990431 | C     | T     | Vitamin E levels   | Vitamin E levels   |      10    | Major JM et al.  Hum Mol Genet | T             |           0.21 | CYP4F2             |      0.03 |\n| rs11057830 | 12:125307053_G/A | 12:125307053 |    12 | 125307053 | G     | A     | Vitamin E levels   | Vitamin E levels   |       8.1  | Major JM et al.  Hum Mol Genet | A             |           0.15 | SCARB1             |      0.03 |\n| rs3130573  | 6:31106268_A/G   | 6:31106268   |     6 |  31106268 | A     | G     | Systemic sclerosis | Systemic sclerosis |       9.22 | Allanore Y et al.  
PLoS Genet  | G             |           0.32 | PSORS1C1           |      1.25 |\n| rs6457617  | 6:32663851_C/T   | 6:32663851   |     6 |  32663851 | C     | T     | Systemic sclerosis | Systemic sclerosis |      36.7  | Allanore Y et al.  PLoS Genet  | T             |           0.5  | HLA,DQB1           |      1.61 |\n\nThe extra columns will be included with the output from Swiss.\n\n### GWAS catalog lookups\n\nAfter LD or distance based clumping, Swiss will look for GWAS catalog hits that are near, or in LD with, your clumped/top variants. It does both and generates two files, one for each:\n\n* prefix.ld-gwas.tab - contains GWAS catalog variants that were in LD with your top variants after clumping\n* prefix.near-gwas.tab - contains GWAS catalog variants near your top variants by distance\n\nYou can control the LD threshold using `--gwas-cat-ld \u003cthreshold\u003e` and the distance threshold using `--gwas-cat-dist \u003cthreshold\u003e`.\n\nSwiss normally only includes columns from the GWAS catalog (as well as a few relevant columns from your association results) in these files. To include additional columns from your association file, use `--include-cols`:\n\n```bash\nswiss --assoc my_assoc.txt --include-cols \"RSQ,EFF_AL,EFF_FREQ\"\n```\n\n### Output from Swiss\n\nSwiss generates the two GWAS catalog lookup files (listed above), and a third .clump file containing your top variants after clumping. 
The files are named starting with a prefix given by `--out`, for example:\n\n```bash\nswiss --assoc my_assoc.txt --ld-clump --out prefix\n```\n\nWill create:\n\n* prefix.ld-gwas.tab\n* prefix.near-gwas.tab\n* prefix.clump\n\nThe .clump file looks like this:\n\n| #CHROM | BEG      | END      | MARKER_ID                | PVALUE   | BETA    | MRSQ    | TRAIT  | ld_with                                                                  | ld_with_values | failed_clump |\n|--------|----------|----------|--------------------------|----------|---------|---------|--------|--------------------------------------------------------------------------|----------------|--------------|\n| 11     | 60784275 | 60784275 | 11:60784275_G/A          | 4.47E-08 | -0.0992 | 0.98842 | otPUFA |                                                                          |                | pass         |\n| 11     | 60786289 | 60786289 | 11:60786289_C/T          | 3.64E-10 | -0.307  | 0.93654 | otPUFA |                                                                          |                | pass         |\n| 11     | 60859791 | 60859791 | 11:60859791_C/T_rs175133 | 9.51E-11 | 0.118   | 0.99901 | otPUFA | 11:60899767_A/G_exm915580,11:60853986_A/G,11:60859624_A/C_SNP11-60616200 | 0.40,0.60,0.61 | pass         |\n| 11     | 60866519 | 60866519 | 11:60866519_A/ACCCAG     | 1.49E-11 | -0.246  | 0.94861 | otPUFA |                                                                          |                | fail         |\n\nThe `ld_with` column gives a comma separated list of variants that were pruned away (if LD clumping was used.) The r2 values are given for each variant (in the same order) in the `ld_with_values` column.\n\nIf a variant failed LD calculation for some reason (not present in the VCF file, variant was an indel, etc.) the `failed_clump` column will say **fail**. 
The program will also generate a warning while running.\n\nThe .ld-gwas.tab and .near-gwas.tab files are very similar (removing some columns for brevity):\n\n| ASSOC_MARKER    | ASSOC_CHRPOS | ASSOC_TRAIT | GWAS_SNP   | GWAS_CHRPOS | ASSOC_GWAS_LD | GWAS_GENE_LABEL | GWAS_Group | GWAS_PHENO | GWAS_P_VALUE |\n|-----------------|--------------|-------------|------------|-------------|---------------|-----------------|------------|------------|--------------|\n| 15:58683366_A/G | 15:58683366  | TotFA       | rs4775041  | 15:58674695 | 0.54800787    | LIPC            | Lipids     | HDL        | 3.20E-20     |\n| 15:58683366_A/G | 15:58683366  | TotFA       | rs4775041  | 15:58674695 | 0.54800787    | LIPC            | Lipids     | TG         | 1.60E-08     |\n| 15:58683366_A/G | 15:58683366  | TotFA       | rs10468017 | 15:58678512 | 0.636239711   | LIPC            | Lipids     | HDL        | 8.00E-23     |\n| 15:58683366_A/G | 15:58683366  | TotFA       | rs1532085  | 15:58683366 | 1             | LIPC            | Lipids     | HDL        | 1.00E-188    |\n\n* ASSOC_MARKER: Variant from your clumped association results (the top hit.)\n* ASSOC_CHRPOS: CHR:POS naming for the variant\n* ASSOC_TRAIT: Either taken from the multi-assoc file, or specified with `--trait`.\n* GWAS_SNP: The GWAS catalog variant that your ASSOC_MARKER is in LD with.\n* ASSOC_GWAS_LD: The r2 between the GWAS_SNP and the ASSOC_MARKER.\n* GWAS_PHENO: The phenotype associated with the GWAS_SNP according to the GWAS catalog.\n* GWAS_P_VALUE: P-value reported in the GWAS catalog.\n\nThe .near-gwas.tab file has ASSOC_GWAS_DIST instead of ASSOC_GWAS_LD, and denotes the distance between the ASSOC_MARKER and the GWAS_SNP.\n\n### Common command-lines used\n\n```bash\nswiss --assoc example.multiassoc.epacts.gz --multi-assoc \\\n--build hg19 --ld-clump-source /net/snowwhite/home/welchr/projects/FFA/metsim_got2d_exomechip.json \\\n--ld-gwas-source 
/net/snowwhite/home/welchr/projects/FFA/metsim_got2d_exomechip.json \\\n--gwas-cat nhgri --ld-clump --clump-p 5e-08 --out example\n```\n\nThe command above will:\n\n* Run on an EPACTS multi-assoc file, analyzing all traits (to analyze a single trait, use `--trait`)\n* Use LD clumping to prune variants, and use VCF files specified by metsim_got2d_exomechip.json to do it\n* Remove any variant with p-value \u003e 5e-08\n* Use the NHGRI GWAS catalog for looking up GWAS variants in LD with top signals\n* Again use the VCFs specified by metsim_got2d_exomechip.json to find GWAS variants in LD with top signals\n\n---\n\n```bash\nswiss --assoc my_results.tab --delim tab --chrom-col CHROM --pos-col POS --pval-col PVAL --variant-col SNP \\\n--rsq-col RSQ --rsq-filter 0.3 \\\n--build hg19 --ld-clump-source 1000G_2012-03_EUR --ld-gwas-source 1000G_2012-03_EUR \\\n--gwas-cat nhgri --dist-clump --clump-p 5e-08 --clump-dist 500000 --out example\n```\n\nThe command above will:\n\n* Run on a simple tab-delimited format of GWAS association results, specifying the column names directly\n* Remove variants with imputation quality below 0.3\n* Clump results using distance of 500kb, and also remove variants with p \u003e 5e-08\n* Use 1000G EUR to both LD clump AND find GWAS variants in LD with top signals\n\n## Generate GWAS catalog\n\nInstead of waiting for data releases from `swiss --download-data` (which\ncontain a GWAS catalog from EBI), you can generate your own up-to-date\ncatalog with the `swiss-create-data` script.\n\nNote that this script downloads some rather large files from NCBI, in\norder to translate GWAS catalog variants into CHR/POS/REF/ALT.\n\nThe process takes roughly an hour or two depending on your internet\nconnection.\n\nTo generate a new catalog:\n\n```\nswiss-create-data --genome-build GRCh37p13 --dbsnp-build b147\n```\n\nThis will create two files:\n\n```\n-rw-r----- 1 user user  22G Nov 30 18:39 GRCh37p13_b147.sqlite\n-rw-r----- 1 user user 1.7M Nov 30 18:39 
gwascat_ebi_GRCh37p13.tab\n```\n\nThe first file is a SQLite database created from the downloaded NCBI\ndbSNP VCF. The second file is the processed GWAS catalog that can be\nused by swiss.\n\nTo use the catalog, you can either provide the path to it directly by\nusing `--gwas-cat /path/to/gwascat_ebi_GRCh37p13.tab`, or you can modify\nthe config file (see `swiss --list-files`) and add an entry for it\nthere.\n\n## Options\n\n```\nusage: swiss [options]\n\n  -h, --help\n    show this help message and exit\n\n  --list-files\n    Show the locations of files in use by swiss.\n    Default value is: False\n\n  --download-data\n    Download pre-formatted and compiled data (LD, GWAS catalogs, etc.)\n    Default value is: False\n\n  --assoc \u003cstring\u003e\n    [Required] Association results file.\n\n  --multi-assoc\n    Designate that the results file is in EPACTS multi-assoc format.\n    Default value is: False\n\n  --trait \u003cstring\u003e\n    Description of phenotype for association results file. E.g. 'HDL' or 'T2D'\n\n  --delim \u003cstring\u003e\n    Association results delimiter.\n    Default value is: tab\n\n  --build \u003cstring\u003e\n    Genome build your association results are anchored to.\n    Default value is: hg19\n\n  --variant-col \u003cstring\u003e\n    Variant column name in results file.\n    Default value is: MARKER_ID\n\n  --pval-col \u003cstring\u003e\n    P-value column name in results file.\n    Default value is: PVALUE\n\n  --chrom-col \u003cstring\u003e\n    Chromosome column name in results file.\n    Default value is: CHR\n\n  --pos-col \u003cstring\u003e\n    Position column name in results file.\n    Default value is: POS\n\n  --rsq-col \u003cstring\u003e\n    Imputation quality column name.\n    Default value is: RSQ\n\n  --trait-col \u003cstring\u003e\n    Trait column name. 
Can be omitted, in which case the value of --trait will be added as a column.\n    Default value is: None\n\n  --rsq-filter \u003cstring\u003e\n    Remove variants below this imputation quality.\n    Default value is: None\n\n  --filter \u003cstring\u003e\n    Give a general filter string to filter variants.\n    Default value is: None\n\n  --out \u003cstring\u003e\n    Prefix for output files.\n    Default value is: swiss_output\n\n  --ld-clump\n    Clump association results by LD.\n    Default value is: False\n\n  --clump-p \u003cstring\u003e\n    P-value threshold for LD and distance based clumping.\n    Default value is: 5e-08\n\n  --clump-ld-thresh \u003cfloat\u003e\n    LD threshold for clumping.\n    Default value is: 0.2\n\n  --clump-ld-dist \u003cint\u003e\n    Distance from each significant result to calculate LD.\n    Default value is: 1000000\n\n  --dist-clump\n    Clump association results by distance.\n    Default value is: False\n\n  --clump-dist \u003cint\u003e\n    Distance threshold to use for clumping based on distance.\n    Default value is: 250000\n\n  --ld-clump-source \u003cstring\u003e\n    Name of pre-configured LD source, or a VCF file from which to compute LD.\n    Default value is: 1000G_2012-03_EUR\n\n  --list-ld-sources\n    Print a list of available LD sources for each genome build.\n    Default value is: False\n\n  --gwas-cat \u003cstring\u003e\n    GWAS catalog to use.\n    Default value is: ebi\n\n  --ld-gwas-source \u003cstring\u003e\n    Name of pre-configured LD source or VCF file to use when calculating LD with GWAS variants.\n    Default value is: 1000G_2012-03_EUR\n\n  --list-gwas-cats\n    Give a listing of all valid GWAS catalogs and their descriptions.\n    Default value is: False\n\n  --list-gwas-traits\n    List all of the available traits in a selected GWAS catalog.\n    Default value is: False\n\n  --list-gwas-trait-groups\n    List all of the available groupings of traits in a selected GWAS catalog.\n    Default value 
is: False\n\n  --gwas-cat-p \u003cfloat\u003e\n    P-value threshold for GWAS catalog variants.\n    Default value is: 5e-08\n\n  --gwas-cat-ld \u003cfloat\u003e\n    LD threshold for considering a GWAS catalog variant in LD.\n    Default value is: 0.1\n\n  --gwas-cat-dist \u003cint\u003e\n    Distance threshold for considering a GWAS catalog variant 'nearby'.\n    Default value is: 250000\n\n  --include-cols \u003cstring\u003e\n    List of columns to merge in from association results (grouped by variant.)\n    Default value is: None\n\n  --do-overlap-check\n    Perform the check of whether the GWAS catalog has variants that are not in your --ld-gwas-source.\n    Default value is: False\n\n  --skip-gwas\n    Skip the step of looking for GWAS hits in LD with top variants after clumping.\n    Default value is: False\n\n  --cache \u003cstring\u003e\n    Prefix for LD cache.\n    Default value is: ld_cache\n\n  -T, --threads \u003cint\u003e\n    Number of parallel jobs to run. Only works with --multi-assoc currently.\n    Default value is: 1\n\n  --version\n    Print version and exit.\n    Default value is: False\n```\n\n## Limitations\n\nThe latest human genome build (hg38) is not yet supported.\n\n## Changes\n\n1.1.1 - 10/31/2019\n\nBug fixes:\n\n* Fix crash in `swiss-create-data` caused by invalid unicode characters in rsIDs from GWAS catalog\n\n1.1.0 - 10/24/2019\n\nBug fixes:\n\n* Fixed an issue where VCFs with chromosomes specified as 'chr#' instead of simply '#' would cause swiss to send no output to PLINK, which produced a red herring \"File read failure\" message.\n\n* Existence of tabix index was not previously checked\n\n* P-values exceeding double precision were not properly handled and would result in p-value of 0.0 in result file\n\nNew features:\n\n* P-values can be provided in log10 scale now using the `--logp-col` option to denote which column contains log10 p-values. Note that this is exactly log10(p-value), and *not* -log10. 
The reason for this is that the most popular meta-analysis program METAL outputs log10(p) when using the LOGPVALUE ON option.\n\n1.0.0 - 08/30/2018\n\nBug fixes:\n\n* Fixed an issue with VCFs that misuse the FILTER column. Swiss now checks if \"PASS\" occurs anywhere within the FILTER column, and if it does, the variant is assumed to be OK to use. Previously, Swiss expected the column simply to contain \"PASS\" and nothing else.\n\nNew features:\n\n* Support for GRCh38. EBI GWAS catalog and 1000G phase 3 genotypes in GRCh38 coordinates are both now available. Use `swiss --download-data` to grab the latest files. You can also now use `swiss-create-data` to generate new up-to-date GWAS catalogs for both GRCh37 and GRCh38.\n\n  **Note**: if you previously customized your install by copying the default swiss.yaml to `~/.config/`, you will need to repeat this process again to see the new LD sources (or just copy them over from the bottom of the file.)\n\n* Header rows beginning with \"##\" are now skipped in association files\n\n* Paths to files being used for calculating LD will now be shown in log\n\nBackward incompatible changes:\n\n* 1000G phase 1 LD files have been removed since they are superseded by 1000G phase 3\n\n1.0.0b7 - 03/03/2018\n\nBug fixes:\n\n* Fix issue when installing latest version of bx-python requiring python-lzo which does not install nicely. There is now a requirements.txt with versions pinned.\n\n1.0.0b6 - 03/03/2018\n\nBug fixes:\n\n* Fix PLINK version detection\n\nNew features:\n\n* Allow passing arguments through to PLINK, use `--plink-args`. For example: `--plink-args '--double-id --vcf-half-call missing'`. You must quote the arguments to be passed through or the shell will expand them.\n\n1.0.0b5 - 10/03/2017\n\nSlight change in versioning scheme to more closely follow semver.\n\nBug fixes:\n\n* Previously swiss would not include the top independent variants themselves when looking for LD buddies that exist in the GWAS catalog. 
These would only have been picked up in the `near-gwas` scan and not the `ld-gwas` scan. Now they will correctly appear in both places. (GH [#6](../../issues/6))\n* Deprecation of `pandas.DataFrame.sort` -\u003e `sort_values`\n* Updated NCBI URLs for swiss-create-data (thank you Daniele Di Domizio)\n\nNew features:\n\n* Better accounting/printing of what is happening during GWAS catalog\n  parsing\n* Allow using existing SNP history and RsMergeArch when using\n  swiss-create-data\n* Better display of LD (and distance) clumping settings currently in use\n\n1.0b4 - 01/17/2016\n\nBug fixes:\n\n* Indels with very long alleles are now supported, previously they could\n  not be used for calculating LD due to allele length limitation in\n  PLINK\n\nNew features:\n\n* Include 1000G phase 3 (hg19/GRCh37) (re-run `swiss --download-data`)\n* Issue template for github\n\n1.0b3 - 12/26/2016\n\nBug fixes:\n\n* Unicode error when parsing catalog\n\n1.0b2 - 11/30/2016\n\nNew features:\n\n* Script to create GWAS catalog without waiting for data releases `swiss-create-data` - see [Generate GWAS catalog](#generate-gwas-catalog) for more info\n\n1.0b1 - 11/27/2016\n\n**This version has backwards incompatible changes with the previous 0.x\nreleases.**\n\nNew features:\n\n* Support for indel and other types of variants\n\n* Much improved speed in calculating LD\n\n* New option --list-files will now show the current config file and data files in use\n\n* New option --download-data to automatically download/update when new supporting data (GWAS catalog, LD files, etc.) are available\n\nBackwards incompatible changes:\n\n* Swiss is installed now as a python package, instead of a standalone directory. Some files have shifted around in locations. Use --list-files to find installed locations.\n\n* Swiss requires PLINK 1.9 or greater now to compute LD. 
It must exist on your $PATH, or the path must be set in the config file (see next).\n\n* Config file is no longer stored relative to the swiss root directory, but rather within the package directory. To override, you can copy the default config file to ~/.config/swiss.yaml and modify it. Use `swiss --list-files` to find the default config file.\n\n* Option --snp-col is now --variant-col. The default is \"MARKER_ID\". Variants in your association results file must contain both ref and alt alleles. This needs to be specified either 1) in the variant column, as EPACTS style IDs (chr:pos_ref/alt), or 2) there must be CHR, POS, REF, and ALT columns in the file.\n\n* The default GWAS catalog has been renamed from nhgri to ebi. Use `--gwas-cat ebi` to specify this catalog. It is currently only available for hg19/GRCh37, but the hg38 version will be generated soon.\n\n* The GWAS catalog now only contains a LOG_PVAL, rather than P_VALUE column. LOG_PVAL is -log10(p-value). As a result, .ld-gwas and .near-gwas files will have a GWAS_LOG_PVAL column, rather than the prior p-value based column.\n\n0.9.5 - 02/18/2016\n\n* Update NHGRI GWAS catalog\n\n0.9.4 - 12/4/2014\n\n* Fixes a potential installation issue on Debian where virtualenv would not install pip and setuptools\n\n## License\n\nCopyright (C) 2014 Ryan Welch, The University of Michigan\n\nSwiss is free software: you can redistribute it and/or modify\nit under the terms of the GNU General Public License as published by\nthe Free Software Foundation, either version 3 of the License, or\n(at your option) any later version.\n\nSwiss is distributed in the hope that it will be useful,\nbut WITHOUT ANY WARRANTY; without even the implied warranty of\nMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\nGNU General Public License for more details.\n\nYou should have received a copy of the GNU General Public License\nalong with this program.  
If not, see \u003chttp://www.gnu.org/licenses/\u003e.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstatgen%2Fswiss","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstatgen%2Fswiss","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstatgen%2Fswiss/lists"}