https://github.com/statgen/pheweb
A tool to build a website to browse hundreds or thousands of GWAS.
https://github.com/statgen/pheweb
Last synced: 5 months ago
JSON representation
A tool to build a website to browse hundreds or thousands of GWAS.
- Host: GitHub
- URL: https://github.com/statgen/pheweb
- Owner: statgen
- License: mit
- Created: 2016-02-20T02:53:21.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2025-09-11T16:58:03.000Z (9 months ago)
- Last Synced: 2025-09-25T11:32:35.207Z (9 months ago)
- Language: Python
- Homepage:
- Size: 11.3 MB
- Stars: 185
- Watchers: 10
- Forks: 75
- Open Issues: 41
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
**If you're starting fresh, check out [PheWeb2](https://github.com/GaglianoTaliun-Lab/PheWeb2). It adds stratification by ancestry/sex, genotype-by-sex interactions, dynamic Miami Plots, a real API, and many other improvements.**
---
For a list of available instances of PheWeb, navigate [here](http://pheweb.sph.umich.edu).
For a walk-through demo see [here](etc/demo.md#demo-navigating-pheweb).
If you have questions or comments, check out our [Google Group](https://groups.google.com/g/pheweb-umich).

# How to Cite PheWeb
If you use the PheWeb code base for your work, please cite our paper:
Gagliano Taliun, S.A., VandeHaar, P. et al. Exploring and visualizing large-scale genetic associations by using PheWeb. *Nat Genet* 52, 550–552 (2020).
# How to Build a PheWeb for your Data
If this is broken, [open an issue on github](https://github.com/statgen/pheweb/issues/new) and hopefully I can help.
### 1. Install PheWeb
```bash
pip3 install pheweb
```
- If that doesn't work, follow [the detailed install instructions](etc/detailed-install-instructions.md#detailed-install-instructions).
### 2. Create a directory and `config.py` for your new dataset
```
mkdir ~/my-new-pheweb && cd ~/my-new-pheweb
```
This directory will store all the files pheweb makes for your dataset. All `pheweb ...` commands should be run in this directory.
Make `config.py` in this directory. In it, either set `hg_build_number = 19` or `hg_build_number = 38`. Other options you can set are listed [here](etc/detailed-loading-instructions.md#configuration-options).
### 3. Check that your GWAS summary statistics files will work
You need one file for each phenotype. Most common GWAS file formats should work. Here are the requirements:
- It needs a header row.
- Columns can be delimited by tabs, spaces, or commas.
- It needs a column for the reference allele (which must always match the bases on the reference genome that you specified with `hg_build_number`) and a column for the alternate allele. If you have a `MARKER_ID` column like `1:234_C/G`, that's okay too. If you have an allele1 and allele2, and sometimes one or the other is the reference, then you'll need to modify your files.
- It can be gzipped if you want.
- Variants must be sorted by chromosome and position, with chromosomes in the order [1-22,X,Y,MT].
The file must have columns for:
| column description | name | other allowed column names | allowed values |
| --- | --- | --- | --- |
| chromosome | `chrom` | `#chrom`, `chr` | 1-22, `X`, `Y`, `M`, `MT`, `chr1`, etc |
| position | `pos` | `beg`, `begin`, `bp` | integer |
| reference allele | `ref` | `reference` | must match reference genome |
| alternate allele | `alt` | `alternate` | anything |
| p-value | `pval` | `pvalue`, `p`, `p.value` | number in [0,1] |
You may also have columns for:
| column description | name | other allowed column names | allowed values |
| --- | --- | --- | --- |
| minor allele frequency | `maf` | | number in (0,0.5] |
| allele frequency (of alternate allele) | `af` | `a1freq`, `frq` | number in (0,1) |
| AF among cases | `case_af` | `af.cases` | number in (0,1) |
| AF among controls | `control_af` | `af.controls` | number in (0,1) |
| allele count | `ac` | | integer |
| effect size (of alternate allele) | `beta` | | number |
| standard error of effect size | `sebeta` | `se` | number |
| odds ratio (of alternate allele) | `or` | | number |
| R2 | `r2` | | number |
| number of samples | `num_samples` | `ns`, `n` | integer, must be the same for every variant in its phenotype |
| number of controls | `num_controls` | `ns.ctrl`, `n_controls` | integer, must be the same for every variant in its phenotype |
| number of cases | `num_cases` | `ns.case`, `n_cases` | integer, must be the same for every variant in its phenotype |
Column names are case-insensitive. If your file has a different column name, set `field_aliases = {"column_name": "field_name"}` in `config.py`. For example, `field_aliases = {'P_BOLT_LMM_INF': 'pval', 'NSAMPLES': 'num_samples'}`.
Any field can be null if it is one of ['', '.', 'NA', 'N/A', 'n/a', 'nan', '-nan', 'NaN', '-NaN', 'null', 'NULL']. If a required field is null, the variant gets dropped.
If your pval is log10 (like in REGENIE output), then set these variables in config.py: `pval_is_neglog10 = True` and `field_aliases = {'LOGP':'pval'}`.
### 4. Make a list of your phenotypes
Inside of your data directory, you need a file named `pheno-list.json` that looks like this:
```json
[
{
"assoc_files": ["/home/peter/data/ear-length.gz"],
"phenocode": "ear-length"
},
{
"assoc_files": ["/home/peter/data/a1c.X.gz","/home/peter/data/a1c.autosomal.gz"],
"phenocode": "A1C"
}
]
```
Each phenotype needs `assoc_files` (a list of paths to association files) and `phenocode` (a string representing your phenotype that is used in filenames and URLs, comprised of `[A-Za-z0-9_~-]`).
If you want, you can also include:
- `phenostring` (string): a name for the phenotype. Shown in tables and tooltips and page headers.
- `category` (string): groups together phenotypes in the PheWAS plot. Shown in tables and tooltips.
- `num_cases`, `num_controls`, and/or `num_samples` (number): if your input data only has `AC` or `MAC`, this will be used to calculated `AF` or `MAF`. Shown in tooltips. If your input data has correctly-named columns for these, the command `pheweb phenolist read-info-from-association-files` will add them into your existing `pheno-list.json`.
- anything else you want, but you'll have to modify templates to use it.
You can use a csv by running:
```
pheweb phenolist import-phenolist "/path/to/pheno-list.csv"
```
or you can make one from scratch by running:
```
pheweb phenolist glob --star-is-phenocode "/home/peter/data/*.gz"
```
You can see other methods [here](etc/detailed-loading-instructions.md#making-pheno-listjson).
### 5. Load your association files
Run `pheweb process`.
To distribute jobs across a cluster, follow [these instructions](etc/detailed-loading-instructions.md#distributing-jobs-across-a-cluster).
To include VEP annotations, follow [these instructions](etc/detailed-loading-instructions.md#annotating-with-vep).
If something breaks and you can't understand the error message or it's something that PheWeb should support by default, [open an issue on github](https://github.com/statgen/pheweb/issues/new) or email me.
### 6. Serve the website
Run `pheweb serve --open`.
That command should either open a browser to your new PheWeb, or it should give you a URL that you can open in your browser to access your new PheWeb.
If it doesn't, follow [the directions for hosting a PheWeb and accessing it from your browser](etc/detailed-webserver-instructions.md#hosting-a-pheweb-and-accessing-it-from-your-browser).
### More options:
To run pheweb through systemd, see sample file [here](etc/pheweb.service).
To use Apache2 or Nginx, see instructions [here](etc/detailed-webserver-instructions.md#using-apache2-or-nginx).
To require login via OAuth, see instructions [here](etc/detailed-webserver-instructions.md#using-oauth).
To track page views with Google Analytics, see instructions [here](etc/detailed-webserver-instructions.md#using-google-analytics).
To reduce storage use, see instructions [here](etc/detailed-webserver-instructions.md#reducing-storage-use).
To customize page contents, see instructions [here](etc/detailed-webserver-instructions.md#customizing-page-contents).
PheWeb can display genetic correlations generated by [another tool](https://github.com/statgen/pheweb-rg-pipeline).
To use this feature, set `show_correlations = True` in `config.py` and place the output of the rg pipeline as `pheno-correlations.txt` in the same folder as `pheno-list.json`.
To hide the button for downloading summary stats, add `download_pheno_sumstats = "secret"` and `SECRET_KEY = "your random string"` in `config.py`. That will make a secret page (printed to the console when you start the server) to share summary stats.
To hide the button for downloading top hits and phenotypes, add `download_top_hits = "hide"` and `download_phenotypes = "hide"` respectively.
To allow dynamically filtering the manhattan plot, run `pheweb best-of-pheno` and set `show_manhattan_filter_button=True` in `config.py`.
# Modifying PheWeb
See instructions [here](etc/detailed-development-instructions.md).
See documentation about the files in `generated-by-pheweb/` [here](etc/detailed-internal-dataflow.md).