{"id":13715529,"url":"https://github.com/Novartis/peax","last_synced_at":"2025-05-07T04:30:58.260Z","repository":{"id":34835374,"uuid":"150585972","full_name":"Novartis/peax","owner":"Novartis","description":"Peax is a tool for interactive visual pattern search and exploration in epigenomic data based on unsupervised representation learning with autoencoders","archived":false,"fork":false,"pushed_at":"2022-12-14T18:38:58.000Z","size":54691,"stargazers_count":68,"open_issues_count":21,"forks_count":14,"subscribers_count":5,"default_branch":"develop","last_synced_at":"2024-11-14T03:34:30.754Z","etag":null,"topics":["autoencoder","data-visualization","deep-learning","epigenomics","interactive-machine-learning","pattern-search","sequential-data"],"latest_commit_sha":null,"homepage":"http://peax.lekschas.de","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Novartis.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-09-27T12:49:07.000Z","updated_at":"2024-10-25T20:01:01.000Z","dependencies_parsed_at":"2023-01-15T09:31:05.054Z","dependency_job_id":null,"html_url":"https://github.com/Novartis/peax","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Novartis%2Fpeax","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Novartis%2Fpeax/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Novartis%2Fpeax/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Novartis%2Fpeax/manifests","owner_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Novartis","download_url":"https://codeload.github.com/Novartis/peax/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252813693,"owners_count":21808372,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autoencoder","data-visualization","deep-learning","epigenomics","interactive-machine-learning","pattern-search","sequential-data"],"created_at":"2024-08-03T00:01:00.159Z","updated_at":"2025-05-07T04:30:53.248Z","avatar_url":"https://github.com/Novartis.png","language":"Jupyter Notebook","readme":"\u003ch1 align=\"center\"\u003e\n  Peax\n\u003c/h1\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \n  **A Visual Pattern Explorer For Epigenomic Data Using Unsupervised Deep Representation Learning**\n  \n\u003c/div\u003e\n\n\n\u003cdiv align=\"center\"\u003e\n  \n  [![Intro](https://img.shields.io/badge/7min%20intro-📺-7fd4ff.svg?style=flat-square)](https://youtu.be/FlzTdFUVE-M)\n  [![Slides](https://img.shields.io/badge/BioIT%20World%20Slides-🧬-7f99ff.svg?style=flat-square)](https://speakerdeck.com/flekschas/peax-interactive-concept-learning-for-visual-exploration-of-epigenetic-patterns)\n  [![Presentation](https://img.shields.io/badge/EuroVis%20Presentation-📊-e17fff.svg?style=flat-square)](https://youtu.be/oQG5DxqiXPI?t=299)\n  [![Paper](https://img.shields.io/badge/Publication-📖-ff7fe1.svg?style=flat-square)](https://vcg.seas.harvard.edu/pubs/peax)\n  \n\u003c/div\u003e\n\n\u003cdiv id=\"teaser\" align=\"center\"\u003e\n  \n  ![Peax's UI](teaser.png)\n  
\n\u003c/div\u003e\n\nEpigenomic data expresses a rich body of diverse patterns that help to identify regulatory elements like promoters and enhancers. But finding these patterns reliably genome-wide is challenging. Peax is a tool for interactive visual pattern search and exploration of epigenomic patterns based on unsupervised representation learning with convolutional autoencoders. The visual search is driven by genomic regions that you label manually; these labels are used to actively learn a classifier that reflects your notion of interestingness.\n\n**Citation:** Lekschas et al., [Peax: Interactive Visual Pattern Search in Sequential Data Using Unsupervised Deep Representation Learning](https://vcg.seas.harvard.edu/pubs/peax),\n_Computer Graphics Forum_, 2020, doi: [10.1111/cgf.13971](https://doi.org/10.1111/cgf.13971).\n\n**More Details:** [peax.lekschas.de](http://peax.lekschas.de)\n\n## Installation\n\n**Requirements:**\n\n- [Conda](https://docs.conda.io/en/latest/) \u003e= 4.8\n\n**Install:**\n\n```bash\ngit clone https://github.com/Novartis/peax \u0026\u0026 cd peax\nmake install\n```\n\n_Do not fear, `make install` is just a convenience target for setting up conda and installing npm packages._\n\n**Notes:**\n\n- If you're a macOS user, you might need to [brew](https://brew.sh) install `libpng` and `openssl` for the [pybbi](https://github.com/nvictus/pybbi) package (see [here](https://github.com/nvictus/pybbi/issues/2)) and `xz` for pysam (if you see an error related to `lzma.h`).\n\n## Overview\n\nPeax consists of three main parts:\n\n1. A server application for serving genomic and autoencoded data on the web. [[/server](server)].\n2. A user interface for exploring, visualizing, and interactively labeling genomic regions. [[/ui](ui)].\n3. A set of examples showing how to configure Peax and build your own. 
[[/examples](examples)]\n\n## Data\n\nWe provide 6 autoencoders trained on 3 kb, 12 kb, and 120 kb window sizes (with 25,\n100, and 1000 bp binning) on DNase-seq and histone mark ChIP-seq data (H3K4me1, H3K4me3, H3K27ac, H3K9ac, H3K27me3, H3K9me3, and H3K36me).\n\nYou can find detailed descriptions of the autoencoders at [zenodo.org/record/2609763](https://zenodo.org/record/2609763). When you follow the [Quick Start](#quick-start) instructions, you will automatically download the related autoencoders.\n\n## Quick start\n\nPeax comes with [6 autoencoders](#data) for DNase-seq and histone mark\nChIP-seq data and several example configurations for which we provide\nconvenience scripts to get you started as quickly as possible.\n\nFor instance, run one of the following commands to start Peax with a DNase-seq\ntrack for 3 kb, 12 kb, and 120 kb genomic windows.\n\n| Command              | Window Size | Step Freq. | Chromosomes |\n| -------------------- | ----------- | ---------- | ----------- |\n| `make example-3kb`   | 3 kb        | 2          | 21          |\n| `make example-12kb`  | 12 kb       | 3          | 20-21       |\n| `make example-120kb` | 120 kb      | 6          | 17-21       |\n\n**Note:** The first time Peax is started it will precompute the datasets for\nexploration. This can take a few minutes depending on your hardware. Also, these demos\nwill only prepare the above mentioned chromosomes, so don't try to search for patterns\non another chromosome. It won't work! For your own data you can freely configure this\nof course.\n\nThe scripts will download test ENCODE tracks and use the matching\nconfiguration to start the server. 
More examples are described in [`/examples`](examples).\n\n## Get Started\n\nIn the following we describe how you can configure Peax for your own data.\n\n#### Configure Peax with your data\n\nConfigure Peax with your data to tell it which tracks you want to visualize in HiGlass and which of those tracks are encodable using an (auto)encoder.\n\nThe fastest way to get started is to copy the example config:\n\n```\ncp config.json.sample config.json\n```\n\nThe config file has 10 top-level properties:\n\n| Field             | Description                                                                                                                                                                                 | Dtype |\n| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- |\n| encoders          | List of encoders.                                                                                                                                                                           | list  |\n| datasets          | List of tracks.                                                                                                                                                                             | list  |\n| coords            | Genome coordinates. Peax currently supports hg19, hg38, mm9, and mm10                                                                                                                       | str   |\n| chroms            | Chromosomes to be searched. If omitted, all chromosomes will be prepared for searching.                                                                                                     | list  |\n| step_freq         | Step frequency of the sliding window approach. 
E.g., given an encoder with window size 12 kb, a step frequency of 6 means that every 2 kb a 12 kb window will be extracted from the bigWig. | int   |\n| db_path           | Relative path to the sqlite db for storing searches.                                                                                                                                        | str   |\n| normalize_tracks  | If `true` the y-scale of tracks within a window will be normalized to the minimum and maximum value. This is useful for exploring differential signal.                                      | bool  |\n| variable_target   | If `true` the window with the highest prediction probability will be shown in the query view.                                                                                               | bool  |\n| classifier        | The class name of a scikit-learn classifier.                                                                                                                                                | str   |\n| classifier_params | A dictionary of parameters to customize the classifier                                                                                                                                      | obj   |\n\nThe main parts to adjust are `encoders` and `datasets`. `encoders` is a list of\n(auto)encoder definitions for different data types. There are two ways to\nconfigure an (auto)encoder: (a) point to a predefined autoencoder or (b)\nconfigure from scratch.\n\nTo use predefined encoders, all you have to do is specify the path to the encoder configuration file:\n\n**Example:**\n\n```json\n{\n  \"encoders\": \"examples/encoders.json\"\n}\n```\n\nThe encoder configuration file is a dictionary with the top-level keys acting\nas the identifier. 
Given the example from above, the file could look like this:\n\n```json\n{\n  \"histone-mark-chip-seq-3kb\": {},\n  \"dnase-seq-3kb\": {}\n}\n```\n\nSee [`encoders.json`](encoders.json) for an example. The definition of each\nautoencoder follows the format described below.\n\nTo configure an autoencoder from scratch you need to provide a dictionary with\nthe following required format:\n\n| Field        | Description                                                                                                                                   | Defaults | Dtype |\n| ------------ | --------------------------------------------------------------------------------------------------------------------------------------------- | -------- | ----- |\n| autoencoder  | Relative path to your pickled autoencoder model. (hdf5 file)                                                                                  |          | str   |\n| encoder      | Relative path to your pickled encoder model. (hdf5 file)                                                                                      |          | str   |\n| decoder      | Relative path to your pickled decoder model. (hdf5 file)                                                                                      |          | str   |\n| content_type | Unique string describing the content this autoencoder can handle. Data tracks with the same content type will be encoded by this autoencoder. |          | str   |\n| window_size  | Window size in base pairs used for training the autoencoder.                                                                                  |          | int   |\n| resolution   | Resolution or bin size of the window in base pairs.                                                                                           |          | int   |\n| latent_dim   | Number of latent dimensions of the encoded windows.                                                                                
           |          | int   |\n| input_dim    | Number of input dimensions for Keras. For 1D data these are 3: samples, data length (which is `window_size` / `resolution`), channels.        | 3        | int   |\n| channels     | Number of channels of the input data. This is normally 1.                                                                                     | 1        | int   |\n| model_args   | List of arguments passed to a custom encoder model                                                                                            | 1        | int   |\n\n_Note that if you have specified an `autoencoder` you do not need to provide\nseparate `encoder` and `decoder` models._\n\n**Example:**\n\n```json\n{\n  \"encoder\": \"path/to/my-12kb-chip-seq-encoder.h5\",\n  \"decoder\": \"path/to/my-12kb-chip-seq-decoder.h5\",\n  \"content_type\": \"histone-mark-chip-seq\",\n  \"window_size\": 12000,\n  \"resolution\": 100,\n  \"channels\": 1,\n  \"input_dim\": 3,\n  \"latent_dim\": 12\n}\n```\n\nDatasets require the following format:\n\n| Field        | Description                                                                                                                                                       | Dtype |\n| ------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- |\n| filepath     | Relative path to the data file (bigWig or bigBed).                                                                                                                | str   |\n| content_type | Unique string describing this dataset's content. If you want to search for patterns in this track you need to have an autoencoder with a matching content type.   | str   |\n| id           | A unique string identifying your track. 
(Optional)                                                                                                                | str   |\n| name         | A human readable name to be shown in HiGlass. (Optional)                                                                                                          | str   |\n\n**Example:**\n\n```json\n{\n  \"filepath\": \"data/chip-seq/my-fancy-gm12878-chip-seq-h3k27ac-fc-signal.bigWig\",\n  \"content_type\": \"histone-mark-chip-seq\",\n  \"uuid\": \"my-fancy-gm12878-chip-seq-h3k27ac-track\",\n  \"name\": \"My Fancy GM12878 ChIP-Seq H3K27ac Track\"\n}\n```\n\n#### Start Peax\n\nFirst, start the Peax server to serve your data.\n\n**Note:** The first time you run Peax on a new dataset, all the data will be prepared!\nDepending on your machine, this can take some time. If you want to track the progress,\nactivate the debugging mode using `-d`.\n\n```bash\npython start.py\n```\n\nNow go to [http://localhost:5000](http://localhost:5000).\n\nThe `start.py` script supports the following options:\n\n```bash\nusage: start.py [-h] [-c CONFIG] [--clear] [--clear-cache]\n                [--clear-cache-at-exit] [--clear-db] [-d] [--host HOST]\n                [--port PORT] [-v]\n\nPeak Explorer CLI\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -c CONFIG, --config CONFIG\n                        path to your JSON config file\n  -b BASE_DATA_DIR, --base-data-dir BASE_DATA_DIR\n                        base directory which the config file refers to\n  --clear               clears the cache and database on startup\n  --clear-cache         clears the cache on startup\n  --clear-cache-at-exit\n                        clear the cache on shutdown\n  --clear-db            clears the database on startup\n  -d, --debug           turn on debug mode\n  --host HOST           customize the hostname\n  --port PORT           customize the port\n  -v, --verbose         turn verbose logging on\n```\n\nThe `hostname` 
defaults to `localhost` and the `port` of the backend server defaults\nto `5000`.\n\nIn order to speed up subsequent user interaction, Peax initially prepares all\nthe data and caches that data under `/cache`. You can always remove this\ndirectory manually or clear the cache on startup or at exit using the `--clear-cache` or\n`--clear-cache-at-exit` option described above.\n\n---\n\n## Development\n\nHandy commands to keep in mind:\n\n- `make install` installs the conda environment and npm packages and builds the UI\n- `make update` updates the conda environment and npm packages and rebuilds the UI\n- `make build` builds the UI\n- `python start.py` starts the Flask server application for serving data\n- [/ui]: `npm install` installs and updates all the needed packages for the frontend\n- [/ui]: `npm build` creates the production build of the frontend\n- [/ui]: `npm start` starts a dev server with hot reloading for the frontend\n\nTo start developing on the server and the UI in parallel, first start the backend server\napplication using `./start.py` and then start the frontend server application from\n`./ui` using `npm start`. Both servers watch the source code, so whenever you make\nchanges to the source code the servers will reload.\n\n### Configuration\n\nThere are 2 types of configuration files. The [backend server configuration](#configure-peax-with-your-data)\ndefines the datasets to explore and is described in detail [above](#configure-peax-with-your-data).\n\nAdditionally, the frontend application can be configured to talk to a different backend\nserver and port if needed. Get started by copying the example configuration:\n\n```bash\ncd ui \u0026\u0026 cp config.json.sample config.json\n```\n\nBy default the `server` is dynamically set to the hostname of the server running the\nfrontend application. I.e., it is assumed that the backend server application is\nrunning on the same host as the frontend application. 
The `port` of the server\ndefaults to `5000`.\n\n### Start the backend and frontend apps\n\nFor development, the backend and frontend applications run as separate server\napplications.\n\n```bash\n# Backend server\n./start.py --debug --config path/to/your/config.json\n\n# Frontend server\ncd ui \u0026\u0026 npm start\n```\n","funding_links":[],"categories":["Epigenomics","Ranked by starred repositories"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNovartis%2Fpeax","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FNovartis%2Fpeax","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNovartis%2Fpeax/lists"}