{"id":19839891,"url":"https://github.com/qdata/attentivechrome","last_synced_at":"2025-05-01T19:30:33.023Z","repository":{"id":69609557,"uuid":"105061674","full_name":"QData/AttentiveChrome","owner":"QData","description":"NeurIPS17: [AttentiveChrome] Attend and Predict: Using Deep Attention Model to Understand Gene Regulation by Selective Attention on Chromatin","archived":false,"fork":false,"pushed_at":"2021-02-08T17:31:21.000Z","size":81896,"stargazers_count":27,"open_issues_count":0,"forks_count":9,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-06T17:05:43.342Z","etag":null,"topics":["deep-learning","deep-neural-network","epigenetic-data","interpretable-deep-learning"],"latest_commit_sha":null,"homepage":"http://deepchrome.org","language":"Lua","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/QData.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-09-27T19:41:49.000Z","updated_at":"2024-02-21T19:38:21.000Z","dependencies_parsed_at":"2023-03-11T06:34:49.090Z","dependency_job_id":null,"html_url":"https://github.com/QData/AttentiveChrome","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QData%2FAttentiveChrome","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QData%2FAttentiveChrome/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QData%2FAttentiveChrome/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QData%2FAttentiveChrome/manifests","owner_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/owners/QData","download_url":"https://codeload.github.com/QData/AttentiveChrome/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251932532,"owners_count":21667159,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","deep-neural-network","epigenetic-data","interpretable-deep-learning"],"created_at":"2024-11-12T12:24:44.993Z","updated_at":"2025-05-01T19:30:28.278Z","avatar_url":"https://github.com/QData.png","language":"Lua","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AttentiveChrome\n\nReference Paper: [Attend and Predict: Using Deep Attention Model to Understand Gene Regulation by Selective Attention on Chromatin](https://arxiv.org/abs/1708.00339)\n\nBibTeX Citation:\n```\n@inproceedings{singh2017attend,\n  title={Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin},\n  author={Singh, Ritambhara and Lanchantin, Jack and Sekhon, Arshdeep and Qi, Yanjun},\n  booktitle={Advances in Neural Information Processing Systems},\n  pages={6769--6779},\n  year={2017}\n}\n```\n\nAttentiveChrome is a unified architecture that models and interprets dependencies among the chromatin factors that control gene regulation. AttentiveChrome uses a hierarchy of multiple long short-term memory (LSTM) modules to encode the input signals and to automatically model how various chromatin marks cooperate. 
AttentiveChrome trains two levels of attention jointly with the target prediction, enabling it to attend differentially to relevant marks and to locate important positions per mark. We evaluate the model across 56 different cell types (tasks) in humans. Not only is the proposed architecture more accurate, but its attention scores also provide a better interpretation than state-of-the-art feature visualization methods such as saliency maps. \n\n**Feature Generation for AttentiveChrome model:** \n\nWe used read counts of the five core histone modifications (listed in the paper) from the REMC database as the input matrix. We downloaded the files from the [REMC database](http://egg2.wustl.edu/roadmap/web_portal/processed_data.html#ChipSeq_DNaseSeq). We converted each file from 'tagAlign.gz' format to 'bam' using the commands:\n```\n# Decompress the tagAlign file\ngunzip \u003cfilename\u003e.tagAlign.gz\n# Convert the BED-like tagAlign file to BAM using the hg19 chromosome sizes\nbedtools bedtobam -i \u003cfilename\u003e.tagAlign -g hg19chrom.sizes \u003e \u003cfilename\u003e.bam\n```\nNext, we used \"bedtools multicov\" to get the read counts per bin. Bins of length 100 base pairs (bp) are selected from the regions (+/- 5000 bp) flanking the transcription start site (TSS) of each gene, giving 100 bins per gene. The signal values of the five selected histone modifications across these bins form the input matrix X, while the discretized gene expression (label +1/-1) is the output y.\n\nFor gene expression, we used the RPKM read count files available in the REMC database. We took the median of the RPKM read counts as the threshold for assigning binary labels (-1: gene low, +1: gene high). \n\nWe divided the genes into 3 separate sets for training, validation and testing. It was a simple file split resulting in 6601, 6601 and 6600 genes, respectively. \n\nWe performed training and validation on the first 2 sets and then reported the AUC score on the third (test) set for the model from the best-performing epoch. 
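The binning, labeling, and splitting steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the pipeline used to build the released datasets: each histone mark is given as a list of read start coordinates, reads are assigned to bins by start position (bedtools multicov counts overlapping reads instead), a gene whose RPKM equals the median is labeled -1, and the helper names `count_reads_per_bin`, `gene_matrix`, `binarize_expression` and `contiguous_split` are hypothetical.

```python
import numpy as np

BIN_SIZE = 100                   # bp per bin
FLANK = 5000                     # bp on each side of the TSS
N_BINS = 2 * FLANK // BIN_SIZE   # 10000 bp / 100 bp = 100 bins per gene


def count_reads_per_bin(read_starts, tss):
    # Hypothetical helper: assign each read (by start coordinate) to one of the
    # 100 bp bins covering [tss - 5000, tss + 5000); reads outside are dropped.
    starts = np.asarray(read_starts, dtype=np.int64)
    idx = (starts - (tss - FLANK)) // BIN_SIZE
    idx = idx[(idx >= 0) & (idx < N_BINS)]
    return np.bincount(idx, minlength=N_BINS)


def gene_matrix(reads_by_mark, tss):
    # Hypothetical helper: one column per histone mark, so X has shape (N_BINS, n_marks).
    return np.stack([count_reads_per_bin(r, tss) for r in reads_by_mark], axis=1)


def binarize_expression(rpkm):
    # +1 (gene high) if RPKM is above the median across genes, else -1 (gene low).
    # Sending ties at the median to -1 is an assumption.
    rpkm = np.asarray(rpkm, dtype=float)
    return np.where(rpkm > np.median(rpkm), 1, -1)


def contiguous_split(n_genes, sizes=(6601, 6601, 6600)):
    # Split gene indices in file order into train / validation / test chunks.
    return np.split(np.arange(n_genes), np.cumsum(sizes)[:-1])


if __name__ == '__main__':
    # Toy example with five marks around a TSS at position 1,000,000.
    X = gene_matrix([[995_000, 995_050], [1_000_000], [], [], []], tss=1_000_000)
    print(X.shape)  # (100, 5)
```

Counting from start coordinates keeps the sketch dependency-free; for real data, the bedtools workflow described above is what the authors used to produce the per-bin counts.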
\n\n**Datasets**\n\nWe have provided a toy dataset to test our model in the data subdirectory of v2PyTorch.\n\nThe complete set of 56 cell-type datasets is located at https://zenodo.org/record/2652278\n\nThe rows are bins for all genes (100 rows per gene) and the columns are organised as follows:\n\nGeneID, Bin ID, H3K27me3 count, H3K36me3 count, H3K4me1 count, H3K4me3 count, H3K9me3 count, Binary Label for gene expression (0/1)  \ne.g. `000003,1,4,3,0,8,4,1`\n\n**Running The Model** \n\nSee the v1LuaTorch or v2PyTorch directories to run the code.\n\n\n\n# The v2PyTorch folder includes the PyTorch version of the AttentiveChrome implementation. \nYou can run it via the following command: \n\n```\npython train.py --cell_type Toy\n```\n\n\n\n## We also provide trained AttentiveChrome models through the Kipoi model zoo [http://kipoi.org/](http://kipoi.org/)\n\nAttentiveChrome models can be run using Kipoi, which is a repository of predictive models for genomics. All models in the repo can be used through a shared API.\n\n- The utility code to adapt AttentiveChrome to Kipoi is in /kipoiutil\n\n### Installation Requirements\n* python\u003e=3.5\n* numpy\n* pytorch-cpu\n* torchvision-cpu\n\n## Quick Start\n### Creating a new conda environment using kipoi\n`kipoi env create AttentiveChrome`\n\n\n### Activating the environment\n`conda activate kipoi-AttentiveChrome`\n\n## Command Line\nWe can run AttentiveChrome using a terminal.\n\n### Getting an example input file\nTo get an example input file for a specific model, run the following command. Replace {model_name} with the actual name of the model (e.g. 
E003, E005, etc.)\n\n`kipoi get-example AttentiveChrome/{model_name} -o example_file`\n\nexample: `kipoi get-example AttentiveChrome/E003 -o example_file`\n\n### Predicting using an example file\nTo make a prediction using an input file, run the following command.\n\n`kipoi predict AttentiveChrome/{model_name} --dataloader_args='{\"input_file\": \"example_file/input_file\", \"bin_size\": 100}' -o example_predict.tsv`\n\nThis should produce a TSV file containing the results. To run it on another file, replace \"example_file/input_file\" with the path of your file.\n\n## Python API\nWe can also use AttentiveChrome through the Kipoi Python API.\n### Fetching the model\nFirst, import kipoi:\n`import kipoi`\n\nNext, get the model. Replace {model_name} with the actual name of the model (e.g. E003, E005, etc.)\n\n`model = kipoi.get_model(\"AttentiveChrome/{model_name}\")`\n\n### Predicting using the pipeline\n`prediction = model.pipeline.predict({\"input_file\": \"path to input file\", \"bin_size\": {some integer}})`\n\nThis returns a numpy array containing the output from the final softmax function.\n\ne.g. `model.pipeline.predict({\"input_file\": \"data/input_file\", \"bin_size\": 100})`\n\n### Predicting for a single batch\nFirst, we need to set up our dataloader `dl`.\n\n`dl = model.default_dataloader(input_file=\"path to input file\", bin_size={some integer})`\n\nNext, we can use the iterator functionality of the dataloader.\n\n`it = dl.batch_iter(batch_size=32)`\n\n`single_batch = next(it)`\n\nThe first line creates an iterator named `it` whose batches each contain 32 items. 
Calling `next(it)` then returns the next batch.\n\nFinally, we can run prediction on this single batch.\n\n`prediction = model.predict_on_batch(single_batch['inputs'])`\n\nThis also returns a numpy array containing the output from the final softmax function.\n\n\n# We have extended AttentiveChrome to DeepDiffChrome\n\n\n- [DeepDiff: Deep-learning for predicting Differential gene expression from histone modifications](https://academic.oup.com/bioinformatics/article/34/17/i891/5093224)\n\n- Code on GitHub: [https://github.com/QData/DeepDiffChrome](https://github.com/QData/DeepDiffChrome)\n\n\n## Here are some links to general data processing tools and guidance for ChIP-seq data:\n\n[https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003326](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003326)\n\n[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389943/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389943/)\n\n[https://bedtools.readthedocs.io/en/latest/](https://bedtools.readthedocs.io/en/latest/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqdata%2Fattentivechrome","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqdata%2Fattentivechrome","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqdata%2Fattentivechrome/lists"}