{"id":13755648,"url":"https://kkrismer.github.io/seqgra/","last_synced_at":"2025-05-10T02:32:53.608Z","repository":{"id":57465713,"uuid":"209076083","full_name":"kkrismer/seqgra","owner":"kkrismer","description":"seqgra: Synthetic rule-based biological sequence data generation for architecture evaluation and search","archived":false,"fork":false,"pushed_at":"2022-05-30T18:35:51.000Z","size":9299,"stargazers_count":0,"open_issues_count":0,"forks_count":2,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-10-28T22:23:22.398Z","etag":null,"topics":["deep-learning","interpretable-machine-learning","neural-networks","pytorch","tensorflow"],"latest_commit_sha":null,"homepage":"https://kkrismer.github.io/seqgra/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kkrismer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-09-17T14:26:55.000Z","updated_at":"2022-01-08T14:51:51.000Z","dependencies_parsed_at":"2022-09-17T18:11:48.995Z","dependency_job_id":null,"html_url":"https://github.com/kkrismer/seqgra","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kkrismer%2Fseqgra","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kkrismer%2Fseqgra/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kkrismer%2Fseqgra/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kkrismer%2Fseqgra/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kkrismer","download_url":"https://c
odeload.github.com/kkrismer/seqgra/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224873386,"owners_count":17384078,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","interpretable-machine-learning","neural-networks","pytorch","tensorflow"],"created_at":"2024-08-03T11:00:17.748Z","updated_at":"2024-11-16T10:31:11.906Z","avatar_url":"https://github.com/kkrismer.png","language":"Python","funding_links":[],"categories":["Software packages"],"sub_categories":["Data wrangling"],"readme":"# seqgra: Principled Selection of Neural Network Architectures for Genomics Prediction Tasks\n\n[![license: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT) [![DOI](https://img.shields.io/badge/DOI-10.1093%2Fbioinformatics%2Fbtac101-blue.svg)](https://doi.org/10.1093/bioinformatics/btac101) [![PyPI version](https://badge.fury.io/py/seqgra.svg)](https://badge.fury.io/py/seqgra) [![Travis build status](https://travis-ci.com/kkrismer/seqgra.svg?branch=master)](https://travis-ci.com/kkrismer/seqgra)\n\nhttps://kkrismer.github.io/seqgra/\n\n## What is seqgra?\n\nSequence models based on deep neural networks have achieved state-of-the-art \nperformance on regulatory genomics prediction tasks, such as chromatin \naccessibility and transcription factor binding. 
But despite their high \naccuracy, their contribution to a mechanistic understanding of the biology \nof regulatory elements is often hindered by the complexity of the predictive \nmodel and thus poor interpretability of its decision boundaries. To address \nthis, we introduce seqgra, a deep learning pipeline that incorporates the \nrule-based simulation of biological sequence data and the training and \nevaluation of models, whose decision boundaries mirror the rules from the \nsimulation process. The method can be used to (1) generate data under the \nassumption of a hypothesized model of genome regulation, (2) identify neural \nnetwork architectures capable of recovering the rules of said model, and (3) \nanalyze a model's predictive performance as a function of training set size, \nnoise level, and the complexity of the rules behind the simulated data.\n\n## Installation\n\nseqgra is a Python package available on [PyPI](https://pypi.org/), \nthe package repository behind [pip](https://pip.pypa.io/en/stable/).\n\nTo install seqgra with pip, run:\n```\npip install seqgra\n```\n\nTo install seqgra directly from this repository, run:\n```\ngit clone https://github.com/gifford-lab/seqgra\ncd seqgra\npip install .\n```\n\n### System requirements\n\n- Python 3.7 (or higher)\n- *R 3.5 (or higher)*\n    - *R package `ggplot2` 3.3.0 (or higher)*\n    - *R package `gridExtra` 2.3 (or higher)*\n    - *R package `scales` 1.1.0 (or higher)*\n\n\nThe ``tensorflow`` package is only required if TensorFlow models are used \nand will not be automatically installed by ``pip install seqgra``. 
The same is \ntrue for the packages ``torch`` and ``pytorch-ignite``, which are only \nrequired if PyTorch models are used.\n\nR is a soft dependency, in the sense that it is used to create a number \nof plots (grammar-model-agreement plots, \ngrammar heatmaps, and motif similarity matrix plots) and if not available, \nthese plots will be skipped.\n\nseqgra depends upon the Python package [lxml](https://lxml.de/), which in turn \ndepends on system libraries that are not always present. On a \nDebian/Ubuntu machine you can satisfy those requirements using:\n```\nsudo apt-get install libxml2-dev libxslt-dev\n```\n\n## Usage\n\nCheck out the following help pages:\n\n* [Usage examples](https://kkrismer.github.io/seqgra/examples.html): seqgra example analyses with data definitions and model definitions\n* [Command line utilities](https://kkrismer.github.io/seqgra/cmd.html): argument descriptions for `seqgra`, `seqgras`, `seqgrae`, and `seqgraa` commands\n* [Data definition](https://kkrismer.github.io/seqgra/dd.html): detailed description of the data definition language that is used to formalize grammars\n* [Model definition](https://kkrismer.github.io/seqgra/md.html): detailed description of the model definition language that is used to describe neural network architectures and hyperparameters for the optimizer, the loss, and the training process\n* [Simulators, Learners, Evaluators, Comparators](https://kkrismer.github.io/seqgra/slec.html): brief descriptions of the most important classes\n* [seqgra API reference](https://kkrismer.github.io/seqgra/seqgra.html): detailed description of the seqgra API\n* [Source code](https://github.com/gifford-lab/seqgra): seqgra source code repository on GitHub \n\n## Citation\n\nIf you use seqgra in your work, please cite:\n\n**seqgra: Principled Selection of Neural Network Architectures for Genomics Prediction Tasks**  \nKonstantin Krismer, Jennifer Hammelman, and David K. 
Gifford  \nBioinformatics, Volume 38, Issue 9, 1 May 2022, Pages 2381–2388; DOI: https://doi.org/10.1093/bioinformatics/btac101\n\n## Funding\n\nWe gratefully acknowledge funding from NIH grants 1R01HG008754 and \n1R01NS109217.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/kkrismer.github.io%2Fseqgra%2F","html_url":"https://awesome.ecosyste.ms/projects/kkrismer.github.io%2Fseqgra%2F","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/kkrismer.github.io%2Fseqgra%2F/lists"}