{"id":16450964,"url":"https://github.com/snikumbh/seqarchr","last_synced_at":"2025-06-19T12:34:20.203Z","repository":{"id":69087815,"uuid":"434036357","full_name":"snikumbh/seqArchR","owner":"snikumbh","description":"seqArchR: Identifying (promoter) sequence architectures de novo using NMF","archived":false,"fork":false,"pushed_at":"2024-03-06T20:18:41.000Z","size":12248,"stargazers_count":1,"open_issues_count":2,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-08T07:30:08.373Z","etag":null,"topics":["clustering","nmf","nonnegative-matrix-factorization","promoter-sequence-architectures","r","r-package","scikit-learn","sequence-analysis","sequence-architectures","unsupervised-machine-learning"],"latest_commit_sha":null,"homepage":"https://snikumbh.github.io/seqArchR","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/snikumbh.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-12-02T00:59:07.000Z","updated_at":"2022-12-14T23:58:00.000Z","dependencies_parsed_at":"2025-02-13T08:47:54.235Z","dependency_job_id":null,"html_url":"https://github.com/snikumbh/seqArchR","commit_stats":{"total_commits":127,"total_committers":2,"mean_commits":63.5,"dds":"0.015748031496062964","last_synced_commit":"1fb808d0ef874a1c58546d1d3e4f75d75a01e3ea"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/snikumbh/seqArchR","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snikumbh%2FseqArchR","tags_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/snikumbh%2FseqArchR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snikumbh%2FseqArchR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snikumbh%2FseqArchR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/snikumbh","download_url":"https://codeload.github.com/snikumbh/seqArchR/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snikumbh%2FseqArchR/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260751467,"owners_count":23057155,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering","nmf","nonnegative-matrix-factorization","promoter-sequence-architectures","r","r-package","scikit-learn","sequence-analysis","sequence-architectures","unsupervised-machine-learning"],"created_at":"2024-10-11T10:06:41.102Z","updated_at":"2025-06-19T12:34:15.192Z","avatar_url":"https://github.com/snikumbh.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\n# seqArchR\n\u003c!-- badges: start --\u003e\n[![DOI](https://zenodo.org/badge/188449833.svg)](https://zenodo.org/badge/latestdoi/188449833)\n[![codecov](https://codecov.io/gh/snikumbh/seqArchR/branch/main/graph/badge.svg?token=NEjCGuOUlW)](https://codecov.io/gh/snikumbh/seqArchR)\n\u003c!-- bioc badges: start --\u003e\n  [![Bioc release status](http://www.bioconductor.org/shields/build/release/bioc/seqArchR.svg)](https://bioconductor.org/checkResults/release/bioc-LATEST/seqArchR)\n  
[![Bioc downloads rank](https://bioconductor.org/shields/downloads/release/seqArchR.svg)](http://bioconductor.org/packages/stats/bioc/seqArchR/)\n  [![Bioc support](https://bioconductor.org/shields/posts/seqArchR.svg)](https://support.bioconductor.org/tag/seqArchR)\n  [![Bioc history](https://bioconductor.org/shields/years-in-bioc/seqArchR.svg)](https://bioconductor.org/packages/release/bioc/html/seqArchR.html#since)\n  [![Bioc dependencies](https://bioconductor.org/shields/dependencies/release/seqArchR.svg)](https://bioconductor.org/packages/release/bioc/html/seqArchR.html#since)\n  \u003c!-- bioc badges: end --\u003e\n  \n\u003c!-- [![R build status](https://github.com/snikumbh/seqArchR/workflows/R-CMD-check/badge.svg)](https://github.com/snikumbh/seqArchR/actions) --\u003e\n\u003c!-- badges: end --\u003e\n\n\nseqArchR is an unsupervised, non-negative matrix factorization (NMF)-based algorithm for discovery of sequence architectures de novo.\nBelow is a schematic of seqArchR's algorithm.\n\n\u003cimg src=\"https://github.com/snikumbh/seqArchR/blob/main/vignettes/seqArchR_algorithm_1080p_cropped.gif\" width=\"550\" align=\"center\"\u003e\n\n\n## Installation\n\n### Python scikit-learn dependency\nThis package requires the Python module scikit-learn. 
Please see installation instructions [here](https://scikit-learn.org/stable/install.html).\n\n\n### To install this package, use \n\n```r\nif (!requireNamespace(\"remotes\", quietly = TRUE)) {\n    install.packages(\"remotes\")   \n}\n\nremotes::install_github(\"snikumbh/seqArchR\", build_vignettes = FALSE)\n``` \n\n\n\n### Usage\n```r\n# load package\nlibrary(seqArchR)\nlibrary(Biostrings)\n\n\n# Creation of one-hot encoded data matrix from FASTA file\n# You can use your own FASTA file instead\ninputFastaFilename \u003c- system.file(\"extdata\", \"example_data.fa\", \n                                  package = \"seqArchR\", \n                                  mustWork = TRUE)\n\n# Specifying dinuc generates dinucleotide features\ninputSeqsMat \u003c- seqArchR::prepare_data_from_FASTA(inputFastaFilename,\n                                                  sinuc_or_dinuc = \"dinuc\")\n\ninputSeqsRaw \u003c- seqArchR::prepare_data_from_FASTA(inputFastaFilename, \n                                               raw_seq = TRUE)\n\nnSeqs \u003c- length(inputSeqsRaw)\npositions \u003c- seq(1, Biostrings::width(inputSeqsRaw[1]))\n\n# Set seqArchR configuration\n# Most arguments have default values\nseqArchRconfig \u003c- seqArchR::set_config(\n        parallelize = TRUE,\n        n_cores = 2,\n        n_runs = 100,\n        k_min = 1,\n        k_max = 20,\n        mod_sel_type = \"stability\",\n        bound = 10^-6,\n        chunk_size = 100,\n\tresult_aggl = \"ward.D\",\n\tresult_dist = \"euclid\",\n        flags = list(debug = FALSE, time = TRUE, verbose = TRUE,\n                     plot = FALSE)\n        )\n\n#\n### Call/Run seqArchR\nseqArchRresult \u003c- seqArchR::seqArchR(config = seqArchRconfig,\n                               seqs_ohe_mat = inputSeqsMat,\n                               seqs_raw = inputSeqsRaw,\n                               seqs_pos = positions,\n                               total_itr = 2,\n\t\t\t       set_ocollation = c(TRUE, 
FALSE))\n\n```\n\n\n# Contact\nComments, suggestions, enquiries/requests are welcome! Feel free to email sarvesh.nikumbh@gmail.com or [create a new issue](https://github.com/snikumbh/seqArchR/issues/new)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnikumbh%2Fseqarchr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsnikumbh%2Fseqarchr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnikumbh%2Fseqarchr/lists"}