https://github.com/snikumbh/archr
archR: Identifying promoter sequence architectures de novo using NMF
- Host: GitHub
- URL: https://github.com/snikumbh/archr
- Owner: snikumbh
- License: gpl-3.0
- Created: 2019-05-24T15:53:01.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2021-07-01T13:38:24.000Z (over 3 years ago)
- Last Synced: 2024-11-11T13:43:01.474Z (2 months ago)
- Topics: archr, discovery, nmf, non-negative-matrix-factorization, promoter-sequence-architectures, r, r-package, scikit-learn, sequence-architectures, unsupervised-machine-learning
- Language: R
- Homepage: https://snikumbh.github.io/archR
- Size: 17.7 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: NEWS.md
- License: LICENSE
# archR
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
[![DOI](https://zenodo.org/badge/188449833.svg)](https://zenodo.org/badge/latestdoi/188449833)
[![Build status](https://travis-ci.org/snikumbh/archR.svg?branch=master)](https://travis-ci.org/snikumbh/archR)
[![Codecov test coverage](https://codecov.io/gh/snikumbh/archR/branch/master/graph/badge.svg)](https://codecov.io/gh/snikumbh/archR?branch=master)
[![R build status](https://github.com/snikumbh/archR/workflows/R-CMD-check/badge.svg)](https://github.com/snikumbh/archR/actions)

Note: _This package is currently under development, so please bear with me while I put the final blocks together. Thanks for your understanding!_
archR is an unsupervised, non-negative matrix factorization (NMF)-based algorithm for the de novo discovery of sequence architectures.
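For readers new to NMF, here is a minimal base-R sketch of the general idea. It factorizes a small non-negative matrix `X` (rows = features, columns = sequences) as `X ≈ W %*% H` using the classic Lee and Seung multiplicative updates. This is **not** archR's implementation: archR relies on the Python scikit-learn module, presumably for its factorization step. The function `nmf_mu`, the toy matrix, the iteration count, and `eps` are hypothetical choices made only for illustration. By analogy, the columns of `W` play the role of candidate "architectures" and each column of `H` gives a sequence's weights on them.

```r
# Hypothetical NMF sketch (Lee-Seung multiplicative updates, Frobenius loss).
# Illustrative only; NOT archR's implementation, which uses scikit-learn.
nmf_mu <- function(X, k, n_iter = 1000L, eps = 1e-10, seed = 1L) {
  stopifnot(is.matrix(X), all(X >= 0))
  set.seed(seed)
  # Random positive initialisation keeps all updates non-negative
  W <- matrix(runif(nrow(X) * k), nrow(X), k)  # features x factors
  H <- matrix(runif(k * ncol(X)), k, ncol(X))  # factors x sequences
  for (i in seq_len(n_iter)) {
    # H <- H * (W'X) / (W'WH); eps avoids division by zero
    H <- H * (crossprod(W, X) / (crossprod(W) %*% H + eps))
    # W <- W * (XH') / (WHH')
    W <- W * (tcrossprod(X, H) / (W %*% tcrossprod(H) + eps))
  }
  list(W = W, H = H)
}

# Toy data: 3 "sequences" use features 1-2, 3 others use features 3-4
X <- cbind(matrix(c(1, 1, 0, 0), 4, 3),
           matrix(c(0, 0, 1, 1), 4, 3))
fit <- nmf_mu(X, k = 2)
recon_err <- norm(X - fit$W %*% fit$H, type = "F")
# Assign each sequence to the factor with the largest weight
clusters <- apply(fit$H, 2, which.max)
```

Assigning each column to its dominant factor, as in `clusters` above, is one simple way to turn factor weights into sequence clusters.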
Below is a schematic of archR's algorithm.

## Installation
### Python scikit-learn dependency
This package requires the Python module scikit-learn. See the [scikit-learn installation instructions](https://scikit-learn.org/stable/install.html).

### Installing archR

To install this package from GitHub, use:
```r
# Install remotes if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
}
remotes::install_github("snikumbh/archR", build_vignettes = FALSE)
```

## Usage
```r
# Load packages
library(archR)
library(Biostrings)

# Create a one-hot encoded data matrix from a FASTA file
# (you can use your own FASTA file instead)
inputFastaFilename <- system.file("extdata", "example_data.fa",
                                  package = "archR",
                                  mustWork = TRUE)

# Specifying "dinuc" generates dinucleotide features
inputSeqsMat <- archR::prepare_data_from_FASTA(inputFastaFilename,
                                               sinuc_or_dinuc = "dinuc")

# Also read the raw sequences
inputSeqsRaw <- archR::prepare_data_from_FASTA(inputFastaFilename,
                                               raw_seq = TRUE)

nSeqs <- length(inputSeqsRaw)
positions <- seq(1, Biostrings::width(inputSeqsRaw[1]))

# Set archR configuration
# Most arguments have default values
archRconfig <- archR::archR_set_config(
    parallelize = TRUE,
    n_cores = 2,
    n_runs = 100,
    k_min = 1,
    k_max = 20,
    mod_sel_type = "stability",
    bound = 10^-6,
    chunk_size = 100,
    result_aggl = "ward.D",
    result_dist = "euclid",
    flags = list(debug = FALSE, time = TRUE, verbose = TRUE,
                 plot = FALSE)
)

# Call/Run archR
archRresult <- archR::archR(config = archRconfig,
                            seqs_ohe_mat = inputSeqsMat,
                            seqs_raw = inputSeqsRaw,
                            seqs_pos = positions,
                            total_itr = 2,
                            set_ocollation = c(TRUE, FALSE))
```
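To illustrate what mononucleotide versus dinucleotide one-hot features (the `sinuc_or_dinuc` option above) can look like, here is a small base-R sketch. The helper `one_hot_encode` is hypothetical and is not archR's `prepare_data_from_FASTA`. Its layout is an assumption: features as rows grouped by position, sequences as columns, and overlapping dinucleotides in the order AA, AC, ..., TT. archR's actual feature ordering may differ.

```r
# Hypothetical helper (not archR's code): one-hot encode equal-length DNA
# sequences with 4 features per position (mono) or 16 per overlapping
# pair of positions (dinuc). Rows = features, columns = sequences.
one_hot_encode <- function(seqs, dinuc = FALSE) {
  nts <- c("A", "C", "G", "T")
  # Feature alphabet: A, C, G, T or the 16 ordered pairs AA, AC, ..., TT
  alphabet <- if (dinuc) paste0(rep(nts, each = 4), nts) else nts
  k <- if (dinuc) 2L else 1L
  len_nt <- nchar(seqs[1])
  stopifnot(all(nchar(seqs) == len_nt))
  n_pos <- len_nt - k + 1L
  n_feat <- length(alphabet)
  mat <- matrix(0L, nrow = n_pos * n_feat, ncol = length(seqs))
  for (j in seq_along(seqs)) {
    s <- toupper(seqs[j])
    for (p in seq_len(n_pos)) {
      idx <- match(substr(s, p, p + k - 1L), alphabet)
      # Tokens containing N or other symbols leave the block all-zero
      if (!is.na(idx)) mat[(p - 1L) * n_feat + idx, j] <- 1L
    }
  }
  mat
}

seqs <- c("ACGT", "AAGT")
mono <- one_hot_encode(seqs)
di <- one_hot_encode(seqs, dinuc = TRUE)
dim(mono)  # 16 rows (4 positions x 4 nucleotides), 2 columns
dim(di)    # 48 rows (3 overlapping pairs x 16 dinucleotides), 2 columns
```

Dinucleotide features grow the matrix fourfold per position but let a factorization capture adjacent-base dependencies that separate mononucleotide features cannot.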
## Contact
Comments, suggestions, and enquiries/requests are welcome! Feel free to email [email protected] or [create a new issue](https://github.com/snikumbh/archR/issues/new).