{"id":21028122,"url":"https://github.com/slowikj/seqr","last_synced_at":"2025-05-15T10:33:09.988Z","repository":{"id":56934577,"uuid":"221626515","full_name":"slowikj/seqR","owner":"slowikj","description":"fast and comprehensive k-mer counting package","archived":false,"fork":false,"pushed_at":"2021-09-27T16:40:05.000Z","size":1803,"stargazers_count":18,"open_issues_count":7,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-03T08:03:37.322Z","etag":null,"topics":["bioinformatics","bioinformatics-tool","dna-processing","feature-engineering","feature-extraction","genomics","hashing","hashing-algorithms","k-mer","k-mer-counting","kmer","kmer-counting","kmer-frequency-count","kmers","ngram","ngrams","protein-sequences","rcpp","rcppparallel","rpackage"],"latest_commit_sha":null,"homepage":"https://slowikj.github.io/seqR/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/slowikj.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-11-14T06:20:51.000Z","updated_at":"2024-10-25T21:02:08.000Z","dependencies_parsed_at":"2022-08-21T05:20:44.958Z","dependency_job_id":null,"html_url":"https://github.com/slowikj/seqR","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slowikj%2FseqR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slowikj%2FseqR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slowikj%2FseqR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slowikj%2FseqR/man
ifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/slowikj","download_url":"https://codeload.github.com/slowikj/seqR/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254323249,"owners_count":22051746,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","bioinformatics-tool","dna-processing","feature-engineering","feature-extraction","genomics","hashing","hashing-algorithms","k-mer","k-mer-counting","kmer","kmer-counting","kmer-frequency-count","kmers","ngram","ngrams","protein-sequences","rcpp","rcppparallel","rpackage"],"created_at":"2024-11-19T11:53:57.871Z","updated_at":"2025-05-15T10:33:08.843Z","avatar_url":"https://github.com/slowikj.png","language":"C++","readme":"---\noutput: github_document\n---\n\n```{r setup, include=FALSE}\nknitr::opts_chunk$set(\n  echo = TRUE,\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  out.width = \"100%\"\n)\n```\n\n```{r, include = FALSE}\nlibrary(seqR)\n```\n\n# seqR - fast and comprehensive k-mer counting package\n\n\u003c!-- badges: start --\u003e\n[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/seqR)](https://cran.r-project.org/package=seqR)\n[![R build status](https://github.com/slowikj/seqR/workflows/R-CMD-check/badge.svg)](https://github.com/slowikj/seqR/actions)\n[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)\n[![License: GPL 
v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\n[![codecov.io](https://codecov.io/github/slowikj/seqR/coverage.svg?branch=master)](https://codecov.io/github/slowikj/seqR?branch=master)\n[![Code Quality Status](https://www.code-inspector.com/project/23909/status/svg)](https://www.code-inspector.com/project/23909/status/svg)\n[![Code Quality Score](https://www.code-inspector.com/project/23909/score/svg)](https://www.code-inspector.com/project/23909/score/svg)\n\u003c!-- badges: end --\u003e\n\n## About\n\n`seqR` is an R package for fast k-mer counting. It provides a\n\n* **highly optimized** (the core algorithm is written in C++)\n* **in-memory**\n* **probabilistic** (with configurable dimensionality of the hash value\nused for storing k-mers internally)\n* **multi-threaded** (with a configurable number of sequences processed in a single step, `batch_size`; if `batch_size` equals 1, multi-threading is disabled, which may increase computation time)\n\nimplementation that supports\n\n* **various variants of k-mers** (contiguous, gapped, and their positional counterparts)\n* **all biological sequences** (e.g., nucleic acids and proteins)\n\nMoreover, results are returned as **sparse matrices**\n(see the [Matrix package](https://CRAN.R-project.org/package=Matrix)) to reduce memory consumption;\nthey are compatible with machine learning packages\nsuch as [ranger](https://CRAN.R-project.org/package=ranger)\nand [xgboost](https://CRAN.R-project.org/package=xgboost).\n\n## How to...\n\n### How to install\n\nTo install `seqR` from CRAN:\n\n```{r, eval=FALSE}\ninstall.packages(\"seqR\")\n```\n\nAlternatively, if you want to use the latest development version:\n\n```{r, eval=FALSE}\n# install.packages(\"devtools\")\ndevtools::install_github(\"slowikj/seqR\")\n```\n\n### How to use\n\nThe package provides two functions that facilitate k-mer counting\n\n* `count_kmers` (used for counting k-mers of one 
type)\n* `count_multimers` (a wrapper of `count_kmers`, used for counting k-mers of many types in a single invocation of the function)\n\nand one function used for custom processing of k-mer matrices:\n\n* `rbind_columnwise` (a helper function used for merging several k-mer matrices that do not share the same set of columns)\n\nTo learn more, see the [features overview vignette](https://slowikj.github.io/seqR/articles/features-overview.html)\nand the [reference](https://slowikj.github.io/seqR/reference/index.html).\n\n#### Examples\n\n##### counting 5-mers\n\n```{r}\ncount_kmers(sequences=c(\"AAAAAVVAVFF\", \"DFGSADFGSA\"),\n            k=5)\n```\n\n##### counting gapped 5-mers with gaps (0, 1, 0, 2) (XX_XX__X)\n\n```{r}\ncount_kmers(sequences=c(\"AAAAAVVAVFF\", \"DFGSADFGSA\"),\n            kmer_gaps=c(0, 1, 0, 2))\n```\n\n\n##### counting 1-mers and 2-mers\n\n```{r}\ndata(CsgA)\n\nCsgA[1L:2]\n\ncount_multimers(sequences=CsgA,\n                k_vector = c(1, 2))\n```\n\n\n### How to cite\n\nTo cite the package, type:\n\n```{r, eval=FALSE}\ncitation(\"seqR\")\n```\n\nor use:\n\nJadwiga Słowik and Michał Burdukiewicz (2021). seqR: fast and comprehensive k-mer counting package. R package version 1.0.0.\n\n## Benchmarks\n\nThe `seqR` package has been compared with other existing k-mer counting R packages:\n[biogram](https://CRAN.R-project.org/package=biogram),\n[kmer](https://CRAN.R-project.org/package=kmer),\n[seqinr](https://CRAN.R-project.org/package=seqinr),\nand [Biostrings](https://bioconductor.org/packages/Biostrings).\n\nAll benchmark experiments were performed on an Intel Core i7-6700HQ (2.60 GHz, 8 cores) using the [microbenchmark](https://CRAN.R-project.org/package=microbenchmark) R package. 
\n\n### Contiguous k-mers\n\n#### Changing k\n\n\u003cimg src = \"https://raw.githubusercontent.com/slowikj/seqR/master/man/img/packages_different_k.png\" align = \"center\" width=\"100%\"/\u003e\n\nThe input consists of one `DNA` sequence of length `3 000`.\n\n#### Changing the number of sequences\n\n\u003cimg src = \"https://raw.githubusercontent.com/slowikj/seqR/master/man/img/packages_different_seq_num.png\" align = \"center\" width=\"100%\"/\u003e\n\nEach `DNA` sequence has `3 000` elements, `contiguous 5-mer` counting.\n\n### Gapped k-mers\n\n#### Changing the first contiguous part of a k-mer\n\n\u003cimg src = \"https://raw.githubusercontent.com/slowikj/seqR/master/man/img/gapped_kmers_changing_the_first_contiguous_part.png\" align = \"center\" width=\"100%\"/\u003e\n\nThe input consists of one `DNA` sequence of length `1 000 000`. `Gapped 5-mers` counting with base gaps `(1, 0, 0, 1)`.\n\n#### Changing the first gap size\n\n\u003cimg src = \"https://raw.githubusercontent.com/slowikj/seqR/master/man/img/gapped_kmers_changing_the_first_gap.png\" align = \"center\" width=\"100%\"/\u003e\n\nThe input consists of one `DNA` sequence of length `100 000`. `Gapped 5-mers` counting with base gaps `(1, 0, 0, 1)`.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fslowikj%2Fseqr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fslowikj%2Fseqr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fslowikj%2Fseqr/lists"}