An open API service indexing awesome lists of open source software.

https://github.com/biogenies/tidysq

tidy processing of biological sequences in R
https://github.com/biogenies/tidysq

bioconductor bioinformatics biological-sequences fasta r rstats s3 sequences tibble tidy tidyverse vctrs

Last synced: 10 months ago
JSON representation

tidy processing of biological sequences in R

Awesome Lists containing this project

README

          

---
output: github_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
echo = TRUE,
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# tidysq

[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/tidysq)](https://cran.r-project.org/package=tidysq)
[![Github Actions Build Status](https://github.com/BioGenies/tidysq/workflows/R-CMD-check-bioc/badge.svg)](https://github.com/BioGenies/tidysq/actions)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)

## Overview

`tidysq` contains tools for analysis and manipulation of biological sequences (including amino acid and nucleic acid -- e.g. RNA, DNA -- sequences). Two major features of this package are:

- effective compression of sequence data, allowing to fit larger datasets in **R**,

- compatibility with most of `tidyverse` universe, especially `dplyr` and `vctrs` packages, making analyses *tidier*.

## Getting started

[Try our quick start vignette](http://biogenies.info/tidysq/articles/quick-start.html) or [our exhaustive documentation](http://biogenies.info/tidysq/reference/index.html).

## Installation

The easiest way to install `tidysq` package is to download its latest version from CRAN repository:

```{r, eval=FALSE}
install.packages("tidysq")
```

Alternatively, it is possible to download the development version directly from GitHub repository:

```{r, eval=FALSE}
# install.packages("devtools")
devtools::install_github("BioGenies/tidysq")
```

## Example usage

```{r, message=FALSE}
library(tidysq)
```

```{r}
file <- system.file("examples", "example_aa.fasta", package = "tidysq")
sqibble <- read_fasta(file)
sqibble

sq_ami <- sqibble$sq
sq_ami

# Subsequences can be extracted with bite()
bite(sq_ami, 5:10)

# There are also more traditional functions
reverse(sq_ami)

# find_motifs() returns a whole tibble of useful informations
find_motifs(sqibble, "^VHX")
```

An example of `dplyr` integration:

```{r, message=FALSE}
library(dplyr)
# tidysq integrates well with dplyr verbs
sqibble %>%
filter(sq %has% "VFF") %>%
mutate(length = get_sq_lengths(sq))
```

## Citation

For citation type:

```{r, eval=FALSE}
citation("tidysq")
```

or use:

Michal Burdukiewicz, Dominik Rafacz, Laura Bakala, Jadwiga Slowik, Weronika Puchala, Filip Pietluch, Katarzyna Sidorczuk, Stefan Roediger and Leon Eyrich Jessen (2021). tidysq: Tidy Processing and Analysis of Biological Sequences. R package version 1.1.3.