https://github.com/ulelab/cluster_kmers

Get clusters of k-mers based solely on their sequence or in combination with enrichment in PEKA.

Host: GitHub
URL: https://github.com/ulelab/cluster_kmers
Owner: ulelab
License: gpl-3.0
Created: 2023-02-22T15:04:05.000Z
Default Branch: master
Last Pushed: 2023-11-21T09:16:28.000Z
Last Synced: 2025-09-05T04:12:02.115Z
Language: Python
Size: 1.29 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff

# cluster_kmers
[![DOI](https://zenodo.org/badge/605154452.svg)](https://zenodo.org/badge/latestdoi/605154452)

Get clusters of k-mers based solely on their sequence or in combination with enrichment in PEKA.

## Installation

```
conda create -n cluster_kmers
conda activate cluster_kmers
# Python must be >=3.8 and <=3.11 to support SciPy 1.9
conda install python=3.9 pip
pip install git+https://github.com/ulelab/cluster_kmers.git@master
```
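
The comment in the installation block means the interpreter has to satisfy both bounds at once. A minimal sketch, not part of cluster_kmers, that checks a Python version against that inclusive range; `python_supported`, `MIN_PYTHON` and `MAX_PYTHON` are hypothetical names:

```python
import sys

# Hypothetical helper, not part of cluster_kmers.
# Bounds taken from the installation note: SciPy 1.9 needs Python 3.8 to 3.11.
MIN_PYTHON = (3, 8)
MAX_PYTHON = (3, 11)


def python_supported(version=sys.version_info):
    """Return True if the major.minor version lies in the inclusive range 3.8 to 3.11."""
    return MIN_PYTHON <= tuple(version[:2]) <= MAX_PYTHON


if __name__ == "__main__":
    print(sys.version.split()[0], "supported:", python_supported())
```

Running it inside the activated environment before `pip install` is a cheap way to catch an interpreter outside the range.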

## Usage
```
cluster_kmers [-h] -k KMERS [KMERS ...] -o RESULTS_FOLDER [-co {seq,seq_enrichment}] [-n N_CLUSTERS] [-peka PEKA_FILES [PEKA_FILES ...]] [-tl MIN_TOKEN_LENGTH] [-wl {True,False}] [-cons CONSENSUS_LENGTH]
```
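
The synopsis takes k-mers as space-separated strings, and the help text describes `-n` and `-cons` as accepting either "auto" or an integer. A minimal sketch, using only the standard-library `argparse`, of how those argument types could be validated. This is not the tool's actual parser: `rna_kmer`, `auto_or_int` and `build_parser` are hypothetical names, only a subset of options is reproduced, requiring a positive integer is an assumption, and no default is set for `-co` because the help text does not state one.

```python
import argparse

# Hypothetical sketch, not the cluster_kmers parser.
RNA_ALPHABET = frozenset("ACGU")


def rna_kmer(value):
    """argparse type: accept a non-empty k-mer written only with A, C, G and U."""
    if not value or not set(value) <= RNA_ALPHABET:
        raise argparse.ArgumentTypeError(f"{value!r} is not a k-mer in the RNA alphabet")
    return value


def auto_or_int(value):
    """argparse type: accept the string "auto" or a positive integer (assumed constraint)."""
    if value == "auto":
        return value
    try:
        number = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected 'auto' or an integer, got {value!r}") from None
    if number < 1:
        raise argparse.ArgumentTypeError(f"expected a positive integer, got {number}")
    return number


def build_parser():
    """Build a parser for a subset of the options shown in the synopsis."""
    parser = argparse.ArgumentParser(prog="cluster_kmers")
    parser.add_argument("-k", "--kmers", nargs="+", required=True, type=rna_kmer)
    parser.add_argument("-o", "--output_folder", required=True)
    parser.add_argument("-co", "--cluster_on", choices=["seq", "seq_enrichment"])
    # String defaults are passed through `type`, so "auto" stays "auto".
    parser.add_argument("-n", "--n_clusters", type=auto_or_int, default="auto")
    parser.add_argument("-cons", "--consensus_length", type=auto_or_int, default="auto")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args(["-k", "AAGG", "GGAG", "GCCU", "-o", "results", "-n", "3"]))
```

Using `type=` callables keeps validation inside `argparse`, so a DNA k-mer such as `AATT` or a value like `-n many` is rejected with a usage message before any clustering starts.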

Cluster k-mers based on sequence or on a combination of sequence and enrichment in CLIP data.
```
required arguments:
  -k KMERS [KMERS ...], --kmers KMERS [KMERS ...]
        A list of k-mers in the RNA alphabet, separated by spaces, for example: AAGG GGAG GCCU.
  -o RESULTS_FOLDER, --output_folder RESULTS_FOLDER
        A path to an existing output folder, for example "~/results".

optional arguments:
  -co {seq,seq_enrichment}, --cluster_on {seq,seq_enrichment}
        Input to clustering. Valid options are:
          seq - cluster based on sequence similarity only.
          seq_enrichment - cluster based on sequence similarity and on enrichment of motifs in CLIP data (requires files passed to -peka).
  -n N_CLUSTERS, --n_clusters N_CLUSTERS
        Number of clusters to split the k-mers into. Valid options are "auto" or an integer. Default is "auto".
  -peka PEKA_FILES [PEKA_FILES ...], --peka_files PEKA_FILES [PEKA_FILES ...]
        A list of PEKA output files whose names end in *mer_distribution_{region_name}.tsv, separated by spaces.
  -tl MIN_TOKEN_LENGTH, --min_token_length MIN_TOKEN_LENGTH
        Minimum length of the substrings used for clustering. For k-mers longer than 5, setting this value above 1 can improve clustering results.
  -wl {True,False}, --weblogos {True,False}
        Whether to plot weblogos for motif groups. Default is True.
  -cons CONSENSUS_LENGTH, --consensus_length CONSENSUS_LENGTH
        Length of the consensus sequence used to name k-mer groups. Valid options are "auto" or an integer. By default ("auto"), the length is k-mer length - 1.
```
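
The help text does not describe the clustering algorithm itself. As a purely illustrative sketch of how `-n`, `-tl` and `-cons` could interact in the `seq` mode, the block below assumes a substring-token representation, Jaccard distance and SciPy average-linkage clustering; all three are choices made here for illustration, not taken from cluster_kmers. The function names are hypothetical, "auto" for `-n` is not modelled, and PEKA enrichment is ignored.

```python
from collections import Counter
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical illustration, not the cluster_kmers implementation.


def substring_tokens(kmer, min_token_length=1):
    """Set of all substrings of `kmer` that are at least `min_token_length` long."""
    k = len(kmer)
    if not 1 <= min_token_length <= k:
        raise ValueError("min_token_length must be between 1 and the k-mer length")
    return {kmer[i:j] for i in range(k) for j in range(i + min_token_length, k + 1)}


def jaccard_distance(a, b):
    """1 - len(a & b) / len(a | b) for two non-empty sets."""
    return 1.0 - len(a & b) / len(a | b)


def cluster_by_sequence(kmers, n_clusters, min_token_length=1):
    """Group k-mers into at most `n_clusters` clusters (integer only)."""
    token_sets = [substring_tokens(kmer, min_token_length) for kmer in kmers]
    n = len(kmers)
    dist = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        dist[i, j] = dist[j, i] = jaccard_distance(token_sets[i], token_sets[j])
    # Average-linkage hierarchical clustering on the condensed distance matrix,
    # then cut the tree so that at most n_clusters flat clusters remain.
    tree = linkage(squareform(dist), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    groups = {}
    for kmer, label in zip(kmers, labels):
        groups.setdefault(int(label), []).append(kmer)
    return groups


def consensus_name(group, length=None):
    """Substring of `length` found in the most k-mers of `group` (ties broken alphabetically).

    length=None mirrors the documented "auto" default of k-mer length - 1.
    """
    if length is None:
        length = len(group[0]) - 1
    counts = Counter()
    for kmer in group:
        # Count each substring at most once per k-mer.
        counts.update({kmer[i:i + length] for i in range(len(kmer) - length + 1)})
    best = max(counts.values())
    return min(s for s, c in counts.items() if c == best)


if __name__ == "__main__":
    kmers = ["GGAG", "GAGG", "AAGG", "UUUU", "UUUC", "UUCU"]
    groups = cluster_by_sequence(kmers, n_clusters=2, min_token_length=2)
    for label, group in sorted(groups.items()):
        print(label, consensus_name(group), group)
```

In this toy example the purine-rich and U/C-rich k-mers share no substrings of length 2 or more, so they end up in separate clusters, which `consensus_name` labels `AGG` and `UUC`.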