https://github.com/ellfran-7/cluefish

A specialised workflow designed to enhance the biological interpretation of transcriptomic data series 🎣

Host: GitHub
URL: https://github.com/ellfran-7/cluefish
Owner: ellfran-7
License: other
Created: 2024-10-16T09:28:56.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-05-30T17:03:47.000Z (about 1 year ago)
Last Synced: 2025-05-30T23:18:57.175Z (about 1 year ago)
Topics: interpretation, r, transcriptomics, workflow
Language: R
Homepage: https://ellfran-7.github.io/cluefish/
Size: 27.6 MB
Stars: 1
Watchers: 1
Forks: 1
Open Issues: 6
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
- Codemeta: codemeta.json


---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# cluefish


[![Contributors](https://img.shields.io/github/contributors/ellfran-7/cluefish.svg?style=flat-square)](https://github.com/ellfran-7/cluefish/graphs/contributors) [![Forks](https://img.shields.io/github/forks/ellfran-7/cluefish.svg?style=flat-square)](https://github.com/ellfran-7/cluefish/network/members) [![Stargazers](https://img.shields.io/github/stars/ellfran-7/cluefish.svg?style=flat-square)](https://github.com/ellfran-7/cluefish/stargazers) [![Issues](https://img.shields.io/github/issues/ellfran-7/cluefish.svg?style=flat-square)](https://github.com/ellfran-7/cluefish/issues) [![License](https://img.shields.io/badge/licence-CECILL_2.1-blue)](https://github.com/ellfran-7/cluefish/blob/main/LICENSE) [![LinkedIn](https://img.shields.io/badge/-LinkedIn-black.svg?style=flat-square&logo=linkedin&colorB=555)](https://linkedin.com/in/ellis-franklin-6188831ba)

Table of Contents

- Overview
- Installation
- Usage
- Contributing
- License
- Contact
- Acknowledgments

## Overview

Cluefish is a free and open-source, semi-automated R workflow designed for comprehensive and untargeted exploration of transcriptomic data series. Its name reflects the three key concepts driving the workflow: **Clustering**, **Enrichment**, and **Fishing**—metaphorically aligned with "*fishing for clues*"🎣 in complex biological data.

When used alongside the [DRomics](https://lbbe-software.github.io/DRomics/) (Dose-Response for Omics) R package, Cluefish provides a more comprehensive analysis of dose-response transcriptomic data. In toxicology and ecotoxicology, this helps to understand and highlight a contaminant's mode of action.

This workflow addresses the limitations of the standard Over-Representation Analysis (ORA) by applying ORA to pre-clustered networks. These clusters serve as anchors for ORA, enhancing enrichment detection sensitivity and thus enabling the identification of smaller, more specific biological processes while simultaneously forming exploratory gene groups.
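To make the sensitivity argument concrete, here is a minimal sketch of a generic ORA, computed as a one-sided hypergeometric test in base R. It is not taken from the Cluefish code base: the helper `ora_p_value()` and all counts are hypothetical, chosen only to show why a small, specific term can go undetected in a large deregulated list yet stand out within a cluster.

``` r
# Toy numbers, chosen only for illustration (not from any real dataset)
n_background <- 20000  # transcripts in the background list
n_term       <- 40     # background transcripts annotated to one biological term

# Hypothetical helper: one-sided hypergeometric test,
# i.e. P(overlap >= n_overlap) when drawing n_query transcripts at random
ora_p_value <- function(n_overlap, n_query, n_term, n_background) {
  phyper(n_overlap - 1, n_term, n_background - n_term, n_query,
         lower.tail = FALSE)
}

# (a) ORA on the whole deregulated list: 8 annotated transcripts among 2000
p_whole <- ora_p_value(8, 2000, n_term, n_background)

# (b) ORA on a single cluster: 6 annotated transcripts among 50
p_cluster <- ora_p_value(6, 50, n_term, n_background)

# The cluster-level test gives the smaller p-value for the same term
p_cluster < p_whole
```

In practice, p-values from many terms and clusters would also need multiple-testing correction, which this sketch leaves out.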

Cluefish is designed to be adaptable to a wide range of organisms, both model and non-model, ensuring broad applicability across various biological contexts.

------------------------------------------------------------------------

If you're ready to dive straight into using Cluefish, check out the [Introduction to Cluefish](https://ellfran-7.github.io/cluefish/articles/cluefish.html) vignette.

------------------------------------------------------------------------

Graphical abstract of the Cluefish workflow.


## Installation

The Cluefish tool is developed in **R**, so having **R** installed is a prerequisite. You can download it [here](https://posit.co/download/rstudio-desktop/).

For an enhanced experience, we recommend using the **RStudio** integrated development environment (IDE), which is available for download at the same link, [here](https://posit.co/download/rstudio-desktop/).

You can use Cluefish locally in one of two ways:

1. Clone the repository via a terminal:

``` sh
git clone https://github.com/ellfran-7/cluefish.git
```

2. Install the development version of Cluefish from GitHub in R (requires the `remotes` package):

``` r
# Install remotes first if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

remotes::install_github("ellfran-7/cluefish")
```


## Additional Requirements

Cluefish relies on external open source software for an intermediate step within its workflow. Please ensure the following tools are installed:

1. **Cytoscape**:

Cluefish uses Cytoscape to visualize protein-protein interaction (PPI) networks. Install Cytoscape from its [download page](https://cytoscape.org/download.html).

2. **Required Cytoscape Apps**:

Within Cytoscape, install the **StringApp** and **clusterMaker2** apps. To do this:

- Open Cytoscape
- Navigate to `Apps` \> `App Store` \> `Show App Store`
- Search for and install "StringApp" (for retrieving STRING protein interactions) and "clusterMaker2" (for clustering network data).

*You can also view more about these apps on the [Cytoscape App Store](https://apps.cytoscape.org/).*


## Usage

To run the Cluefish workflow, you can use the `make.R` script, which serves as the 'master' script for the entire process. We recommend using this script as a template to ensure smooth and sequential execution of the workflow steps.

### Required R packages

A key feature of Cluefish is the integration of `renv` to create reproducible environments. This allows you to install the required R packages in two ways:

- Run `renv::install()` to install the most recent version of the packages listed in the `renv.lock` file.
- For full reproducibility, run `renv::restore()` to install the exact package versions specified in the `renv.lock` file. Note that this process may take longer.
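As a minimal sketch of the fully reproducible route, the snippet below assumes that R is started in the root of the cloned repository (where `renv.lock` is expected to live) and that installing `renv` from CRAN is acceptable on your machine:

``` r
# Install renv itself if it is not yet available
if (!requireNamespace("renv", quietly = TRUE)) {
  install.packages("renv")
}

# Restore the exact package versions recorded in renv.lock.
# Assumption: the working directory is the project root.
if (file.exists("renv.lock")) {
  renv::restore()
} else {
  message("renv.lock not found: set the working directory to the project root")
}
```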

### Required inputs

Cluefish requires two key inputs:

1. **A background transcript list**: Typically, this includes the identifiers for all detected transcripts in the experiment.
2. **A deregulated transcript list**: A subset of the background list, containing the identifiers of significantly deregulated transcripts. This list can be derived using any selection method.
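The relationship between the two inputs can be sketched in base R as follows. Everything here is hypothetical: the `results` table, its column names, the made-up identifiers and the 0.05 threshold merely stand in for whatever selection method you use.

``` r
# Hypothetical selection results: one row per detected transcript.
# Identifiers, column names and values are made up for illustration.
results <- data.frame(
  transcript_id = c("tx_A", "tx_B", "tx_C", "tx_D"),
  adj_p_value   = c(0.001, 0.20, 0.03, 0.75),
  stringsAsFactors = FALSE
)

# Background list: identifiers of all detected transcripts
background_ids <- unique(results$transcript_id)

# Deregulated list: identifiers passing the (assumed) significance threshold
dereg_ids <- unique(results$transcript_id[results$adj_p_value < 0.05])

# Sanity check: the deregulated list must be a non-empty subset of the background
stopifnot(length(dereg_ids) > 0, all(dereg_ids %in% background_ids))
```

Running a subset check like this before starting the workflow catches identifier mismatches early, for example when the two lists come from different annotation versions.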

### Recommended Selection Method

While the inputs can be derived from any selection method, Cluefish was optimised to work seamlessly with the results from `DRomics`, a tool tailored for dose-response modelling of omics data.

Although using `DRomics` is optional, Cluefish leverages some of its visualization functions and modelling metrics to provide deeper insights into the biological interpretation of the data.

*For more information on DRomics, please refer to their [documentation](https://lbbe-software.github.io/DRomics/)*.
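For orientation, here is a rough sketch of how the two input lists might be obtained from a DRomics analysis, using the RNA-seq sample file shipped with DRomics. Treating every item with a fitted dose-response model as "deregulated" is an assumption made for illustration, as are the variable names; the Cluefish vignette remains the reference for which DRomics outputs the workflow actually expects.

``` r
# Only runs if DRomics (and its dependencies) are installed
if (requireNamespace("DRomics", quietly = TRUE)) {
  library(DRomics)

  # Sample RNA-seq count file distributed with the DRomics package
  datafile <- system.file("extdata", "RNAseq_sample.txt", package = "DRomics")

  # Import and transform the counts
  o <- RNAseqdata(datafile, check = TRUE, transfo.method = "vst")

  # Select items responding significantly to the dose
  s <- itemselect(o, select.method = "quadratic", FDR = 0.05)

  # Fit dose-response models to the selected items
  f <- drcfit(s, progressbar = FALSE)

  # Assumed mapping onto the two required inputs
  background_ids <- o$item       # all items in the dataset
  dereg_ids      <- f$fitres$id  # items with a fitted dose-response model
}
```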


## Workflow

A schematic overview of the Cluefish workflow is shown below. For a full, step-by-step guide, refer to the vignette, [Introduction to Cluefish](https://ellfran-7.github.io/cluefish/articles/cluefish.html), which provides instructions using the *ZebrafishDBP* example dataset. The raw count data is publicly available on NCBI GEO and can be accessed with **GSE283957**.

Schematic of the Cluefish workflow.


## Citation

If you use Cluefish, please cite the associated paper as follows:

> Ellis Franklin, Elise Billoir, Philippe Veber, Jérémie Ohanessian, Marie Laure Delignette-Muller, Sophie Martine Prud’homme, Cluefish: mining the dark matter of transcriptional data series with over-representation analysis enhanced by aggregated biological prior knowledge, *NAR Genomics and Bioinformatics*, Volume 7, Issue 3, September 2025, lqaf103.

## Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingIdea`)
3. Commit your Changes (`git commit -m 'Add some AmazingIdea'`)
4. Push to the Branch (`git push origin feature/AmazingIdea`)
5. Open a Pull Request


## License

This project is distributed under the CeCILL Free Software License Agreement v2.1 (CECILL-2.1). See the [`LICENSE`](https://github.com/ellfran-7/cluefish/blob/main/LICENSE) file for more information.

CECILL-2.1 is compatible with GNU GPL. See the [official CeCILL site](http://www.cecill.eu/index.en.html) for more information.

Please note that the creative assets, such as the logos and schematics associated with Cluefish, are distributed under the [CC-BY-SA-4.0 license](https://choosealicense.com/licenses/cc-by-sa-4.0/).


## Contact

If you have a need that is not yet covered, feedback on Cluefish, or any other question, feel free to contact me!

Ellis Franklin - [Website](https://ellfranklin.com/) - [LinkedIn](https://www.linkedin.com/in/ellis-franklin-6188831ba/) - [Bluesky](https://bsky.app/profile/elfrank7.bsky.social) - [ellis.franklin\@univ-lorraine.fr](mailto:ellis.franklin@univ-lorraine.fr){.email}

Project Link: <https://github.com/ellfran-7/cluefish>


## Acknowledgments

- [Othneil Drew's README template](https://github.com/othneildrew/Best-README-Template)
- [Malven's Flexbox Cheatsheet](https://flexbox.malven.co/)
- [Malven's Grid Cheatsheet](https://grid.malven.co/)
- [Img Shields](https://shields.io/)

