https://github.com/biocpy/summarizedexperiment

Container class for genomic experiments
summarizedexperiment

Last synced: 3 months ago
Host: GitHub
URL: https://github.com/biocpy/summarizedexperiment
Owner: BiocPy
License: mit
Created: 2022-06-15T06:18:08.000Z (about 3 years ago)
Default Branch: master
Last Pushed: 2025-04-21T16:28:40.000Z (3 months ago)
Last Synced: 2025-04-21T17:35:41.536Z (3 months ago)
Topics: summarizedexperiment
Language: Python
Homepage: https://biocpy.github.io/SummarizedExperiment/
Size: 4.69 MB
Stars: 5
Watchers: 2
Forks: 2
Open Issues: 5
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
- Authors: AUTHORS.md

README

[![Project generated with PyScaffold](https://img.shields.io/badge/-PyScaffold-005CA0?logo=pyscaffold)](https://pyscaffold.org/)

[![PyPI-Server](https://img.shields.io/pypi/v/SummarizedExperiment.svg)](https://pypi.org/project/SummarizedExperiment/)

![Unit tests](https://github.com/BiocPy/SummarizedExperiment/actions/workflows/run-tests.yml/badge.svg)

# SummarizedExperiment

This package provides containers to represent genomic experimental data as 2-dimensional matrices, following Bioconductor's [SummarizedExperiment](https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html). In these matrices, the rows typically denote features or genomic regions of interest, while columns represent samples or cells.

The package currently includes representations for both `SummarizedExperiment` and `RangedSummarizedExperiment`. The distinction is that a `RangedSummarizedExperiment` object provides an additional slot that stores the genomic region for each feature. This slot is expected to be a `GenomicRanges` object (more [here](https://github.com/BiocPy/GenomicRanges/)).

## Install

To get started, install the package from [PyPI](https://pypi.org/project/summarizedexperiment/):

```shell
pip install summarizedexperiment
```

## Usage

A `SummarizedExperiment` contains three key attributes:

- `assays`: A dictionary of matrices with assay names as keys, e.g. `counts`, `logcounts`.
- `row_data`: Feature information, e.g. genes, transcripts, exons.
- `column_data`: Sample information about the columns of the matrices.
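The way the three attributes fit together can be sketched in plain Python. This is only an illustration of the layout, not the package's implementation: `MiniExperiment` is a hypothetical class, and the shape check reflects the assumption that every assay shares its rows with `row_data` and its columns with `column_data`.

```python
from dataclasses import dataclass, field


@dataclass
class MiniExperiment:
    """Hypothetical stand-in for illustration; not the package's class."""

    assays: dict       # assay name -> matrix stored as a list of rows
    row_data: list     # one record per feature (matrix row)
    column_data: list  # one record per sample (matrix column)
    metadata: dict = field(default_factory=dict)

    def __post_init__(self):
        n_rows, n_cols = len(self.row_data), len(self.column_data)
        for name, matrix in self.assays.items():
            # Every assay must be n_rows x n_cols so that row i and column j
            # refer to the same feature and sample in each matrix.
            if len(matrix) != n_rows or any(len(r) != n_cols for r in matrix):
                raise ValueError(f"assay '{name}' is not {n_rows} x {n_cols}")

    @property
    def shape(self):
        return (len(self.row_data), len(self.column_data))


counts = [[5, 0, 2], [1, 7, 3]]  # 2 features x 3 samples
mini = MiniExperiment(
    assays={"counts": counts},
    row_data=[{"gene": "g1"}, {"gene": "g2"}],
    column_data=[
        {"treatment": "ChIP"},
        {"treatment": "Input"},
        {"treatment": "ChIP"},
    ],
)
print(mini.shape)  # (2, 3)
```

Keeping the annotations next to the matrices means a subset of features or samples can be applied to all three pieces at once, which is the main reason to use a container rather than separate objects.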

First, let's mock feature and sample data:

```python
from random import random

import numpy as np
import pandas as pd
from biocframe import BiocFrame

nrows = 200
ncols = 6

# Mock assay: a 200 x 6 matrix of random values
counts = np.random.rand(nrows, ncols)

# Feature annotations: each 10-element pattern is repeated 20 times,
# giving one entry per row of the assay (200 in total)
row_data = BiocFrame(
    {
        "seqnames": [
            "chr1",
            "chr2",
            "chr2",
            "chr2",
            "chr1",
            "chr1",
            "chr3",
            "chr3",
            "chr3",
            "chr3",
        ]
        * 20,
        "starts": range(100, 300),
        "ends": range(110, 310),
        "strand": ["-", "+", "+", "*", "*", "+", "+", "+", "-", "-"] * 20,
        "score": range(0, 200),
        "GC": [random() for _ in range(10)] * 20,
    }
)

# Sample annotations: one row per column of the assay (6 samples)
col_data = pd.DataFrame(
    {
        "treatment": ["ChIP", "Input"] * 3,
    }
)
```

To create a `SummarizedExperiment`:

```python
from summarizedexperiment import SummarizedExperiment

tse = SummarizedExperiment(
    assays={"counts": counts}, row_data=row_data, column_data=col_data,
    metadata={"seq_platform": "Illumina NovaSeq 6000"},
)

# Print a summary of the object
print(tse)
```

    ## output
    class: SummarizedExperiment
    dimensions: (200, 6)
    assays(1): ['counts']
    row_data columns(6): ['seqnames', 'starts', 'ends', 'strand', 'score', 'GC']
    row_names(0):
    column_data columns(1): ['treatment']
    column_names(0):
    metadata(1): seq_platform

To create a `RangedSummarizedExperiment`:

```python
from summarizedexperiment import RangedSummarizedExperiment
from genomicranges import GenomicRanges

# row_ranges holds one genomic region per feature, built here from row_data
trse = RangedSummarizedExperiment(
    assays={"counts": counts}, row_data=row_data,
    row_ranges=GenomicRanges.from_pandas(row_data.to_pandas()), column_data=col_data
)

# Print a summary of the object
print(trse)
```

    ## output
    class: RangedSummarizedExperiment
    dimensions: (200, 6)
    assays(1): ['counts']
    row_data columns(6): ['seqnames', 'starts', 'ends', 'strand', 'score', 'GC']
    row_names(0):
    column_data columns(1): ['treatment']
    column_names(0):
    metadata(0):
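To make the idea of "one genomic region per feature" concrete, here is a small plain-Python sketch built from the same mock columns. `Region` is a hypothetical record type used only for illustration. The package stores ranges in a `GenomicRanges` object, not in a list of tuples.

```python
from collections import namedtuple

# Hypothetical record type for illustration only.
Region = namedtuple("Region", ["seqname", "start", "end", "strand"])

seqnames = [
    "chr1", "chr2", "chr2", "chr2", "chr1",
    "chr1", "chr3", "chr3", "chr3", "chr3",
] * 20
strands = ["-", "+", "+", "*", "*", "+", "+", "+", "-", "-"] * 20
starts = range(100, 300)
ends = range(110, 310)

# One region per feature, in the same order as the assay rows.
regions = [Region(*fields) for fields in zip(seqnames, starts, ends, strands)]

print(len(regions))  # 200
print(regions[0])    # Region(seqname='chr1', start=100, end=110, strand='-')
```

Because the regions follow the row order of the assays, row `i` of every matrix can be mapped back to a position on the genome.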

For more examples, check out the [documentation](https://biocpy.github.io/SummarizedExperiment/).

## Note

This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.
