https://github.com/biocpy/summarizedexperiment
Container class for genomic experiments
- Host: GitHub
- URL: https://github.com/biocpy/summarizedexperiment
- Owner: BiocPy
- License: mit
- Created: 2022-06-15T06:18:08.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2025-04-21T16:28:40.000Z (21 days ago)
- Last Synced: 2025-04-21T17:35:41.536Z (21 days ago)
- Topics: summarizedexperiment
- Language: Python
- Homepage: https://biocpy.github.io/SummarizedExperiment/
- Size: 4.69 MB
- Stars: 5
- Watchers: 2
- Forks: 2
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
- Authors: AUTHORS.md
# SummarizedExperiment
This package provides containers to represent genomic experimental data as 2-dimensional matrices, following Bioconductor's [SummarizedExperiment](https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html). In these matrices, the rows typically denote features or genomic regions of interest, while columns represent samples or cells.
The package currently includes representations for both `SummarizedExperiment` and `RangedSummarizedExperiment`. The distinction is that a `RangedSummarizedExperiment` provides an additional slot to store the genomic region of each feature, which is expected to be a `GenomicRanges` object (more [here](https://github.com/BiocPy/GenomicRanges/)).
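
To make the layout concrete, here is a minimal pure-Python sketch of the idea described above. `ToyExperiment` is a hypothetical stand-in written only for illustration and is not the package's implementation; the field names mirror the package's attribute names, and plain lists stand in for real matrices and data frames. It assumes that every assay shares the same rows (features) and columns (samples), and that the optional ranges slot holds one region per feature.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Region = Tuple[str, int, int]  # (seqname, start, end); a simplification


@dataclass
class ToyExperiment:
    """Hypothetical illustration of the container idea, not the real class."""

    assays: Dict[str, Matrix]        # assay name -> features x samples matrix
    row_data: List[dict]             # one annotation record per feature
    column_data: List[dict]          # one annotation record per sample
    row_ranges: Optional[List[Region]] = None  # the extra "ranged" slot

    def __post_init__(self):
        nrows, ncols = self.shape
        # Every assay must agree with the row and column annotations.
        for name, matrix in self.assays.items():
            if len(matrix) != nrows or any(len(r) != ncols for r in matrix):
                raise ValueError(f"assay '{name}' must be {nrows} x {ncols}")
        # The ranged variant carries exactly one region per feature.
        if self.row_ranges is not None and len(self.row_ranges) != nrows:
            raise ValueError("row_ranges must have one entry per feature")

    @property
    def shape(self) -> Tuple[int, int]:
        return (len(self.row_data), len(self.column_data))


toy = ToyExperiment(
    assays={"counts": [[1, 0, 3], [2, 5, 0]]},
    row_data=[{"gene": "geneA"}, {"gene": "geneB"}],
    column_data=[{"treatment": t} for t in ("ChIP", "Input", "ChIP")],
    row_ranges=[("chr1", 100, 110), ("chr2", 101, 111)],
)
print(toy.shape)  # (2, 3)
```

Keeping the annotations next to the matrices means a mismatch between an assay and its feature or sample annotations is caught when the object is built rather than later during analysis.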
## Install
To get started, install the package from [PyPI](https://pypi.org/project/summarizedexperiment/):
```shell
pip install summarizedexperiment
```

## Usage
A `SummarizedExperiment` contains three key attributes:
- `assays`: A dictionary of matrices with assay names as keys, e.g. `counts`, `logcounts`.
- `row_data`: Feature information, e.g. genes, transcripts, exons.
- `column_data`: Sample information about the columns of the matrices.

First, let's mock feature and sample data:
```python
from random import random
import pandas as pd
import numpy as np
from biocframe import BiocFrame

nrows = 200
ncols = 6
counts = np.random.rand(nrows, ncols)
row_data = BiocFrame(
    {
        "seqnames": [
            "chr1",
            "chr2",
            "chr2",
            "chr2",
            "chr1",
            "chr1",
            "chr3",
            "chr3",
            "chr3",
            "chr3",
        ]
        * 20,
        "starts": range(100, 300),
        "ends": range(110, 310),
        "strand": ["-", "+", "+", "*", "*", "+", "+", "+", "-", "-"] * 20,
        "score": range(0, 200),
        "GC": [random() for _ in range(10)] * 20,
    }
)

col_data = pd.DataFrame(
    {
        "treatment": ["ChIP", "Input"] * 3,
    }
)
```

To create a `SummarizedExperiment`:
```python
from summarizedexperiment import SummarizedExperiment

tse = SummarizedExperiment(
    assays={"counts": counts}, row_data=row_data, column_data=col_data,
    metadata={"seq_platform": "Illumina NovaSeq 6000"},
)
```

    ## output
    class: SummarizedExperiment
    dimensions: (200, 6)
    assays(1): ['counts']
    row_data columns(6): ['seqnames', 'starts', 'ends', 'strand', 'score', 'GC']
    row_names(0):
    column_data columns(1): ['treatment']
    column_names(0):
    metadata(1): seq_platform

To create a `RangedSummarizedExperiment`:
```python
from summarizedexperiment import RangedSummarizedExperiment
from genomicranges import GenomicRanges

trse = RangedSummarizedExperiment(
    assays={"counts": counts}, row_data=row_data,
    row_ranges=GenomicRanges.from_pandas(row_data.to_pandas()), column_data=col_data
)
```

    ## output
    class: RangedSummarizedExperiment
    dimensions: (200, 6)
    assays(1): ['counts']
    row_data columns(6): ['seqnames', 'starts', 'ends', 'strand', 'score', 'GC']
    row_names(0):
    column_data columns(1): ['treatment']
    column_names(0):
    metadata(0):

For more examples, check out the [documentation](https://biocpy.github.io/SummarizedExperiment/).
## Note
This project has been set up using PyScaffold 4.5. For details and usage
information on PyScaffold see https://pyscaffold.org/.