https://github.com/open2c/assemblyinfo
Genome assembly metadata
https://github.com/open2c/assemblyinfo
Last synced: 8 months ago
JSON representation
Genome assembly metadata
- Host: GitHub
- URL: https://github.com/open2c/assemblyinfo
- Owner: open2c
- License: mit
- Created: 2024-05-28T18:02:30.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-08-12T03:20:57.000Z (almost 2 years ago)
- Last Synced: 2025-09-23T12:01:06.934Z (8 months ago)
- Language: Python
- Size: 2.1 MB
- Stars: 2
- Watchers: 8
- Forks: 1
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Assemblyinfo: Interact with assembly metadata in Python

[](https://genomeinfo.readthedocs.io/en/latest/)
[](https://bit.ly/open2c-slack)
Assemblyinfo simplifies the management and analysis of genome assembly metadata in Python.
This package provides:
* Efficient tools for querying and manipulating assembly information datasets.
* Streamlined methods for importing, exporting, and converting between common chromosome formats.
* Utilities for retrieving assembly statistics across different versions or species.
Read the [documentation](https://genomeinfo.readthedocs.io/en/latest/) for more information.
## Installation
Bioframe is available on [PyPI](https://pypi.org/project/bioframe/):
```sh
pip install assemblyinfo
```
## Basic operations on chromosome data
Assemblyinfo offers a flexible and straigthforward interface to interact and perform basic queries.
```python
import assemblyinfo
db = assemblyinfo.connect()
hg38 = db.assembly_info("hg38", roles=["assembled"])
```
Easily allows getting chromosome sizes:
```text
hg38.chromsizes
> name
> chr1 248956422
> chr2 242193529
> ...
```
chromosome equivalences:
```text
hg38.chromeq
> ncbi genbank refseq
> chr1 1 CM000663.2 NC_000001.11
> chr2 2 CM000664.2 NC_000002.12
> chr3 3 CM000665.2 NC_000003.12
> ...
```
or assembly metadata:
```text
hg38.metadata
> {'assembly_level': 'Chromosome',
'assembly_type': 'haploid-with-alt-loci',
'bioproject': 'PRJNA168',
'submitter': 'Genome Reference Consortium',
'synonyms': ['GRCh38', 'hg38'],
'taxid': '9606',
'species': 'homo_sapiens',
'common_name': 'human',
... }
```
and more!
# Request an assembly
Feel free to open an issue and request a non-reference assembly! Current supported species are:
```plaintext
['caenorhabditis_elegans',
'homo_sapiens',
'mus_musculus',
'drosophila_melanogaster',
'danio_rerio',
'bos_taurus',
'gallus_gallus',
'canis_lupus_familiaris']
```
You also can easily see which specific assemblies are supported by:
```python
db = assemblyinfo.connect()
db.available_assemblies()
```
## Citing
If you use ***assemblyinfo*** in your work, please refer to:
```bibtex
@software{assemblyinfo_2024,
author = {Open2C},
title = {assemblyinfo},
year = {2024},
publisher = {Github},
version = {v0.0.1},
url = {https://github.com/open2c/assemblyinfo}
}
```