https://github.com/databio/bedms
Tool for standardization of genomics/epigenomics metadata
genetics genomic-intervals metadata
- Host: GitHub
- URL: https://github.com/databio/bedms
- Owner: databio
- License: bsd-2-clause
- Created: 2024-02-20T21:08:37.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2024-12-10T07:53:52.000Z (over 1 year ago)
- Last Synced: 2026-02-22T08:14:21.397Z (4 months ago)
- Topics: genetics, genomic-intervals, metadata
- Language: Python
- Homepage:
- Size: 13.9 MB
- Stars: 3
- Watchers: 17
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# BEDMS
BEDMS (BED Metadata Standardizer) is a tool designed to standardize genomics and epigenomics metadata attributes according to user-selected schemas such as `ENCODE`, `FAIRTRACKS`, and `BEDBASE`. BEDMS ensures consistency and FAIRness of metadata across different platforms. Users can also train their own standardizer model on a custom schema (`CUSTOM`), so that attributes are standardized according to their specific research requirements.
## Installation
To install `bedms`, use this command:
```
pip install bedms
```
or install the latest version from the GitHub repository:
```
pip install git+https://github.com/databio/bedms.git
```
## Usage
### Standardizing based on available schemas
To choose a schema to standardize against, refer to the [HuggingFace repository](https://huggingface.co/databio/attribute-standardizer-model6). The schema design `.yaml` files there show which schema best represents your attributes. The example below uses the `encode` schema.
```python
from bedms import AttrStandardizer
model = AttrStandardizer(
    repo_id="databio/attribute-standardizer-model6", model_name="encode"
)
results = model.standardize(pep="geo/gse228634:default")
assert results
```
### Training custom schemas
Training a custom schema with `BEDMS` requires two things:
1. Training Sets
2. `training_config.yaml`
To instantiate the `AttrStandardizerTrainer` class:
```python
from bedms.train import AttrStandardizerTrainer
trainer = AttrStandardizerTrainer("training_config.yaml")
```
To load the datasets and encode them:
```python
train_data, val_data, test_data, label_encoder, vectorizer = trainer.load_data()
```
To train the custom model:
```python
trainer.train()
```
To test the custom model:
```python
test_results_dict = trainer.test()
```
To generate visualizations such as learning curves, confusion matrices, and ROC curves:
```python
acc_fig, loss_fig, conf_fig, roc_fig = trainer.plot_visualizations()
```
Here `acc_fig` is the accuracy curve figure object, `loss_fig` is the loss curve figure object, `conf_fig` is the confusion matrix figure object, and `roc_fig` is the ROC curve figure object.
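The figure objects can be written to disk for later inspection. The sketch below is illustrative only: `save_figures` is a hypothetical helper, not part of `bedms`, and it assumes the returned objects expose a matplotlib-style `savefig(path, dpi=...)` method, which this README does not confirm.

```python
from pathlib import Path


def save_figures(figures, out_dir="plots", dpi=150):
    """Save each named figure as a PNG and return the written paths.

    Hypothetical helper, not part of bedms. Assumes every figure has a
    matplotlib-style savefig(path, dpi=...) method.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for name, fig in figures.items():
        target = out_path / f"{name}.png"
        fig.savefig(target, dpi=dpi)
        written.append(target)
    return written


# Usage, given the four figures returned by trainer.plot_visualizations():
# save_figures(
#     {
#         "accuracy": acc_fig,
#         "loss": loss_fig,
#         "confusion_matrix": conf_fig,
#         "roc": roc_fig,
#     }
# )
```

Keying the figures by name keeps the output file names predictable across training runs.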
### Standardizing based on custom schema
To standardize against a custom schema, your model must be hosted on HuggingFace. The repository's directory structure should follow the instructions on [HuggingFace](https://huggingface.co/databio/attribute-standardizer-model6).
```python
from bedms import AttrStandardizer
model = AttrStandardizer(
    repo_id="name/of/your/hf/repo", model_name="model/name"
)
results = model.standardize(pep="geo/gse228634:default")
# Dictionary of suggested predictions with their confidence:
# {'attr_1': {'prediction_1': 0.70, 'prediction_2': 0.30}}
print(results)
```
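Since `standardize` returns a dictionary of candidate predictions with confidences, a common next step is to keep only the best candidate per attribute. The following is a minimal sketch under stated assumptions: `top_predictions`, `min_confidence`, and `sample_results` are hypothetical names introduced here, and the sample data only mirrors the output shape shown above rather than real model output.

```python
# Hypothetical sample mirroring the documented output shape; not real model output.
sample_results = {
    "attr_1": {"prediction_1": 0.70, "prediction_2": 0.30},
    "attr_2": {"prediction_3": 0.40, "prediction_4": 0.35},
}


def top_predictions(results, min_confidence=0.5):
    """Map each attribute to its best (label, confidence) pair.

    Attributes with no candidates, or whose best candidate scores below
    min_confidence, map to None. The 0.5 default is an arbitrary
    illustration, not a value recommended by bedms.
    """
    best = {}
    for attr, candidates in results.items():
        if not candidates:
            best[attr] = None
            continue
        label, score = max(candidates.items(), key=lambda item: item[1])
        best[attr] = (label, score) if score >= min_confidence else None
    return best


if __name__ == "__main__":
    for attr, choice in top_predictions(sample_results).items():
        print(attr, "->", choice)
```

Mapping low-confidence attributes to `None` makes it easy to route them to manual review instead of silently accepting a weak suggestion.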