{"id":20425001,"url":"https://github.com/databio/bedms","last_synced_at":"2026-03-10T07:05:22.895Z","repository":{"id":256979083,"uuid":"760888787","full_name":"databio/bedms","owner":"databio","description":"Tool for standardization of genomics/epigenomics metadata","archived":false,"fork":false,"pushed_at":"2024-12-10T07:53:52.000Z","size":14567,"stargazers_count":3,"open_issues_count":2,"forks_count":0,"subscribers_count":17,"default_branch":"master","last_synced_at":"2026-02-22T08:14:21.397Z","etag":null,"topics":["genetics","genomic-intervals","metadata"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/databio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-02-20T21:08:37.000Z","updated_at":"2024-12-03T20:15:32.000Z","dependencies_parsed_at":"2025-04-12T18:54:30.893Z","dependency_job_id":"b754f56c-8822-4de2-957f-75fc9d914b9e","html_url":"https://github.com/databio/bedms","commit_stats":null,"previous_names":["databio/bedms","databio/bedmess"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/databio/bedms","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databio%2Fbedms","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databio%2Fbedms/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databio%2Fbedms/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databio%2Fbedms/manifests","owner_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/owners/databio","download_url":"https://codeload.github.com/databio/bedms/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databio%2Fbedms/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30326893,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-10T05:25:20.737Z","status":"ssl_error","status_checked_at":"2026-03-10T05:25:17.430Z","response_time":106,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["genetics","genomic-intervals","metadata"],"created_at":"2024-11-15T07:11:55.492Z","updated_at":"2026-03-10T07:05:22.872Z","avatar_url":"https://github.com/databio.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# BEDMS\n\nBEDMS (BED Metadata Standardizer) is a tool designed to standardize genomics and epigenomics metadata attributes according to user-selected schemas such as `ENCODE`, `FAIRTRACKS`, and `BEDBASE`. BEDMS ensures consistency and FAIRness of metadata across different platforms. Users can also train their own standardizer model on a custom schema (`CUSTOM`), so that attributes are standardized according to their specific research requirements. 
\n\n## Installation\n\nTo install `bedms`, run: \n```\npip install bedms\n```\nor install the latest version from the GitHub repository:\n```\npip install git+https://github.com/databio/bedms.git\n```\n\n## Usage\n\n### Standardizing based on available schemas\n\nTo choose a schema to standardize against, refer to the [HuggingFace repository](https://huggingface.co/databio/attribute-standardizer-model6). Based on the schema design `.yaml` files, you can select the schema that best represents your attributes. In the example below, we have chosen the `encode` schema. \n\n```python\nfrom bedms import AttrStandardizer\n\nmodel = AttrStandardizer(\n    repo_id=\"databio/attribute-standardizer-model6\", model_name=\"encode\"\n)\nresults = model.standardize(pep=\"geo/gse228634:default\")\n\nassert results\n```\n\n### Training custom schemas\nTo train a custom schema with `BEDMS`, you need two things:\n1. Training sets\n2. `training_config.yaml`\n\nTo instantiate the `AttrStandardizerTrainer` class:\n\n```python\nfrom bedms.train import AttrStandardizerTrainer\n\ntrainer = AttrStandardizerTrainer(\"training_config.yaml\")\n```\nTo load the datasets and encode them:\n\n```python\ntrain_data, val_data, test_data, label_encoder, vectorizer = trainer.load_data()\n```\n\nTo train the custom model:\n\n```python\ntrainer.train()\n```\n\nTo test the custom model:\n\n```python\ntest_results_dict = trainer.test()\n```\n\nTo generate visualizations such as learning curves, the confusion matrix, and the ROC curve:\n\n```python\nacc_fig, loss_fig, conf_fig, roc_fig = trainer.plot_visualizations()\n```\n\nHere `acc_fig` is the accuracy curve figure object, `loss_fig` is the loss curve figure object, `conf_fig` is the confusion matrix figure object, and `roc_fig` is the ROC curve figure object. \n\n\n### Standardizing based on custom schema\n\nTo standardize based on a custom schema, your model must be hosted on HuggingFace. 
The directory structure should follow the instructions given on [HuggingFace](https://huggingface.co/databio/attribute-standardizer-model6). \n\n```python\nfrom bedms import AttrStandardizer\n\nmodel = AttrStandardizer(\n    repo_id=\"name/of/your/hf/repo\", model_name=\"model/name\"\n)\nresults = model.standardize(pep=\"geo/gse228634:default\")\n\n# Dictionary of suggested predictions with their confidence, e.g.\n# {'attr_1': {'prediction_1': 0.70, 'prediction_2': 0.30}}\nprint(results)\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabio%2Fbedms","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatabio%2Fbedms","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabio%2Fbedms/lists"}