{"id":35621586,"url":"https://github.com/linkml/valuesets","last_synced_at":"2026-01-05T07:00:09.770Z","repository":{"id":315081922,"uuid":"1058035964","full_name":"linkml/valuesets","owner":"linkml","description":"Common value sets (enums) for science, biomedicine, computing, and other areas","archived":false,"fork":false,"pushed_at":"2025-12-23T02:23:32.000Z","size":35589,"stargazers_count":10,"open_issues_count":13,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-12-24T02:50:17.050Z","etag":null,"topics":["ai4curation","fair-data","linkml","monarchinitiative","semantics","standards","value-sets"],"latest_commit_sha":null,"homepage":"https://linkml.io/valuesets/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/linkml.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":"docs/governance.md","roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-09-16T14:34:23.000Z","updated_at":"2025-12-23T02:22:18.000Z","dependencies_parsed_at":"2025-09-16T16:43:47.455Z","dependency_job_id":"8495748f-e0ab-4cfb-8a92-f1443bac8dcf","html_url":"https://github.com/linkml/valuesets","commit_stats":null,"previous_names":["linkml/valuesets"],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/linkml/valuesets","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkml%2Fvaluesets","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkml%2Fvalueset
s/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkml%2Fvaluesets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkml%2Fvaluesets/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/linkml","download_url":"https://codeload.github.com/linkml/valuesets/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linkml%2Fvaluesets/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28214808,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2026-01-05T02:00:06.358Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai4curation","fair-data","linkml","monarchinitiative","semantics","standards","value-sets"],"created_at":"2026-01-05T07:00:05.834Z","updated_at":"2026-01-05T07:00:09.758Z","avatar_url":"https://github.com/linkml.png","language":"Python","readme":"# Common Value Sets\n\n[![PyPI 
version](https://badge.fury.io/py/valuesets.svg)](https://badge.fury.io/py/valuesets)\n[![LinkML](https://img.shields.io/badge/LinkML-1.9+-orange.svg)](https://linkml.io/)\n[![Documentation](https://img.shields.io/badge/docs-linkml.io-green.svg)](https://linkml.io/valuesets/)\n[![OWL/RDF](https://img.shields.io/badge/OWL-RDF-purple.svg)](https://w3id.org/valuesets/valuesets.owl.ttl)\n[![BioPortal](https://img.shields.io/badge/BioPortal-VALUESETS-blue.svg)](https://bioportal.bioontology.org/ontologies/VALUESETS)\n\nA comprehensive collection of standardized enumerations and value sets for data science, bioinformatics, materials science, and beyond.\n\n## 🎯 Why Common Value Sets?\n\nData standardization is hard. Every project reinvents the wheel with custom enums, inconsistent naming, and no semantic meaning.  \n**Common Value Sets** solves this by providing:\n\n- 📚 **Rich, standardized enumerations** – Pre-defined value sets across multiple domains  \n- 🧬 **Semantic meaning** – Every value is linked to ontology terms (when possible)  \n- 🐍 **Python-first convenience** – Work with simple enums, get semantics for free  \n- 🌐 **Multi-language support** – Generate JSON Schema, TypeScript, and more  \n- 🔗 **Interoperability** – Built on LinkML standards for maximum compatibility  \n\n---\n\n### 🔍 A Simple Example\n\nDifferent datasets often represent the same concept in incompatible ways:\n\n- `M` / `F`  \n- `male` / `female`  \n- `1` / `2`  \n\nThey all mean the same thing, but they don’t interoperate.  
\nWith **Common Value Sets**, you can instead use a shared enum:\n\n```python\nfrom valuesets.enums.core import SexEnum\n\ns = SexEnum.MALE\nprint(s.value)            # \"MALE\"\nprint(s.get_meaning())    # \"NCIT:C20197\"\nprint(s.get_description())# \"Male sex\"\n```\n\n## ⚡ Quick Start\n\n### For Python Developers\n\n```python\nfrom valuesets.enums.bio.structural_biology import StructuralBiologyTechnique\nfrom valuesets.enums.spatial.spatial_qualifiers import AnatomicalSide\n\n# Rich enums with metadata and ontology mappings\ntechnique = StructuralBiologyTechnique.CRYO_EM\nprint(technique.value)  # \"CRYO_EM\"\nprint(technique.get_description())  # \"Cryo-electron microscopy\"\nprint(technique.get_meaning())  # \"CHMO:0002413\" (Chemical Methods Ontology)\nprint(technique.get_annotations())  # {'resolution_range': '2-30 Å typical', ...}\n\n# Spatial relationships with BSPO mappings\nside = AnatomicalSide.LEFT\nprint(side.get_meaning())  # \"BSPO:0000000\" (Biological Spatial Ontology)\n\n# Look up enums by their ontology terms\nfound = AnatomicalSide.from_meaning(\"BSPO:0000000\")  # Returns LEFT\n```\n\n### For Data Scientists\n\n```python\nfrom valuesets.enums.statistics import StatisticalTest, PValueThreshold\nfrom valuesets.enums.data_science import DatasetSplitType, ModelType\n\n# Standardized statistical tests with STATO ontology mappings\ntest = StatisticalTest.STUDENTS_T_TEST\nprint(test.get_meaning())  # \"STATO:0000176\"\nprint(test.get_description())  # \"Student's t-test for comparing means\"\n\n# ML pipeline with standard splits\nsplit = DatasetSplitType.TRAIN\nmodel = ModelType.RANDOM_FOREST\n\n# P-value thresholds with clear semantics\nthreshold = PValueThreshold.SIGNIFICANT\nprint(threshold.get_annotations())  # {'value': 0.05, 'symbol': '*'}\n```\n\n### For Bioinformaticians\n\n```python\nfrom valuesets.enums.bio.taxonomy import CommonOrganismTaxaEnum, BiologicalKingdom\nfrom valuesets.enums.bio.cell_biology import CellCyclePhase, CellType\n\n# 
Model organisms with NCBI Taxonomy IDs\nhuman = CommonOrganismTaxaEnum.HUMAN\nprint(human.get_meaning())  # \"NCBITaxon:9606\"\nprint(human.get_description())  # \"Homo sapiens (human)\"\n\n# Cell biology with CL and GO mappings\nphase = CellCyclePhase.S_PHASE\nprint(phase.get_meaning())  # \"GO:0000084\"\n\nneuron = CellType.NEURON\nprint(neuron.get_meaning())  # \"CL:0000540\"\n\n# Get all organisms at a specific taxonomic level\nmammals = [org for org in CommonOrganismTaxaEnum\n           if 'MAMMALIA' in str(org)]\n```\n\n## 🏗️ Available Domains\n\n### Core Domains (Most Mature)\n- **🧬 Biology**:\n  - **Structural Biology**: Cryo-EM techniques, crystallization methods, detectors\n  - **Cell Biology**: Cell types, cell cycle phases, organelles\n  - **Taxonomy**: Model organisms (all with NCBI Taxonomy IDs)\n- **📍 Spatial**: Anatomical directions, planes, relationships (BSPO mapped)\n- **📊 Statistics**: Statistical tests (STATO mapped), p-value thresholds\n\n### Expanding Domains\n- **🧪 Data Science**: ML model types, dataset splits, metrics\n- **⚗️ Materials Science**: Crystal structures, characterization methods\n- **🏥 Clinical/Medical**: Blood types (SNOMED), vital status\n- **🌍 Environmental**: Exposure routes, pollutants\n- **⚡ Energy**: Sources, storage methods, efficiency ratings\n\n### Coming Soon\n- **🧭 Geography**: Country codes (ISO), time zones, coordinate systems\n- **⏰ Time**: Temporal relationships, periods, frequencies\n- **💼 Academic**: Publication types, research roles, funding sources\n- **🏭 Industrial**: Manufacturing processes, quality standards\n\n## 🔄 Multiple Use Cases\n\n### 1. **LinkML Standards** (YAML schemas)\nUse the raw LinkML schemas for data modeling, validation, and documentation:\n```yaml\n# Direct schema usage\nPerson:\n  attributes:\n    vital_status:\n      range: VitalStatusEnum  # ALIVE, DECEASED, UNKNOWN\n```\n\n### 2. 
**Python Programming** (Rich Enums)\nGet Python enums with full IDE support, type checking, and semantic metadata:\n```python\n# Type-safe enums with ontology mappings\nstatus = VitalStatusEnum.ALIVE  \nprint(status.get_meaning())  # \"NCIT:C37987\"\n```\n\n### 3. **\"Stealth Semantics\"**\nWrite simple code, get semantic meaning automatically:\n```python\n# Example: Different systems use different names for the same concept\nfrom valuesets.enums.medical import BloodTypeEnum\nfrom external_system import PatientBloodType  # Third-party enum\n\n# Even though the enum values might be named differently:\n# BloodTypeEnum.A_POSITIVE vs PatientBloodType.A_POS\n# They map to the same SNOMED code: SNOMED:278149003\nblood_type = BloodTypeEnum.A_POSITIVE\npatient_blood = PatientBloodType.A_POS\n\nif blood_type.get_meaning() == patient_blood.get_meaning():\n    # Semantic interoperability - works across different naming conventions\n    process_compatible_blood_type()\n\n# Or use the utility function\nif same_meaning_as(blood_type, patient_blood):\n    process_compatible_blood_type()\n```\n\n### 4. **Multi-language Interoperability**\nGenerate schemas and types for any language:\n\n```bash\n# Generate JSON Schema for web apps\ngen-jsonschema schema.yaml\n\n# Generate TypeScript definitions  \ngen-typescript schema.yaml -t typescript\n\n# Generate JSON-LD\ngen-jsonld schema.yaml\n```\n\n### 5. 
**Integration \u0026 Tooling**\n- **Excel/Google Sheets**: Generate dropdown validation lists\n- **Web forms**: Auto-generate select options with descriptions\n- **APIs**: Standardized response codes and classifications\n- **Databases**: Consistent foreign key constraints\n\n## 🛠️ Advanced Features\n\n### Hierarchical Relationships\n\n```python\n# Some enums support hierarchical is_a relationships\nfrom valuesets.enums import ViralGenomeTypeEnum\n\n# Baltimore classification with hierarchy\npositive_rna = ViralGenomeTypeEnum.SSRNA_POSITIVE  # Group IV\n# inherits from SSRNA (single-stranded RNA)\n```\n\n### Rich Metadata\n\n```python\nfrom valuesets.enums.bio.structural_biology import CryoEMGridType\n\ngrid = CryoEMGridType.QUANTIFOIL\nmetadata = grid.get_metadata()\nprint(metadata)\n# {\n#   'name': 'QUANTIFOIL',\n#   'value': 'QUANTIFOIL',\n#   'description': 'Quantifoil holey carbon grid',\n#   'annotations': {\n#     'hole_sizes': '1.2/1.3, 2/1, 2/2 μm common',\n#     'manufacturer': 'Quantifoil'\n#   }\n# }\n\n# Get all grid types with their descriptions at once\nall_grids = CryoEMGridType.get_all_descriptions()\n# {'C_FLAT': 'C-flat holey carbon grid', 'QUANTIFOIL': ...}\n```\n\n### Utility Functions\n\n```python\nfrom valuesets.enums.spatial import AnatomicalPlane\n\n# Get all ontology mappings for an enum\nmappings = AnatomicalPlane.get_all_meanings()\nprint(mappings)\n# {'SAGITTAL': 'BSPO:0000417', 'CORONAL': 'BSPO:0000019', ...}\n\n# List all metadata for every value in an enum\nall_metadata = AnatomicalPlane.list_metadata()\nfor name, meta in all_metadata.items():\n    print(f\"{name}: {meta.get('description', 'No description')}\")\n\n# Find enum by ontology term (useful for data integration)\nplane = AnatomicalPlane.from_meaning(\"BSPO:0000417\")  # Returns SAGITTAL\n```\n\n### Dynamic Enums\n\nSome enums in this collection are **dynamic enums** that can be expanded at runtime by querying ontologies. 
This uses LinkML's [Dynamic Enum](https://linkml.io/linkml/schemas/enums.html#dynamic-enums) feature.\n\n```yaml\n# Example: A dynamic enum that pulls values from an ontology\nCellTypeEnum:\n  # Dynamic expansion from Cell Ontology\n  reachable_from:\n    source_ontology: obo:cl\n    source_nodes:\n      - CL:0000540  # neuron\n    include_self: false\n    relationship_types:\n      - rdfs:subClassOf\n```\n\n**Note**: Runtime expansion support is coming soon! Currently, dynamic enums provide:\n- ✅ Static values with ontology mappings\n- ✅ Metadata and descriptions\n- 🚧 Runtime expansion from ontologies (coming in next release)\n\nWhen runtime expansion is available, you'll be able to:\n```python\n# Future: Dynamically expand enum with all neuron subtypes\ncell_types = CellTypeEnum.expand_from_ontology()\n# Would add: MOTOR_NEURON, SENSORY_NEURON, INTERNEURON, etc.\n```\n\n## 📖 Documentation\n\n[**Full Documentation Website →**](https://linkml.io/valuesets/)\n\n### OWL/RDF Representation\n\nThe value sets are also available as an OWL ontology for semantic web applications and ontology browsers:\n\n- **Direct Download**: [https://w3id.org/valuesets/valuesets.owl.ttl](https://w3id.org/valuesets/valuesets.owl.ttl)\n- **BioPortal**: Available at [BioPortal](https://bioportal.bioontology.org/ontologies/VALUESETS)\n- **Ontology Lookup Service (OLS)**: Submission planned for [OLS](https://www.ebi.ac.uk/ols/)\n\nThe OWL representation allows you to:\n- Browse value sets in ontology browsers\n- Perform SPARQL queries\n- Integrate with semantic web applications\n- Link to other biomedical ontologies\n\n## 🚀 Future Directions\n\n### Maturity Levels\nWe plan to add maturity level metadata to each enum to help users understand their readiness:\n\n- **🟢 Stable**: Production-ready, well-tested, unlikely to change\n- **🟡 Beta**: Usable but may have minor changes\n- **🔴 Draft**: Under development, expect changes\n\n```python\n# Future: Check maturity before use\nif 
enum_def.maturity_level == MaturityLevel.STABLE:\n    use_in_production()\n```\n\n### Modularization\nSplit the package into domain-specific modules for lighter installs:\n\n```bash\n# Future: Install only what you need\npip install valuesets-core        # Core functionality\npip install valuesets-bio         # Biological domains\npip install valuesets-materials   # Materials science\npip install valuesets-clinical    # Clinical/medical\n```\n\n### Community Extensions\n- **Domain Packages**: Community-maintained domain-specific value sets\n- **Organization Standards**: Company/institution-specific enums that extend base sets\n- **Mapping Tables**: Cross-ontology and cross-standard mappings\n\n### Advanced Features\n- **🤖 AI/LLM Integration**: Semantic annotations optimized for language models\n- **📊 Usage Analytics**: Track which enums are most used, identify gaps\n- **🔄 Version Management**: Handle enum evolution with deprecation warnings\n- **🌐 Multi-ontology Support**: Map single values to multiple ontologies\n- **🔍 Fuzzy Matching**: Find enums by approximate string matching\n\n## 🏗️ Development\n\n### Installation\n```bash\ngit clone https://github.com/linkml/valuesets\ncd valuesets\nuv sync\n```\n\n### Available Commands\n```bash\njust --list  # Show all available commands\njust test    # Run tests  \njust doctest # Run doctests\njust lint    # Run linting\njust site    # Build documentation site\n```\n\n## 🤝 Contributing\n\nWe welcome contributions! Whether you're adding new domains, improving existing enums, or fixing bugs:\n\n1. **Domain Experts**: Contribute standardized value sets for your field\n2. **Developers**: Add utility functions, improve tooling, fix issues  \n3. 
**Users**: Report missing enums, suggest improvements, share use cases\n\n## 📁 Repository Structure\n\n```\n├── src/valuesets/\n│   ├── schema/              # 📝 LinkML YAML schemas (source of truth)\n│   │   ├── bio/            # Biological domains\n│   │   │   ├── cell_biology.yaml\n│   │   │   ├── structural_biology.yaml\n│   │   │   └── taxonomy.yaml\n│   │   ├── spatial/        # Spatial and anatomical\n│   │   │   └── spatial_qualifiers.yaml\n│   │   ├── statistics.yaml\n│   │   └── core.yaml\n│   ├── enums/              # 🐍 Generated Python enums\n│   │   └── \u003cauto-generated from schemas\u003e\n│   ├── generators/         # 🔧 Rich enum generator\n│   │   └── rich_enum.py\n│   └── validators/         # ✓ Ontology validation\n│       └── enum_evaluator.py\n├── docs/                   # 📚 Documentation\n└── tests/                  # 🧪 Test cases\n    ├── test_rich_enums.py  # Rich enum functionality\n    └── validators/         # Ontology validation tests\n```\n\n## 📜 Credits\n\nBuilt with [LinkML](https://linkml.io/) and the [linkml-project-copier](https://github.com/dalito/linkml-project-copier) template.\n\n---\n\n*Making data standardization simple, semantic, and scalable* 🚀\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinkml%2Fvaluesets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinkml%2Fvaluesets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinkml%2Fvaluesets/lists"}