https://github.com/linkml/valuesets
Common value sets (enums) for science, biomedicine, computing, and other areas
- Host: GitHub
- URL: https://github.com/linkml/valuesets
- Owner: linkml
- License: apache-2.0
- Created: 2025-09-16T14:34:23.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-12-23T02:23:32.000Z (4 months ago)
- Last Synced: 2025-12-24T02:50:17.050Z (4 months ago)
- Topics: ai4curation, fair-data, linkml, monarchinitiative, semantics, standards, value-sets
- Language: Python
- Homepage: https://linkml.io/valuesets/
- Size: 33.9 MB
- Stars: 10
- Watchers: 0
- Forks: 0
- Open Issues: 13
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Governance: docs/governance.md
- Agents: AGENTS.md
# Common Value Sets
A comprehensive collection of standardized enumerations and value sets for data science, bioinformatics, materials science, and beyond.
## Why Common Value Sets?
Data standardization is hard. Many projects reinvent the wheel with custom enums, inconsistent naming, and no semantic meaning.
**Common Value Sets** solves this by providing:
- **Rich, standardized enumerations**: Pre-defined value sets across multiple domains
- **Semantic meaning**: Values are linked to ontology terms wherever possible
- **Python-first convenience**: Work with simple enums, get semantics for free
- **Multi-language support**: Generate JSON Schema, TypeScript, and more
- **Interoperability**: Built on LinkML standards for maximum compatibility
---
### A Simple Example
Different datasets often represent the same concept in incompatible ways:
- `M` / `F`
- `male` / `female`
- `1` / `2`
They all mean the same thing, but they don't interoperate.
With **Common Value Sets**, you can instead use a shared enum:
```python
from valuesets.enums.core import SexEnum
s = SexEnum.MALE
print(s.value)              # "MALE"
print(s.get_meaning())      # "NCIT:C20197"
print(s.get_description())  # "Male sex"
```
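To make the mapping from legacy codes concrete, here is a minimal, self-contained sketch. It uses a local stand-in enum rather than the real `valuesets` package, and the `LEGACY_CODES` table and `normalize_sex` helper are hypothetical names introduced only for illustration. It also assumes that `1`/`2` follow the same male/female order as the other pairs listed above.

```python
from enum import Enum


class SexEnum(Enum):
    """Local stand-in for valuesets.enums.core.SexEnum (illustration only)."""
    MALE = "MALE"
    FEMALE = "FEMALE"


# Hypothetical lookup table of legacy codes seen in source datasets.
LEGACY_CODES = {
    "m": SexEnum.MALE, "male": SexEnum.MALE, "1": SexEnum.MALE,
    "f": SexEnum.FEMALE, "female": SexEnum.FEMALE, "2": SexEnum.FEMALE,
}


def normalize_sex(raw):
    """Map a legacy code (str or int) to the shared enum; raise if unknown."""
    key = str(raw).strip().lower()
    try:
        return LEGACY_CODES[key]
    except KeyError:
        raise ValueError(f"Unrecognized sex code: {raw!r}") from None


# Three incompatible spellings collapse to one shared value.
print([normalize_sex(v).value for v in ("M", "male", 1)])
```

Raising on unknown codes, rather than guessing, keeps unmapped values visible during data cleaning.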
## Quick Start
### For Python Developers
```python
from valuesets.enums.bio.structural_biology import StructuralBiologyTechnique
from valuesets.enums.spatial.spatial_qualifiers import AnatomicalSide
# Rich enums with metadata and ontology mappings
technique = StructuralBiologyTechnique.CRYO_EM
print(technique.value) # "CRYO_EM"
print(technique.get_description()) # "Cryo-electron microscopy"
print(technique.get_meaning()) # "CHMO:0002413" (Chemical Methods Ontology)
print(technique.get_annotations()) # {'resolution_range': '2-30 Å typical', ...}
# Spatial relationships with BSPO mappings
side = AnatomicalSide.LEFT
print(side.get_meaning()) # "BSPO:0000000" (Biological Spatial Ontology)
# Look up enums by their ontology terms
found = AnatomicalSide.from_meaning("BSPO:0000000") # Returns LEFT
```
### For Data Scientists
```python
from valuesets.enums.statistics import StatisticalTest, PValueThreshold
from valuesets.enums.data_science import DatasetSplitType, ModelType
# Standardized statistical tests with STATO ontology mappings
test = StatisticalTest.STUDENTS_T_TEST
print(test.get_meaning()) # "STATO:0000176"
print(test.get_description()) # "Student's t-test for comparing means"
# ML pipeline with standard splits
split = DatasetSplitType.TRAIN
model = ModelType.RANDOM_FOREST
# P-value thresholds with clear semantics
threshold = PValueThreshold.SIGNIFICANT
print(threshold.get_annotations()) # {'value': 0.05, 'symbol': '*'}
```
### For Bioinformaticians
```python
from valuesets.enums.bio.taxonomy import CommonOrganismTaxaEnum, BiologicalKingdom
from valuesets.enums.bio.cell_biology import CellCyclePhase, CellType
# Model organisms with NCBI Taxonomy IDs
human = CommonOrganismTaxaEnum.HUMAN
print(human.get_meaning()) # "NCBITaxon:9606"
print(human.get_description()) # "Homo sapiens (human)"
# Cell biology with CL and GO mappings
phase = CellCyclePhase.S_PHASE
print(phase.get_meaning()) # "GO:0000084"
neuron = CellType.NEURON
print(neuron.get_meaning()) # "CL:0000540"
# Get all organisms at a specific taxonomic level
mammals = [org for org in CommonOrganismTaxaEnum
           if 'MAMMALIA' in str(org)]
```
## Available Domains
### Core Domains (Most Mature)
- **Biology**:
  - **Structural Biology**: Cryo-EM techniques, crystallization methods, detectors
  - **Cell Biology**: Cell types, cell cycle phases, organelles
  - **Taxonomy**: Model organisms (all with NCBI Taxonomy IDs)
- **Spatial**: Anatomical directions, planes, relationships (BSPO mapped)
- **Statistics**: Statistical tests (STATO mapped), p-value thresholds
### Expanding Domains
- **Data Science**: ML model types, dataset splits, metrics
- **Materials Science**: Crystal structures, characterization methods
- **Clinical/Medical**: Blood types (SNOMED), vital status
- **Environmental**: Exposure routes, pollutants
- **Energy**: Sources, storage methods, efficiency ratings
### Coming Soon
- **Geography**: Country codes (ISO), time zones, coordinate systems
- **Time**: Temporal relationships, periods, frequencies
- **Academic**: Publication types, research roles, funding sources
- **Industrial**: Manufacturing processes, quality standards
## Multiple Use Cases
### 1. **LinkML Standards** (YAML schemas)
Use the raw LinkML schemas for data modeling, validation, and documentation:
```yaml
# Direct schema usage
Person:
  attributes:
    vital_status:
      range: VitalStatusEnum  # ALIVE, DECEASED, UNKNOWN
```
### 2. **Python Programming** (Rich Enums)
Get Python enums with full IDE support, type checking, and semantic metadata:
```python
# Type-safe enums with ontology mappings
# (assumes VitalStatusEnum has already been imported from the valuesets package)
status = VitalStatusEnum.ALIVE
print(status.get_meaning())  # "NCIT:C37987"
```
### 3. **"Stealth Semantics"**
Write simple code, get semantic meaning automatically:
```python
# Example: Different systems use different names for the same concept
from valuesets.enums.medical import BloodTypeEnum
from external_system import PatientBloodType  # Third-party enum (illustrative)

blood_type = BloodTypeEnum.A_POSITIVE
patient_blood = PatientBloodType.A_POS

# Even though the enum values are named differently
# (BloodTypeEnum.A_POSITIVE vs PatientBloodType.A_POS),
# they map to the same SNOMED code: SNOMED:278149003
if blood_type.get_meaning() == patient_blood.get_meaning():
    # Semantic interoperability - works across different naming conventions
    process_compatible_blood_type()  # placeholder for application logic

# Or use the utility function (its import is not shown here)
if same_meaning_as(blood_type, patient_blood):
    process_compatible_blood_type()
```
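The comparison above boils down to checking that two values resolve to the same ontology term. The sketch below shows one way such a helper could work. Both enums are local stand-ins, and this `same_meaning_as` is a hypothetical reimplementation for illustration, not the package's own function; treating unmapped values as never equivalent is an assumption of this sketch.

```python
from enum import Enum


class BloodTypeEnum(Enum):
    """Stand-in for the valuesets enum (illustration only)."""
    A_POSITIVE = "A_POSITIVE"

    def get_meaning(self):
        return {"A_POSITIVE": "SNOMED:278149003"}.get(self.name)


class PatientBloodType(Enum):
    """Stand-in for a third-party enum with different member names."""
    A_POS = "A+"
    UNKNOWN = "?"  # deliberately has no ontology mapping

    def get_meaning(self):
        return {"A_POS": "SNOMED:278149003"}.get(self.name)


def same_meaning_as(a, b):
    """Hypothetical helper: True when both values map to the same ontology term."""
    meaning_a, meaning_b = a.get_meaning(), b.get_meaning()
    # A value without a mapping (None) is never considered equivalent.
    return meaning_a is not None and meaning_a == meaning_b


print(same_meaning_as(BloodTypeEnum.A_POSITIVE, PatientBloodType.A_POS))
print(same_meaning_as(BloodTypeEnum.A_POSITIVE, PatientBloodType.UNKNOWN))
```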
### 4. **Multi-language Interoperability**
Generate schemas and types for other languages with the LinkML generators:
```bash
# Generate JSON Schema for web apps
gen-jsonschema schema.yaml
# Generate TypeScript definitions
gen-typescript schema.yaml -t typescript
# Generate JSON-LD
gen-jsonld schema.yaml
```
### 5. **Integration & Tooling**
- **Excel/Google Sheets**: Generate dropdown validation lists
- **Web forms**: Auto-generate select options with descriptions
- **APIs**: Standardized response codes and classifications
- **Databases**: Consistent foreign key constraints
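As a rough illustration of the spreadsheet and web-form items above, the sketch below derives a dropdown list and HTML `<option>` elements from an enum. The enum is a local stand-in, its descriptions are invented for the example, and `select_options` and `dropdown_values` are hypothetical helpers rather than part of the package.

```python
from enum import Enum
from html import escape


class VitalStatusEnum(Enum):
    """Stand-in enum; descriptions here are illustrative, not the package's."""
    ALIVE = "ALIVE"
    DECEASED = "DECEASED"
    UNKNOWN = "UNKNOWN"

    def get_description(self):
        return _DESCRIPTIONS.get(self.name)


_DESCRIPTIONS = {
    "ALIVE": "The person is living",
    "DECEASED": "The person has died",
    "UNKNOWN": "Vital status is not known",
}


def select_options(enum_cls):
    """Render one HTML <option> per member, with the description as a tooltip."""
    lines = []
    for member in enum_cls:
        title = escape(member.get_description() or "", quote=True)
        value = escape(member.value, quote=True)
        lines.append(f'<option value="{value}" title="{title}">{value}</option>')
    return "\n".join(lines)


def dropdown_values(enum_cls):
    """Plain list of permitted values, e.g. for a spreadsheet validation rule."""
    return [member.value for member in enum_cls]


print(select_options(VitalStatusEnum))
print(dropdown_values(VitalStatusEnum))
```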
## Advanced Features
### Hierarchical Relationships
```python
# Some enums support hierarchical is_a relationships
from valuesets.enums import ViralGenomeTypeEnum
# Baltimore classification with hierarchy
positive_rna = ViralGenomeTypeEnum.SSRNA_POSITIVE # Group IV
# inherits from SSRNA (single-stranded RNA)
```
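How the hierarchy is exposed in Python is not shown here, so the following is only a sketch of how `is_a` links could be walked. The enum is a local stand-in, and the `_IS_A` table and the `ancestors` and `is_a` methods are hypothetical names, not the package's API.

```python
from enum import Enum


class ViralGenomeTypeEnum(Enum):
    """Stand-in with two members; the parent link below is illustrative."""
    SSRNA = "SSRNA"
    SSRNA_POSITIVE = "SSRNA_POSITIVE"

    def ancestors(self):
        """Follow is_a links upward and return the chain of parent members."""
        chain = []
        current = _IS_A.get(self.name)
        while current is not None:
            chain.append(type(self)[current])
            current = _IS_A.get(current)
        return chain

    def is_a(self, other):
        """True if `other` is this member or one of its ancestors."""
        return other is self or other in self.ancestors()


# Hypothetical child -> parent table mirroring the is_a relationship.
_IS_A = {"SSRNA_POSITIVE": "SSRNA"}

positive_rna = ViralGenomeTypeEnum.SSRNA_POSITIVE
print(positive_rna.is_a(ViralGenomeTypeEnum.SSRNA))
```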
### Rich Metadata
```python
from valuesets.enums.bio.structural_biology import CryoEMGridType
grid = CryoEMGridType.QUANTIFOIL
metadata = grid.get_metadata()
print(metadata)
# {
#     'name': 'QUANTIFOIL',
#     'value': 'QUANTIFOIL',
#     'description': 'Quantifoil holey carbon grid',
#     'annotations': {
#         'hole_sizes': '1.2/1.3, 2/1, 2/2 μm common',
#         'manufacturer': 'Quantifoil'
#     }
# }
# Get all grid types with their descriptions at once
all_grids = CryoEMGridType.get_all_descriptions()
# {'C_FLAT': 'C-flat holey carbon grid', 'QUANTIFOIL': ...}
```
### Utility Functions
```python
from valuesets.enums.spatial import AnatomicalPlane
# Get all ontology mappings for an enum
mappings = AnatomicalPlane.get_all_meanings()
print(mappings)
# {'SAGITTAL': 'BSPO:0000417', 'CORONAL': 'BSPO:0000019', ...}
# List all metadata for every value in an enum
all_metadata = AnatomicalPlane.list_metadata()
for name, meta in all_metadata.items():
    print(f"{name}: {meta.get('description', 'No description')}")
# Find enum by ontology term (useful for data integration)
plane = AnatomicalPlane.from_meaning("BSPO:0000417") # Returns SAGITTAL
```
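For readers curious how these lookups can work, here is a minimal sketch of the pattern using only the standard library. It is not the generated code from this package: the class is a local stand-in, and returning `None` for an unknown term is an assumption (the real `from_meaning` may behave differently).

```python
from enum import Enum


class AnatomicalPlane(Enum):
    """Stand-in enum; the real class is generated from the LinkML schema."""
    SAGITTAL = "SAGITTAL"
    CORONAL = "CORONAL"

    def get_meaning(self):
        return _MEANINGS.get(self.name)

    @classmethod
    def get_all_meanings(cls):
        """Map each member name to its ontology term, skipping unmapped members."""
        return {m.name: m.get_meaning() for m in cls if m.get_meaning()}

    @classmethod
    def from_meaning(cls, curie):
        """Reverse lookup: return the member mapped to `curie`, or None."""
        for member in cls:
            if member.get_meaning() == curie:
                return member
        return None


# Mappings taken from the example output in this section.
_MEANINGS = {"SAGITTAL": "BSPO:0000417", "CORONAL": "BSPO:0000019"}

print(AnatomicalPlane.from_meaning("BSPO:0000417"))
print(AnatomicalPlane.get_all_meanings())
```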
### Dynamic Enums
Some enums in this collection are **dynamic enums** that can be expanded at runtime by querying ontologies. This uses LinkML's [Dynamic Enum](https://linkml.io/linkml/schemas/enums.html#dynamic-enums) feature.
```yaml
# Example: A dynamic enum that pulls values from an ontology
CellTypeEnum:
  # Dynamic expansion from Cell Ontology
  reachable_from:
    source_ontology: obo:cl
    source_nodes:
      - CL:0000540  # neuron
    include_self: false
    relationship_types:
      - rdfs:subClassOf
```
**Note**: Runtime expansion support is coming soon! Currently, dynamic enums provide:
- ✅ Static values with ontology mappings
- ✅ Metadata and descriptions
- 🚧 Runtime expansion from ontologies (coming in next release)
When runtime expansion is available, you'll be able to:
```python
# Future: Dynamically expand enum with all neuron subtypes
cell_types = CellTypeEnum.expand_from_ontology()
# Would add: MOTOR_NEURON, SENSORY_NEURON, INTERNEURON, etc.
```
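Until runtime expansion lands, the idea behind `reachable_from` can be sketched as a graph traversal. The example below walks a small in-memory table instead of querying the Cell Ontology. The `SUBCLASS_OF` edges use placeholder labels rather than real CL identifiers, and the `reachable_from` function is a hypothetical stand-in for whatever the eventual implementation does.

```python
from collections import deque

# Toy child -> parents edges standing in for rdfs:subClassOf in an ontology.
# Labels are illustrative placeholders, not real CL identifiers.
SUBCLASS_OF = {
    "motor neuron": ["neuron"],
    "sensory neuron": ["neuron"],
    "interneuron": ["neuron"],
    "alpha motor neuron": ["motor neuron"],
    "neuron": ["cell"],
}


def reachable_from(source_nodes, edges, include_self=False):
    """Collect every term that is a (transitive) subclass of a source node."""
    # Invert child -> parents into parent -> children for downward traversal.
    children = {}
    for child, parents in edges.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    found = set(source_nodes) if include_self else set()
    seen = set(source_nodes)
    queue = deque(source_nodes)
    while queue:  # breadth-first walk over descendants
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                found.add(child)
                queue.append(child)
    return found


print(sorted(reachable_from(["neuron"], SUBCLASS_OF)))
# ['alpha motor neuron', 'interneuron', 'motor neuron', 'sensory neuron']
```

With `include_self=False`, as in the YAML above, the source node itself is left out of the result.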
## Documentation
[**Full Documentation Website**](https://linkml.io/valuesets/)
### OWL/RDF Representation
The value sets are also available as an OWL ontology for semantic web applications and ontology browsers:
- **Direct Download**: [https://w3id.org/valuesets/valuesets.owl.ttl](https://w3id.org/valuesets/valuesets.owl.ttl)
- **BioPortal**: Available at [BioPortal](https://bioportal.bioontology.org/ontologies/VALUESETS)
- **Ontology Lookup Service (OLS)**: Submission planned for [OLS](https://www.ebi.ac.uk/ols/)
The OWL representation allows you to:
- Browse value sets in ontology browsers
- Perform SPARQL queries
- Integrate with semantic web applications
- Link to other biomedical ontologies
## Future Directions
### Maturity Levels
We plan to add maturity level metadata to each enum to help users understand their readiness:
- **Stable**: Production-ready, well-tested, unlikely to change
- **Beta**: Usable but may have minor changes
- **Draft**: Under development, expect changes
```python
# Future: Check maturity before use
if enum_def.maturity_level == MaturityLevel.STABLE:
    use_in_production()
```
### Modularization
Split the package into domain-specific modules for lighter installs:
```bash
# Future: Install only what you need
pip install valuesets-core # Core functionality
pip install valuesets-bio # Biological domains
pip install valuesets-materials # Materials science
pip install valuesets-clinical # Clinical/medical
```
### Community Extensions
- **Domain Packages**: Community-maintained domain-specific value sets
- **Organization Standards**: Company/institution-specific enums that extend base sets
- **Mapping Tables**: Cross-ontology and cross-standard mappings
### Advanced Features
- **AI/LLM Integration**: Semantic annotations optimized for language models
- **Usage Analytics**: Track which enums are most used, identify gaps
- **Version Management**: Handle enum evolution with deprecation warnings
- **Multi-ontology Support**: Map single values to multiple ontologies
- **Fuzzy Matching**: Find enums by approximate string matching
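Fuzzy matching is not implemented yet, but the idea can be approximated today with the standard library. The sketch below is an assumption about how it might look: the enum is a local stand-in (only `S_PHASE` appears elsewhere in this README), and `fuzzy_find` is a hypothetical helper built on `difflib.get_close_matches`.

```python
from difflib import get_close_matches
from enum import Enum


class CellCyclePhase(Enum):
    """Stand-in enum with a few members for illustration."""
    G1_PHASE = "G1_PHASE"
    S_PHASE = "S_PHASE"
    G2_PHASE = "G2_PHASE"
    M_PHASE = "M_PHASE"


def fuzzy_find(enum_cls, text, cutoff=0.6):
    """Return the member whose name is closest to `text`, or None if none is close."""
    # Normalize free text toward the UPPER_SNAKE_CASE style of member names.
    normalized = text.strip().upper().replace(" ", "_").replace("-", "_")
    names = [member.name for member in enum_cls]
    matches = get_close_matches(normalized, names, n=1, cutoff=cutoff)
    return enum_cls[matches[0]] if matches else None


print(fuzzy_find(CellCyclePhase, "s phase"))
print(fuzzy_find(CellCyclePhase, "zzz"))
```

The `cutoff` parameter trades recall for precision: a higher value rejects more near-misses.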
## Development
### Installation
```bash
git clone https://github.com/linkml/valuesets
cd valuesets
uv sync  # install dependencies into a local virtual environment
```
### Available Commands
```bash
just --list # Show all available commands
just test # Run tests
just doctest # Run doctests
just lint # Run linting
just site # Build documentation site
```
## Contributing
We welcome contributions! Whether you're adding new domains, improving existing enums, or fixing bugs:
1. **Domain Experts**: Contribute standardized value sets for your field
2. **Developers**: Add utility functions, improve tooling, fix issues
3. **Users**: Report missing enums, suggest improvements, share use cases
## Repository Structure
```
├── src/valuesets/
│   ├── schema/              # LinkML YAML schemas (source of truth)
│   │   ├── bio/             # Biological domains
│   │   │   ├── cell_biology.yaml
│   │   │   ├── structural_biology.yaml
│   │   │   └── taxonomy.yaml
│   │   ├── spatial/         # Spatial and anatomical
│   │   │   └── spatial_qualifiers.yaml
│   │   ├── statistics.yaml
│   │   └── core.yaml
│   ├── enums/               # Generated Python enums
│   │   └──
│   ├── generators/          # Rich enum generator
│   │   └── rich_enum.py
│   └── validators/          # Ontology validation
│       └── enum_evaluator.py
├── docs/                    # Documentation
└── tests/                   # Test cases
    ├── test_rich_enums.py   # Rich enum functionality
    └── validators/          # Ontology validation tests
```
## Credits
Built with [LinkML](https://linkml.io/) and the [linkml-project-copier](https://github.com/dalito/linkml-project-copier) template.
---
*Making data standardization simple, semantic, and scalable*