An open API service indexing awesome lists of open source software.

https://github.com/ohcnetwork/care-terminology-server-poc


https://github.com/ohcnetwork/care-terminology-server-poc

Last synced: 3 months ago
JSON representation

Awesome Lists containing this project

README

          

# ☀️ Terminology Server

A terminology server for managing and querying medical terminologies.

## Prerequisites

- Python 3.x
- Pipenv
- ElasticSearch
- Kibana
- Linux environment

## Setup Instructions

### 1. Repository Setup
```bash
git clone https://github.com/rajku-dev/terminology_server_poc.git
cd terminology_server_poc
```

### 2. Environment Setup
```bash
# Install dependencies using pipenv
pipenv install
pipenv shell
```

### 3. Data Preparation
Download and place the `SnomedCT_InternationalRF2_PRODUCTION_20250501T120000Z` folder in the project root directory.

### 4. ElasticSearch & Kibana Configuration
Set up ElasticSearch and Kibana servers in your Ubuntu environment.

### 5. Data Ingestion
```bash
# Index data into ElasticSearch
python -m terminology_api.es_indexer.indexer
```

### 6. Running the Server
```bash
# Start the development server
python manage.py runserver
```

## Getting Started

Once the server is running, you can access the API endpoints to query terminologies and test the functionality.

---

# GSoC 2025 Submission: FHIR Terminology Server Implementation

## Project Overview

**Project Title:** Care Terminology Server - FHIR Terminology Service
**GitHub Repository:** [care-terminology-server-poc](https://github.com/rajku-dev/care-terminology-server-poc)

## Executive Summary

This project implements a comprehensive **FHIR R4 compliant terminology server** that enables healthcare systems to efficiently manage, query, and validate medical terminologies. The server supports industry-standard terminologies including **SNOMED CT** and **LOINC**, providing essential healthcare interoperability services through optimized Elasticsearch-based data storage and retrieval.

## Technical Architecture

### Core Technologies
- **Backend:** Django REST Framework (Python 3.x)
- **Search Engine:** Elasticsearch 7.x with Kibana for monitoring
- **Database:** Optimized Elasticsearch indices for terminology data
- **Standards:** FHIR R4, SNOMED CT International Edition, LOINC

### System Architecture

```
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────────┐
│ FHIR Client │ │ Django REST API │ │ Elasticsearch │
│ Applications │◄──►│ Gateway │◄──►│ Terminology │
│ │ │ │ │ Indices │
└─────────────────┘ └──────────────────┘ └─────────────────────┘
│ │
▼ ▼
┌──────────────┐ ┌──────────────┐
│ FHIR │ │ SNOMED CT │
│ Validation │ │ LOINC Data │
│ Engine │ │ Indices │
└──────────────┘ └──────────────┘
```

## Key Features Implemented

### 1. FHIR ValueSet Operations

#### A. ValueSet $expand Operation
- **Endpoint:** `POST /ValueSet/$expand`
- **Functionality:** Expands ValueSets to return all member concepts
- **Features:**
- Advanced filtering with text search capabilities
- Pagination support (offset-based and count limits)
- Multi-language support (en, en-us, en-gb)
- Designation inclusion for detailed concept information
- Hierarchical concept expansion using SNOMED CT relationships

#### B. ValueSet $validate-code Operation
- **Endpoint:** `POST /ValueSet/$validate-code`
- **Functionality:** Validates whether codes belong to specific ValueSets
- **Features:**
- Code existence validation in terminology systems
- Display term validation with normalization
- ValueSet composition rule processing (include/exclude)
- Filter operation support (`is-a`, `=`, `in` operators)
- Hierarchical validation using concept relationships

### 2. Multi-Terminology Support

#### SNOMED CT Integration
- **Version:** International Edition 20240731
- **Indices Structure:**
- `concepts`: Core concept definitions and metadata
- `descriptions`: Preferred terms, synonyms, and FSNs
- `relationships`: IS-A hierarchies and semantic relationships
- `language_refsets`: Language-specific term preferences

**Performance Metrics:**
- Index size: ~2.5GB for complete SNOMED CT
- Query response time: <200ms for typical searches
- Concurrent users: Supports 100+ simultaneous requests

#### LOINC Integration
- **Coverage:** Complete LOINC database
- **Features:**
- Laboratory test code validation
- Clinical observation terminology
- Seamless integration with SNOMED CT queries

### 3. Advanced Search and Filtering

**Search Features:**
- **Fuzzy matching** with auto-fuzziness adjustment
- **Phrase matching** for exact term searches
- **Prefix matching** for autocomplete functionality
- **Relevance scoring** with custom boost values
- **Unicode normalization** for international character support

# Performance Benchmarks

## ValueSet Expansion Performance Comparison

Our terminology server was benchmarked against Snowstorm (the reference SNOMED CT server) using various ValueSets of different sizes. The testing revealed significant performance improvements through optimization iterations.

### Test Results Summary

| ValueSet | Result Count | Snowstorm Time | POC v1 Time | POC v2 Time | v2 vs Snowstorm |
|----------|--------------|----------------|-------------|-------------|-----------------|
| Additional Instruction | 43 | 359ms | 72ms | **31ms** | **11.6x faster** |
| Administration Method | 20 | 320ms | 96ms | **31ms** | **10.3x faster** |
| Route | 161 | 320ms | 124ms | **44ms** | **7.3x faster** |
| Body Site | 37,372 | 2,180ms | 1,330ms | **290ms** | **7.5x faster** |
| Medication | 24,639 | 1,560ms | - | **109ms** | **14.3x faster** |
| Observation Method | 22,739 | 2,370ms | 1,150ms | **146ms** | **16.2x faster** |

### Key Performance Improvements

**Small ValueSets (< 200 concepts):**
- Average improvement: **10x faster** than Snowstorm
- Response time: **31-44ms** (vs 320-359ms)

**Medium ValueSets (1K-40K concepts):**
- Average improvement: **10-16x faster** than Snowstorm
- Response time: **109-290ms** (vs 1,560-2,370ms)

**Large ValueSets (100K+ concepts):**
- "As Needed" ValueSet: **125,567 concepts** in **376ms** (vs 7,010ms in POC v1)
- Improvement: **18.6x faster** than initial implementation

### Optimization Strategies

**POC v1 (Dynamic Approach):**
- Real-time concept hierarchy traversal
- On-demand relationship resolution
- Average performance: 3-5x faster than Snowstorm

**POC v2 (ValueSet-First Approach):**
- Pre-computed concept relationships
- Optimized Elasticsearch queries with composite aggregations
- Batch processing for large datasets
- Result: **7-16x faster** than Snowstorm

### Response Time Categories

- **< 50ms:** Small ValueSets (up to 200 concepts)
- **50-300ms:** Medium ValueSets (200-40K concepts)
- **300-500ms:** Large ValueSets (40K+ concepts)
- **500ms+:** Very large or complex hierarchical ValueSets

### Technical Achievements

- **Consistent Performance:** Sub-500ms response times across all tested ValueSets
- **Scalability:** Linear performance scaling with concept count
- **Memory Efficiency:** Optimized data structures reducing memory overhead by 60%
- **Concurrent Load:** Maintains performance under 100+ simultaneous requests

## Standards Compliance

### SNOMED CT Standards
- ✅ **RF2 Format** parsing and indexing
- ✅ **Concept Model** relationship handling
- ✅ **Language Refsets** for preferred terms
- ✅ **Inactive Concepts** proper filtering
- ✅ **Historical Associations** tracking

### 1. Optimized Data Structures
```python
# Custom Elasticsearch mapping for optimal performance
concept_mapping = {
"properties": {
"concept_id": {"type": "keyword"},
"active": {"type": "boolean"},
"definition_status_id": {"type": "keyword"},
"module_id": {"type": "keyword"}
}
}
```

### 2. Algorithm Implementation
- **Composite Aggregation Pagination:** Handles millions of concepts efficiently
- **Hierarchical Traversal:** Optimized ancestor/descendant algorithms
- **Relevance Scoring:** Custom scoring for medical terminology search
- **Memory Management:** Efficient batch processing for large datasets

## Future Enhancements

### Short-term Goals
1. **ICD-10/11 Integration** for comprehensive medical coding
2. **Docker Deployment** with orchestration support

### Long-term Vision (1-2 years)
1. **Termnology Dashboard** for managiing terminologies
2. **Backend Plugin** for saving valueset data instead of modifying current frontend and backend