https://github.com/pisong314/snoextract
Offline clinical free text --> structured SNOMED engine
entity-linking snomed snomed-ct snomed-ct-au
- Host: GitHub
- URL: https://github.com/pisong314/snoextract
- Owner: pisong314
- Created: 2026-05-17T08:37:59.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-05-28T21:02:22.000Z (about 1 month ago)
- Last Synced: 2026-05-28T23:07:49.698Z (about 1 month ago)
- Topics: entity-linking, snomed, snomed-ct, snomed-ct-au
- Homepage:
- Size: 59.6 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# SNOExtract
Lightweight offline SNOMED CT clinical concept extraction.
- Deterministic
- No outbound network calls — patient notes never leave the host
- CPU-only, 128 MB memory requirement
- REST / gRPC / CLI / Python
- Runs on-premises or at the point of care (clinician-in-the-loop review encouraged)
Extracts SNOMED concepts (CUIs, semantic types, negation/uncertainty/historicity context) from clinical free-text. Ships as a self-contained binary with data files — no Python install, no database, no internet calls at runtime.

## Try it live (web demo)
**[snomed-ner-demo-874953055038.australia-southeast1.run.app](https://snomed-ner-demo-874953055038.australia-southeast1.run.app/)** — paste a clinical note, see entities in your browser.
The demo runs in the cloud for convenience; production binaries are fully offline. Don't paste real patient data into the demo.
## SNOMED CT licensing — required before installing
SNOExtract embeds SNOMED CT-AU concept data, so using the binaries requires a current SNOMED CT licence. This is a SNOMED International obligation, not a SNOExtract one.
- **In Australia** — licences are issued at no charge to healthcare organisations and approved researchers via the [National Clinical Terminology Service](https://www.healthterminologies.gov.au/access-clinical-terminology/access-snomed-ct-au/snomed-ct-au-releases/) (NCTS), administered by the Australian Digital Health Agency.
- **Outside Australia** — apply through your country's National Release Centre, or directly via [SNOMED International](https://www.snomed.org/) for affiliate licensing.
The dist includes `SNOMED_CT_NOTICE.txt` covering the attribution and end-user obligations that apply to your usage.
## Download
Latest builds for Linux and Windows (x86_64):
**[github.com/pisong314/snoextract/releases/latest](https://github.com/pisong314/snoextract/releases/latest)**
| File | Platform |
|---|---|
| `snoextract--linux-x86_64.tar.gz` | Linux glibc 2.28+ (RHEL 8 / Ubuntu 20.04+) |
| `snoextract--windows-x86_64.zip` | Windows 10/11, Server 2019+ |
Each build runs for 90 days; after that, download a fresh build, which ships with the latest SNOMED CT-AU data. The exact expiry date is printed in the bundled `README.txt`.
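Because a build stops running after its 90-day window, it can help to track the date in whatever monitoring you already use. The following is a minimal sketch, assuming you copy the date printed in `README.txt` into your own configuration by hand (nothing is parsed from the file, since its format is not described here). The function names and the 14-day warning threshold are illustrative and not part of SNOExtract.

```python
from datetime import date
from typing import Optional


def days_remaining(expiry: date, today: Optional[date] = None) -> int:
    """Whole days left before `expiry`; negative once that date has passed."""
    if today is None:
        today = date.today()
    return (expiry - today).days


def expiry_status(expiry: date, today: Optional[date] = None,
                  warn_days: int = 14) -> str:
    """Classify a build as 'ok', 'renew-soon' or 'expired'.

    The labels and the `warn_days` default are assumptions for this sketch.
    """
    left = days_remaining(expiry, today)
    if left < 0:
        return "expired"
    if left <= warn_days:
        return "renew-soon"
    return "ok"
```

A scheduled job could call `expiry_status(...)` daily and alert on anything other than `"ok"`, leaving time to download and validate the next build.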
## Quickstart
Unpack the downloaded archive first, then pick the interface that matches how you'll use it.
### 1. Single-call CLI — `<100 ms` per call
`snoextract-json` reads JSON on stdin and writes JSON on stdout. Each call starts a fresh process, so the ~70–100 ms per call is dominated by load time. Best for **ad-hoc use and low-volume integrations**; for bulk work, use [server mode](docs/rest.md) (6× faster per note).
Run from the unzipped dist directory so `./data` is auto-discovered, or point at it explicitly with `export SNOEXTRACT_DATA_DIR=/path/to/dist/data` (or `--data-dir`).
| | Linux / macOS | Windows (cmd) |
|---|---|---|
| **Run** | `echo '{"text":"Pt on Metformin 1g BD for diabetes mellitus."}' \| ./snoextract-json` | `echo {"text":"Pt on Metformin 1g BD for diabetes mellitus."} \| snoextract-json.exe` |
| **Or from file** | `./snoextract-json --input-file in.json --output-file out.json` | `snoextract-json.exe --input-file in.json --output-file out.json` |
Output (truncated):
```json
{
  "version": "0.34.7",
  "entities": [
    { "text": "Metformin", "start": 6, "end": 15, "cui": "372567009",
      "name": "Metformin (substance)", "semantic_type": "substance", ... },
    { "text": "diabetes mellitus", "start": 26, "end": 43, "cui": "73211009",
      "name": "Diabetes mellitus (disorder)", "semantic_type": "disorder", ... }
  ]
}
```
Full input/output schema is in the bundled `README.txt`.
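If you are calling the CLI from a script rather than a shell, the stdin/stdout contract above can be wrapped in a few lines. This is a rough sketch, not an official client: the helper names `extract` and `concept_ids` are made up for illustration, and it assumes the binary exits with a non-zero status on failure and emits UTF-8, neither of which is confirmed here. Only the `text` request field and the `entities`, `text` and `cui` response fields shown above are used.

```python
import json
import os
import subprocess
from typing import List, Optional, Sequence, Tuple


def extract(text: str,
            cmd: Sequence[str] = ("./snoextract-json",),
            data_dir: Optional[str] = None,
            timeout: float = 10.0) -> dict:
    """Send one note to the CLI on stdin and return the parsed JSON reply."""
    env = dict(os.environ)
    if data_dir is not None:
        # Same effect as `export SNOEXTRACT_DATA_DIR=...` before the call.
        env["SNOEXTRACT_DATA_DIR"] = data_dir
    proc = subprocess.run(
        list(cmd),
        input=json.dumps({"text": text}),  # json.dumps handles quoting/escaping
        capture_output=True,
        encoding="utf-8",   # assumption: the CLI reads and writes UTF-8
        env=env,
        timeout=timeout,    # illustrative value, far above the ~100 ms per call
        check=True,         # assumption: failures produce a non-zero exit code
    )
    return json.loads(proc.stdout)


def concept_ids(response: dict) -> List[Tuple[str, str]]:
    """Return (matched text, cui) pairs from the `entities` array."""
    return [(e["text"], e["cui"]) for e in response.get("entities", [])]
```

Building the request with `json.dumps` avoids the shell-quoting differences between Linux and Windows seen in the table above. Each call still pays the process start-up cost, so this pattern suits the same low-volume use as the CLI itself.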
### 2. Python (in-process, zero overhead)
A pre-built wheel ships under `wheels/`. It requires Python 3.10+.
```bash
python3 -m pip install wheels/snoextract-0.34.7-cp310-abi3-*.whl
export SNOEXTRACT_DATA_DIR=/path/to/dist/data
```
```python
from snoextract import Pipeline

pipeline = Pipeline.load()  # reads SNOEXTRACT_DATA_DIR
result = pipeline.process("Patient has chest pain and diabetes mellitus.")
for e in result.entities:
    # matched text, SNOMED concept ID, concept name, semantic type
    print(e.text, e.cui, e.name, e.semantic_type)
```
One wheel works across CPython 3.10/3.11/3.12/3.13 (abi3).
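For more than one note, a natural pattern is to load the pipeline once and reuse it. The sketch below flattens entities from several notes into CSV rows. It relies only on the calls and attributes shown above (`pipeline.process`, `result.entities`, and `text`, `cui`, `name`, `semantic_type` on each entity); the helper names `iter_rows` and `write_csv` and the `note_id` column are hypothetical. The pipeline is passed in as an argument, so the helpers do not import `snoextract` themselves.

```python
import csv
from typing import Iterable, Iterator, TextIO, Tuple

FIELDS = ["note_id", "text", "cui", "name", "semantic_type"]


def iter_rows(pipeline, notes: Iterable[Tuple[str, str]]) -> Iterator[dict]:
    """Yield one flat row per extracted entity for (note_id, text) pairs."""
    for note_id, text in notes:
        result = pipeline.process(text)
        for e in result.entities:
            yield {
                "note_id": note_id,
                "text": e.text,
                "cui": e.cui,
                "name": e.name,
                "semantic_type": e.semantic_type,
            }


def write_csv(pipeline, notes: Iterable[Tuple[str, str]], out: TextIO) -> int:
    """Write entity rows as CSV to `out`; return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in iter_rows(pipeline, notes):
        writer.writerow(row)
        count += 1
    return count
```

With the wheel installed, something like `write_csv(Pipeline.load(), [("n1", "Patient has chest pain.")], sys.stdout)` would stream rows to the terminal. Whether a single `Pipeline` can be shared across threads is not stated here, so this sketch stays single-threaded.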
## More
- **[docs/benchmarks.md](docs/benchmarks.md)** — perf and accuracy numbers
- **[docs/rest.md](docs/rest.md)** — REST interface: HTTP+JSON extract endpoint
- **[docs/grpc.md](docs/grpc.md)** — gRPC interface (`.proto` contract, client codegen for Python / Go / Node / C#)
- **[docs/python.md](docs/python.md)** — Python API: entity attributes, context flags (negation / uncertainty / historicity)
## Reporting issues or terminology gaps
Bugs and feature requests → **[Issues](https://github.com/pisong314/snoextract/issues)**.
Please include the dist version, your OS, and a minimal repro. The issue templates prompt for these.
**Terminology curation is ongoing.** Coverage prioritises Australian clinical contexts (GP notes, discharge summaries, common diagnoses and medications). If a concept you'd expect isn't matched — or is matched to the wrong CUI — open an Issue with the input snippet and the expected CUI. Your reports help improve coverage over time.
Maintained by **Dr Pi Songsiritat MBBS FRACGP** — [piyawoot.song@gmail.com](mailto:piyawoot.song@gmail.com). Questions and feedback welcome.