https://github.com/coreason-ai/coreason-prism
coreason-prism
coreason-prism
- Host: GitHub
- URL: https://github.com/coreason-ai/coreason-prism
- Owner: CoReason-AI
- License: other
- Created: 2026-01-11T05:23:27.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2026-01-25T21:20:16.000Z (4 months ago)
- Last Synced: 2026-01-26T12:47:42.745Z (4 months ago)
- Language: Python
- Size: 489 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Notice: NOTICE
- Agents: AGENTS.md
README
# coreason-prism
**The Scientific Eye / Multi-Modal Encoder**
[CoReason-AI](https://github.com/CoReason-AI)
[License: Prosperity 3.0.0](https://prosperitylicense.com/versions/3.0.0)
[CI](https://github.com/CoReason-AI/coreason_prism/actions)
[Ruff](https://github.com/astral-sh/ruff)
[Product Requirements](docs/product_requirements.md)
**coreason-prism** is the specialized processing engine for scientific data types (Chemistry and Vision) within the CoReason AI ecosystem. It acts as the "Scientific Eye" and "Multi-Modal Encoder", transforming fragile string representations and static images into robust, mathematical graphs and vectors.
**Core Philosophy:**
> "A Molecule is a Graph, not a String. A Chart is Data, not an Image."
---
## Features
Derived from the [Product Requirements Document](docs/product_requirements.md):
* **Cheminformatic Grounding (The Chemist):**
    * Treats molecules as mathematical graphs, not strings.
    * Normalizes and sanitizes chemical structures (SMILES/InChI) using `datamol`.
    * Converts SMILES to **SELFIES**, so every generated string decodes to a valid molecule.
    * Computes fingerprints (Morgan/ECFP) for structural similarity search.
    * Calculates key properties: Molecular Weight, LogP, TPSA, Lipinski Violations.
* **Visual De-Plotting (The Analyst):**
    * Extracts raw data from scientific figures (e.g., Kaplan-Meier curves) using **DePlot**.
    * Digitizes charts into linearized tables/DataFrames.
    * Enables meta-analysis of data locked in PDF images.
* **Bio-Image Segmentation (The Biologist):**
    * Segments and classifies medical images (e.g., histology) using **MedSAM**.
    * Detects regions of interest (ROIs) and computes metrics such as cell counts and tumor area.
* **Multi-Modal Embedding (The Embedder):**
    * Generates joint embeddings for text, molecules, and images using **BioCLIP**.
    * Enables multi-modal retrieval (e.g., searching for histology slides via text description).
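
The structural similarity search listed under The Chemist is commonly built on the Tanimoto coefficient between fingerprint bit vectors. The sketch below is a minimal, pure-Python illustration of that idea and is not taken from coreason-prism itself: the function names `tanimoto` and `rank_by_similarity` and the toy 8-bit vectors are hypothetical (real Morgan/ECFP fingerprints are typically far longer).

```python
from typing import Dict, List, Sequence, Tuple


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto coefficient of two equal-length 0/1 bit vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    # Two all-zero fingerprints share no features; treat them as dissimilar.
    return both / either if either else 0.0


def rank_by_similarity(
    query: Sequence[int], library: Dict[str, Sequence[int]]
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 8-bit fingerprints (hypothetical values, for illustration only).
query = [1, 1, 0, 1, 0, 0, 1, 0]
library = {
    "mol_a": [1, 1, 0, 1, 0, 0, 0, 0],
    "mol_b": [0, 0, 1, 0, 1, 1, 0, 1],
    "mol_c": [1, 1, 0, 1, 0, 0, 1, 0],
}
ranking = rank_by_similarity(query, library)
print(ranking)  # [('mol_c', 1.0), ('mol_a', 0.75), ('mol_b', 0.0)]
```

In principle, a vector like the `fingerprint_vector` shown in the Usage section could be passed to a function of this kind, provided it is a flat sequence of 0/1 values; that shape is an assumption here.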
---
## Installation
```bash
pip install coreason-prism
```
Or install from source:
```bash
git clone https://github.com/CoReason-AI/coreason-prism.git
cd coreason-prism
pip install .
```
## Usage
Here is a concise snippet showing how to initialize and use the library:
```python
from pathlib import Path

from coreason_prism.interface import Prism, PrismMode

# Initialize Prism (The Facade)
prism = Prism(light_mode=False)  # Set light_mode=True to skip heavy models (DePlot/BioCLIP)

# 1. Process a Molecule (SMILES -> Graph/SELFIES + Properties)
molecule_result = prism.process_molecule("CC(=O)Oc1ccccc1C(=O)O")  # Aspirin
if molecule_result.status == "VALID":
    print(f"Canonical SMILES: {molecule_result.canonical_smiles}")
    print(f"SELFIES: {molecule_result.selfies_string}")
    print(f"LogP: {molecule_result.logp}")
    print(f"Fingerprint (first 10 bits): {molecule_result.fingerprint_vector[:10]}")

# 2. Process a Chart Image (Extract Data)
# Ensure you have an image file at the specified path
chart_path = Path("tests/data/kaplan_meier.png")
if chart_path.exists():
    chart_result = prism.process_image(
        image_path=chart_path,
        source_document_id="doc_123",
        mode=PrismMode.CHART,
    )
    print(f"Figure Type: {chart_result.figure_type}")
    print(f"Extracted Data: {chart_result.data_series}")
    if chart_result.metadata:
        print(f"Median Survival: {chart_result.metadata.get('median_survival')}")

# 3. Process a Bio-Image (Segmentation)
bio_path = Path("tests/data/histology_slide.jpg")
if bio_path.exists():
    bio_result = prism.process_image(
        image_path=bio_path,
        source_document_id="doc_456",
        mode=PrismMode.BIO,
    )
    # Guard against missing metadata, as in the chart example above
    if bio_result.metadata:
        print(f"Cell Count: {bio_result.metadata.get('cell_count')}")
```
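
For a Kaplan-Meier curve, the `median_survival` value read from the chart metadata above conventionally means the earliest time at which the estimated survival probability falls to 50% or below. The sketch below shows that calculation on its own. It assumes a series shaped as a list of `(time, survival_probability)` pairs; the actual structure of `chart_result.data_series` is not specified here, so the `median_survival` helper and the sample points are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def median_survival(points: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Earliest time at which the survival probability is <= 0.5.

    Returns None when the curve never reaches 0.5 (median not reached).
    """
    for time, survival in sorted(points):  # sort by time, ascending
        if survival <= 0.5:
            return time
    return None


# Hypothetical digitized curve: (months, fraction surviving).
curve = [(0, 1.0), (6, 0.85), (12, 0.62), (18, 0.48), (24, 0.30)]
print(median_survival(curve))  # 18

# A curve that stays above 50% has no defined median.
print(median_survival([(0, 1.0), (12, 0.9), (24, 0.7)]))  # None
```

Returning `None` rather than the last observed time keeps "median not reached" distinguishable from a real estimate, which matters when pooling curves for meta-analysis.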