
# coreason-prism

**The Scientific Eye / Multi-Modal Encoder**

[![Organization](https://img.shields.io/badge/org-CoReason--AI-blue)](https://github.com/CoReason-AI)
[![License](https://img.shields.io/badge/license-Prosperity%203.0-blue)](https://prosperitylicense.com/versions/3.0.0)
[![CI](https://github.com/CoReason-AI/coreason_prism/actions/workflows/ci.yml/badge.svg)](https://github.com/CoReason-AI/coreason_prism/actions)
[![Code Quality](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)
[![Docs](https://img.shields.io/badge/docs-product%20requirements-green)](docs/product_requirements.md)

**coreason-prism** is the specialized processing engine for scientific data types (Chemistry and Vision) within the CoReason AI ecosystem. It acts as the "Scientific Eye" and "Multi-Modal Encoder", transforming fragile string representations and static images into robust, mathematical graphs and vectors.

**Core Philosophy:**
> "A Molecule is a Graph, not a String. A Chart is Data, not an Image."

---

## Features

Derived from the [Product Requirements Document](docs/product_requirements.md):

* **Cheminformatic Grounding (The Chemist):**
    * Treats molecules as mathematical graphs, not strings.
    * Normalizes and sanitizes chemical structures (SMILES/InChI) using `datamol`.
    * Transmutes SMILES to **SELFIES** for 100% valid generative output.
    * Computes fingerprints (Morgan/ECFP) for structural similarity search.
    * Calculates key properties: Molecular Weight, LogP, TPSA, Lipinski Violations.

* **Visual De-Plotting (The Analyst):**
    * Extracts raw data from scientific figures (e.g., Kaplan-Meier curves) using **DePlot**.
    * Digitizes charts into linear tables/DataFrames.
    * Enables meta-analysis of data locked in PDF images.

* **Bio-Image Segmentation (The Biologist):**
    * Segments and classifies medical images (e.g., Histology) using **MedSAM**.
    * Detects ROIs and computes metrics like cell counts and tumor area.

* **Multi-Modal Embedding (The Embedder):**
    * Generates joint embeddings for text, molecules, and images using **BioCLIP**.
    * Enables multi-modal retrieval (e.g., searching for histology slides via text description).
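
Two of the chemistry features above reduce to small, well-defined computations: counting Lipinski Rule-of-Five violations and comparing fingerprints for similarity search. The following is a minimal pure-Python sketch of both ideas, not the library's implementation. The `Descriptors` class and both function names are hypothetical, Tanimoto is assumed as the similarity metric (the README does not name one), and the aspirin values are approximate literature figures used only as sample input.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Hypothetical container for illustration; not a coreason-prism type."""

    molecular_weight: float  # g/mol
    logp: float
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count Rule-of-Five violations: MW > 500, LogP > 5, HBD > 5, HBA > 10."""
    return sum(
        [
            d.molecular_weight > 500,
            d.logp > 5,
            d.h_bond_donors > 5,
            d.h_bond_acceptors > 10,
        ]
    )


def tanimoto(a: list[int], b: list[int]) -> float:
    """Tanimoto (Jaccard) similarity of two equal-length bit vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention chosen for this sketch: two all-zero vectors score 0.0.
    return both / either if either else 0.0


# Approximate values for aspirin, used only as illustrative input.
aspirin = Descriptors(molecular_weight=180.16, logp=1.2, h_bond_donors=1, h_bond_acceptors=4)
print(lipinski_violations(aspirin))  # 0

# Toy 8-bit fingerprints; real Morgan/ECFP vectors are usually far longer.
print(tanimoto([1, 1, 0, 0, 1, 0, 0, 0], [1, 0, 0, 0, 1, 1, 0, 0]))  # 0.5
```

In the toy example the two fingerprints share two set bits out of four distinct set bits, which gives a similarity of 0.5.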

---

## Installation

```bash
pip install coreason-prism
```

Or install from source:

```bash
git clone https://github.com/CoReason-AI/coreason_prism.git
cd coreason_prism
pip install .
```

## Usage

Here is a concise snippet showing how to initialize and use the library:

```python
from pathlib import Path
from coreason_prism.interface import Prism, PrismMode

# Initialize Prism (The Facade)
prism = Prism(light_mode=False)  # Set light_mode=True to skip heavy models (DePlot/BioCLIP)

# 1. Process a Molecule (SMILES -> Graph/SELFIES + Properties)
molecule_result = prism.process_molecule("CC(=O)Oc1ccccc1C(=O)O")  # Aspirin
if molecule_result.status == "VALID":
    print(f"Canonical SMILES: {molecule_result.canonical_smiles}")
    print(f"SELFIES: {molecule_result.selfies_string}")
    print(f"LogP: {molecule_result.logp}")
    print(f"Fingerprint (first 10 bits): {molecule_result.fingerprint_vector[:10]}")

# 2. Process a Chart Image (Extract Data)
# Ensure you have an image file at the specified path
chart_path = Path("tests/data/kaplan_meier.png")
if chart_path.exists():
    chart_result = prism.process_image(
        image_path=chart_path,
        source_document_id="doc_123",
        mode=PrismMode.CHART,
    )
    print(f"Figure Type: {chart_result.figure_type}")
    print(f"Extracted Data: {chart_result.data_series}")
    if chart_result.metadata:
        print(f"Median Survival: {chart_result.metadata.get('median_survival')}")

# 3. Process a Bio-Image (Segmentation)
bio_path = Path("tests/data/histology_slide.jpg")
if bio_path.exists():
    bio_result = prism.process_image(
        image_path=bio_path,
        source_document_id="doc_456",
        mode=PrismMode.BIO,
    )
    print(f"Cell Count: {bio_result.metadata.get('cell_count')}")
```
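
The chart example reads a `median_survival` value from the result metadata. This README does not show how that value is computed or what shape `data_series` takes, so the sketch below only illustrates the standard definition: the earliest time at which a Kaplan-Meier curve falls to 0.5 or below. The function name and the `(time, survival_probability)` point format are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Optional, Sequence, Tuple


def median_survival(points: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Return the earliest time at which survival probability is <= 0.5.

    `points` are (time, survival_probability) pairs from a digitized step curve
    (an assumed format). Returns None when the curve never reaches 0.5.
    """
    for time, survival in sorted(points):  # sort by time first
        if survival <= 0.5:
            return time
    return None


# Hypothetical digitized curve: time in months, survival as a fraction.
curve = [(0, 1.0), (6, 0.8), (12, 0.62), (18, 0.48), (24, 0.35)]
print(median_survival(curve))  # 18
```

Returning `None` rather than a sentinel number keeps "median not reached" distinguishable from a real time value, which matters when pooling curves for meta-analysis.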