https://github.com/vincentkoc/datahub-langchain

Seamless LLM Lineage for DataHub with LangChain and LangSmith

Host: GitHub
URL: https://github.com/vincentkoc/datahub-langchain
Owner: vincentkoc
License: gpl-3.0
Created: 2024-11-23T04:24:16.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-11-25T13:42:04.000Z (over 1 year ago)
Last Synced: 2025-10-14T16:52:51.357Z (9 months ago)
Topics: datahub, langchain, langsmith, lineage, observability
Language: Python
Homepage:
Size: 672 KB
Stars: 3
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


> [!CAUTION]
> This is an experimental project and not ready for production use. Use at your own risk.

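# DataHub LLM Lineage 🔗

Seamless LLM Lineage for DataHub with LangChain and LangSmith

[Features](#features) • [Installation](#installation) • [Quick Start](#quick-start) • [Usage](#usage-examples) • [Architecture](#architecture) • [Contributing](#contributing) • [License](#license)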

An observability integration that brings LangChain and LangSmith workflows into DataHub's metadata platform. It records models, prompts, chains, and their lineage, giving you visibility into how your LLM operations run and connect.

## Features

- 🔄 **Real-Time Observation**: Live monitoring of LangChain operations
- 📊 **Rich Metadata**: Detailed tracking of models, prompts, and chains
- 🔍 **Deep Insights**: Comprehensive metrics and lineage tracking
- 🚀 **Multiple Platforms**: Support for LangChain, LangSmith, and more
- 🛠 **Extensible**: Easy to add new platforms and emitters
- 🧪 **Debug Mode**: Built-in debugging and dry run capabilities

## Installation

```bash
# Clone the repository
git clone https://github.com/vincentkoc/datahub-langchain
cd datahub-langchain

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Copy the example environment file, then fill in your values
cp .env.example .env
```

## Quick Start

1. **Configure Environment**

```bash
# Required environment variables (set these in .env)
LANGSMITH_API_KEY=ls-...
LANGCHAIN_TRACING_V2=true
LANGCHAIN_PROJECT=default
OPENAI_API_KEY=sk-...
DATAHUB_GMS_URL=http://localhost:8080
DATAHUB_TOKEN=your_token_here
```

2. **Run Basic Example**

```python
from langchain_openai import ChatOpenAI

from src.platforms.langchain import LangChainObserver
from src.emitters.datahub import DataHubEmitter
from src.config import ObservabilityConfig

# Set up observation: config, DataHub emitter, and the LangChain observer
config = ObservabilityConfig(langchain_verbose=True)
emitter = DataHubEmitter(gms_server="http://localhost:8080")
observer = LangChainObserver(config=config, emitter=emitter)

# Attach the observer to the LLM as a callback
llm = ChatOpenAI(callbacks=[observer])

# Each call is now observed automatically
response = llm.invoke("Tell me a joke")
```
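
If the example fails at startup, a missing environment variable is a common cause. The following is a minimal sketch (not part of the project) that reports which of the variables from step 1 are unset or empty. It reads `os.environ`, so values in `.env` must already be exported or loaded, for example with python-dotenv's `load_dotenv()`. The names `REQUIRED_VARS` and `missing_env_vars` are illustrative.

```python
import os

# Variables listed in the Quick Start; adjust if your setup differs.
REQUIRED_VARS = (
    "LANGSMITH_API_KEY",
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_PROJECT",
    "OPENAI_API_KEY",
    "DATAHUB_GMS_URL",
    "DATAHUB_TOKEN",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` before creating the emitter lets you fail fast with a clear message instead of a connection or authentication error later.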

## Architecture

The integration consists of three main components:

1. **Observers** (`src/platforms/`)
   - Real-time monitoring of LLM operations
   - Metric collection and event tracking
   - Platform-specific adapters
2. **Emitters** (`src/emitters/`)
   - DataHub metadata emission
   - Console output for debugging
   - JSON file export
3. **Collectors** (`src/collectors/`)
   - Historical data collection
   - Batch processing
   - Aggregated metrics
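
The split between observers and emitters can be illustrated with a small, framework-free sketch. Every class here (`RunEvent`, `Emitter`, `ConsoleEmitter`, `MemoryEmitter`, `Observer`) is hypothetical and is not the project's API. In the real integration, `LangChainObserver` is passed as a LangChain callback, and LangChain's callback handlers define hooks such as `on_llm_start` and `on_llm_end`; this sketch uses plain `on_start`/`on_end` methods instead.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional, Protocol, Tuple


@dataclass
class RunEvent:
    """Hypothetical record of one completed LLM call."""
    run_id: str
    model: str
    prompt: str
    output: Optional[str] = None
    latency_s: Optional[float] = None


class Emitter(Protocol):
    """Hypothetical emitter interface: sends one event to a destination."""
    def emit(self, event: RunEvent) -> None: ...


class ConsoleEmitter:
    """Debug emitter that prints each event as a JSON line."""
    def emit(self, event: RunEvent) -> None:
        print(json.dumps(asdict(event)))


class MemoryEmitter:
    """Keeps events in memory, useful for dry runs and tests."""
    def __init__(self) -> None:
        self.events: List[RunEvent] = []

    def emit(self, event: RunEvent) -> None:
        self.events.append(event)


class Observer:
    """Tracks in-flight runs and forwards each completed run to an emitter."""
    def __init__(self, emitter: Emitter) -> None:
        self.emitter = emitter
        self._in_flight: Dict[str, Tuple[RunEvent, float]] = {}

    def on_start(self, run_id: str, model: str, prompt: str) -> None:
        # Record the start time on a monotonic clock for latency measurement
        self._in_flight[run_id] = (RunEvent(run_id, model, prompt), time.monotonic())

    def on_end(self, run_id: str, output: str) -> None:
        event, started = self._in_flight.pop(run_id)
        event.output = output
        event.latency_s = time.monotonic() - started
        self.emitter.emit(event)
```

Swapping `MemoryEmitter` for `ConsoleEmitter` (or a DataHub emitter) changes where events go without touching the observer, which is the separation the components above describe.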

## Usage Examples

### Basic LangChain Integration

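```python
# examples/langchain_basic.py
from langchain_openai import ChatOpenAI

from src.config import ObservabilityConfig
from src.emitters.datahub import DataHubEmitter
from src.platforms.langchain import LangChainObserver

config = ObservabilityConfig(langchain_verbose=True)
emitter = DataHubEmitter(gms_server="http://localhost:8080")
observer = LangChainObserver(config=config, emitter=emitter)

# Attach the observer to the LLM as a callback
llm = ChatOpenAI(callbacks=[observer])
```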

### RAG Pipeline Integration

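```python
# examples/langchain_rag.py
from langchain.chains import RetrievalQA

from src.utils.metrics import MetricsAggregator

# Assumes `llm` and `observer` are set up as in the basic example,
# and `vectorstore` is an existing LangChain vector store.
chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    callbacks=[observer],
)

# Run the chain; the observer receives callback events for this run
response = chain.invoke({"query": "What does this project do?"})
```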

### Historical Data Ingestion

```python
# examples/langsmith_ingest.py
from src.cli.ingest import ingest_logic

# Ingest LangSmith history from the last 7 days
ingest_logic(
    days=7,
    platform='langsmith',
    debug=True,
    save_debug_data=True,
)
```
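
The `days` argument defines a lookback window for historical runs. The sketch below shows that idea in isolation. It is not `ingest_logic`'s implementation, and it assumes (hypothetically) that runs arrive as dicts with a timezone-aware `start_time` datetime; `runs_in_window` is an illustrative name.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


def runs_in_window(runs: Iterable[dict], days: int,
                   now: Optional[datetime] = None) -> List[dict]:
    """Keep runs whose 'start_time' falls within the last `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    # Inclusive on both ends: runs exactly at the cutoff are kept
    return [run for run in runs if cutoff <= run["start_time"] <= now]
```

Passing `now` explicitly makes the window reproducible, which helps when comparing a dry run against a later real ingestion.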

## Customization

The integration is highly customizable through:

- **Configuration** (`src/config.py`): Environment and platform settings
- **Custom Emitters**: Implement `LLMMetadataEmitter` for new destinations
- **Platform Extensions**: Add new platforms by implementing `LLMPlatformConnector`
- **Metrics Collection**: Extend `MetricsAggregator` for custom metrics
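
The interface of the project's `MetricsAggregator` is not described here, so the following is a standalone sketch of the kind of custom metric an extension might compute: per-model call counts, token totals, and mean latency. `ModelStats` and `TokenLatencyAggregator` are hypothetical names, and the sketch does not subclass the project's class.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class ModelStats:
    """Running totals for one model."""
    calls: int = 0
    total_tokens: int = 0
    latencies: List[float] = field(default_factory=list)

    @property
    def mean_latency(self) -> float:
        return mean(self.latencies) if self.latencies else 0.0


class TokenLatencyAggregator:
    """Hypothetical aggregator keyed by model name."""

    def __init__(self) -> None:
        self.stats: Dict[str, ModelStats] = defaultdict(ModelStats)

    def add(self, model: str, total_tokens: int, latency_s: float) -> None:
        s = self.stats[model]
        s.calls += 1
        s.total_tokens += total_tokens
        s.latencies.append(latency_s)

    def summary(self) -> Dict[str, dict]:
        # One plain dict per model, ready to serialize or emit
        return {
            model: {
                "calls": s.calls,
                "total_tokens": s.total_tokens,
                "mean_latency_s": s.mean_latency,
            }
            for model, s in self.stats.items()
        }
```

Keeping the raw latencies (rather than only a running mean) makes it easy to add percentiles later, at the cost of memory proportional to the number of calls.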

## Contributing

1. Fork the repository
2. Create a feature branch
3. Run tests and linting:
   ```bash
   make test
   make lint
   ```
4. Submit a pull request

## License

This project is licensed under the GNU General Public License v3.0 - see the [LICENSE](LICENSE) file for details.

---



  Made with ❤️ by Vincent Koc
