https://github.com/vincentkoc/datahub-langchain


> [!CAUTION]
> This is an experimental project and not ready for production use. Use at your own risk.

# Datahub LLM Lineage 🔗




Seamless LLM Lineage for DataHub with LangChain and LangSmith


Features • Installation • Quick Start • Usage • Architecture • Contributing • License



A comprehensive observability solution that integrates LangChain and LangSmith workflows into DataHub's metadata platform, providing deep visibility into your LLM operations.

## Features

- 🔄 **Real-Time Observation**: Live monitoring of LangChain operations
- 📊 **Rich Metadata**: Detailed tracking of models, prompts, and chains
- 🔍 **Deep Insights**: Comprehensive metrics and lineage tracking
- 🚀 **Multiple Platforms**: Support for LangChain, LangSmith, and more
- 🛠 **Extensible**: Easy to add new platforms and emitters
- 🧪 **Debug Mode**: Built-in debugging and dry run capabilities

## Installation

```bash
# Clone the repository
git clone https://github.com/vincentkoc/datahub-langchain
cd datahub-langchain

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Copy and configure environment
cp .env.example .env
```

## Quick Start

1. **Configure Environment**

```bash
# Required environment variables
LANGSMITH_API_KEY=ls-...
LANGCHAIN_TRACING_V2=true
LANGCHAIN_PROJECT=default

OPENAI_API_KEY=sk-...

DATAHUB_GMS_URL=http://localhost:8080
DATAHUB_TOKEN=your_token_here
```

2. **Run Basic Example**

```python
from langchain_openai import ChatOpenAI
from src.platforms.langchain import LangChainObserver
from src.emitters.datahub import DataHubEmitter
from src.config import ObservabilityConfig

# Setup observation
config = ObservabilityConfig(langchain_verbose=True)
emitter = DataHubEmitter(gms_server="http://localhost:8080")
observer = LangChainObserver(config=config, emitter=emitter)

# Initialize LLM with observer
llm = ChatOpenAI(callbacks=[observer])

# Run with automatic observation
response = llm.invoke("Tell me a joke")
```
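Before running the example, it can help to confirm that the variables from step 1 are actually set. The following is a minimal sketch using only the standard library; the helper name `missing_env_vars` is hypothetical and not part of this project, and it assumes the `.env` values have already been loaded into the process environment.

```python
import os

# Variable names copied from the "Configure Environment" step.
REQUIRED_VARS = (
    "LANGSMITH_API_KEY",
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_PROJECT",
    "OPENAI_API_KEY",
    "DATAHUB_GMS_URL",
    "DATAHUB_TOKEN",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; pass a dict to check something
    other than the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments checks `os.environ`; an empty list means every variable listed in step 1 has a non-empty value.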

## Architecture

The integration consists of three main components:

1. **Observers** (`src/platforms/`)
   - Real-time monitoring of LLM operations
   - Metric collection and event tracking
   - Platform-specific adapters

2. **Emitters** (`src/emitters/`)
   - DataHub metadata emission
   - Console debugging output
   - JSON file export

3. **Collectors** (`src/collectors/`)
   - Historical data collection
   - Batch processing
   - Aggregated metrics
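The division of labour between the three components can be illustrated with a small, self-contained sketch. Every name here (`RunRecord`, `ListEmitter`, `SimpleObserver`, `average_latency`) is a hypothetical stand-in and not this project's code: the observer times a call, the emitter receives the resulting record, and a collector-style function aggregates over stored records.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class RunRecord:
    """Hypothetical metadata record for one LLM call."""
    model: str
    prompt: str
    latency_s: float


class ListEmitter:
    """Stand-in emitter: keeps records in memory instead of sending them to DataHub."""

    def __init__(self) -> None:
        self.records: List[RunRecord] = []

    def emit(self, record: RunRecord) -> None:
        self.records.append(record)


class SimpleObserver:
    """Stand-in observer: times a call and hands the record to its emitter."""

    def __init__(self, emitter: ListEmitter) -> None:
        self.emitter = emitter

    def observe(self, model: str, prompt: str, call: Callable[[str], Any]) -> Any:
        start = time.perf_counter()
        result = call(prompt)
        latency = time.perf_counter() - start
        self.emitter.emit(RunRecord(model=model, prompt=prompt, latency_s=latency))
        return result


def average_latency(records: List[RunRecord]) -> float:
    """Collector-style aggregate over already-emitted records."""
    return sum(r.latency_s for r in records) / len(records) if records else 0.0


# Usage with a fake "model" that just upper-cases the prompt.
emitter = ListEmitter()
observer = SimpleObserver(emitter)
reply = observer.observe("fake-model", "Tell me a joke", lambda p: p.upper())
```

Keeping the emitter behind a single `emit` method is what makes it possible to swap a DataHub destination for console or JSON output without touching the observer.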

## Usage Examples

### Basic LangChain Integration

```python
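# examples/langchain_basic.py
from langchain_openai import ChatOpenAI
from src.config import ObservabilityConfig
from src.emitters.datahub import DataHubEmitter
from src.platforms.langchain import LangChainObserver

# Same config and emitter as in the Quick Start
config = ObservabilityConfig(langchain_verbose=True)
emitter = DataHubEmitter(gms_server="http://localhost:8080")
observer = LangChainObserver(config=config, emitter=emitter)
llm = ChatOpenAI(callbacks=[observer])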
```

### RAG Pipeline Integration

```python
# examples/langchain_rag.py
from langchain.chains import RetrievalQA
from src.utils.metrics import MetricsAggregator

# Assumes `llm` and `observer` from the basic example and an existing `vectorstore`
chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    callbacks=[observer],
)
```

### Historical Data Ingestion

```python
# examples/langsmith_ingest.py
from src.cli.ingest import ingest_logic

ingest_logic(
    days=7,
    platform='langsmith',
    debug=True,
    save_debug_data=True,
)
```
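How `ingest_logic` interprets `days` is not shown here. One plausible reading is a look-back window ending at the current time, which can be sketched as follows. The helper `ingestion_window` is hypothetical and only illustrates the arithmetic; it is not part of this project.

```python
from datetime import datetime, timedelta, timezone


def ingestion_window(days, now=None):
    """Return (start, end) covering the last `days` days, in UTC.

    Hypothetical helper: `now` can be injected to make the window reproducible.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    end = now if now is not None else datetime.now(timezone.utc)
    return end - timedelta(days=days), end


# Example with a fixed end time so the window is deterministic.
fixed_now = datetime(2024, 1, 8, tzinfo=timezone.utc)
start, end = ingestion_window(7, now=fixed_now)
```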

## Customization

The integration is highly customizable through:

- **Configuration** (`src/config.py`): Environment and platform settings
- **Custom Emitters**: Implement `LLMMetadataEmitter` for new destinations
- **Platform Extensions**: Add new platforms by implementing `LLMPlatformConnector`
- **Metrics Collection**: Extend `MetricsAggregator` for custom metrics
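As an illustration of the custom-emitter extension point, the sketch below writes each record as one JSON line to a text stream. The `LLMMetadataEmitter` class defined here is a local stand-in with an assumed single `emit(record)` method; the project's actual base class may have a different interface, so check `src/emitters/` before adapting this. `JsonLinesEmitter` is likewise a hypothetical name.

```python
import io
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, TextIO


class LLMMetadataEmitter(ABC):
    """Local stand-in for the project's base class; the real interface may differ."""

    @abstractmethod
    def emit(self, record: Dict[str, Any]) -> None:
        ...


class JsonLinesEmitter(LLMMetadataEmitter):
    """Hypothetical emitter that writes each record as one JSON line."""

    def __init__(self, stream: TextIO) -> None:
        self.stream = stream

    def emit(self, record: Dict[str, Any]) -> None:
        # sort_keys keeps the output stable across runs
        self.stream.write(json.dumps(record, sort_keys=True) + "\n")


# Usage with an in-memory stream; an open file handle works the same way.
buffer = io.StringIO()
json_emitter = JsonLinesEmitter(buffer)
json_emitter.emit({"model": "fake-model", "latency_s": 0.12})
```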

## Contributing

1. Fork the repository
2. Create a feature branch
3. Run tests and linting:
   ```bash
   make test
   make lint
   ```
4. Submit a pull request

## License

This project is licensed under the GNU General Public License v3.0 - see the [LICENSE](LICENSE) file for details.

---


Made with ❤️ by Vincent Koc