{"id":22476885,"url":"https://github.com/vincentkoc/datahub-langchain","last_synced_at":"2026-01-29T10:02:11.748Z","repository":{"id":264300657,"uuid":"892927786","full_name":"vincentkoc/datahub-langchain","owner":"vincentkoc","description":"Seamless LLM Lineage for DataHub with LangChain and LangSmith","archived":false,"fork":false,"pushed_at":"2024-11-25T13:42:04.000Z","size":688,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-14T16:52:51.357Z","etag":null,"topics":["datahub","langchain","langsmith","lineage","observability"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vincentkoc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-11-23T04:24:16.000Z","updated_at":"2025-08-24T14:37:37.000Z","dependencies_parsed_at":null,"dependency_job_id":"7b985a5e-2372-4053-82a9-ca5a5ba82b57","html_url":"https://github.com/vincentkoc/datahub-langchain","commit_stats":null,"previous_names":["vincentkoc/datahub-langchain"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/vincentkoc/datahub-langchain","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vincentkoc%2Fdatahub-langchain","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vincentkoc%2Fdatahub-langchain/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vincentkoc%2Fdatahub-langchain/releases","manifests_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/vincentkoc%2Fdatahub-langchain/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vincentkoc","download_url":"https://codeload.github.com/vincentkoc/datahub-langchain/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vincentkoc%2Fdatahub-langchain/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28875445,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-29T09:47:23.353Z","status":"ssl_error","status_checked_at":"2026-01-29T09:47:19.357Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["datahub","langchain","langsmith","lineage","observability"],"created_at":"2024-12-06T14:08:28.676Z","updated_at":"2026-01-29T10:02:11.740Z","avatar_url":"https://github.com/vincentkoc.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003e [!CAUTION]\n\u003e This is an experimental project and not ready for production use. 
Use at your own risk.\n\n# Datahub LLM Lineage 🔗\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"screenshot.png\" alt=\"Screenshot\" width=\"700\"/\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eSeamless LLM Lineage for DataHub with LangChain and LangSmith\u003c/strong\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#features\"\u003eFeatures\u003c/a\u003e •\n  \u003ca href=\"#installation\"\u003eInstallation\u003c/a\u003e •\n  \u003ca href=\"#quick-start\"\u003eQuick Start\u003c/a\u003e •\n  \u003ca href=\"#usage-examples\"\u003eUsage\u003c/a\u003e •\n  \u003ca href=\"#architecture\"\u003eArchitecture\u003c/a\u003e •\n  \u003ca href=\"#contributing\"\u003eContributing\u003c/a\u003e •\n  \u003ca href=\"#license\"\u003eLicense\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/python-3.8+-blue.svg\" alt=\"Python\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/license-GPL--3.0-green.svg\" alt=\"License\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/LangChain-Integrated-orange.svg\" alt=\"LangChain\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/DataHub-Compatible-purple.svg\" alt=\"DataHub\"\u003e\n  \u003cbr/\u003e\n  \u003cimg src=\"https://img.shields.io/github/stars/vincentkoc/datahub-langchain\" alt=\"Stars\"\u003e\n  \u003cimg src=\"https://img.shields.io/github/forks/vincentkoc/datahub-langchain\" alt=\"Forks\"\u003e\n  \u003cimg src=\"https://img.shields.io/github/issues/vincentkoc/datahub-langchain\" alt=\"Issues\"\u003e\n\u003c/p\u003e\n\nA comprehensive observability solution that integrates LangChain and LangSmith workflows into DataHub's metadata platform, providing deep visibility into your LLM operations.\n\n## Features\n\n- 🔄 **Real-Time Observation**: Live monitoring of LangChain operations\n- 📊 **Rich Metadata**: Detailed tracking of models, prompts, and chains\n- 🔍 **Deep Insights**: Comprehensive metrics and lineage tracking\n- 🚀 
**Multiple Platforms**: Support for LangChain, LangSmith, and more\n- 🛠 **Extensible**: Easy to add new platforms and emitters\n- 🧪 **Debug Mode**: Built-in debugging and dry-run capabilities\n\n## Installation\n\n```bash\n# Clone the repository\ngit clone https://github.com/vincentkoc/datahub-langchain\ncd datahub-langchain\n\n# Create and activate a virtual environment\npython -m venv .venv\nsource .venv/bin/activate  # On Windows: .venv\\Scripts\\activate\n\n# Install dependencies\npip install -r requirements.txt\n\n# Copy and configure environment\ncp .env.example .env\n```\n\n## Quick Start\n\n1. **Configure Environment**\n\n```bash\n# Required environment variables (set these in .env)\nLANGSMITH_API_KEY=ls-...\nLANGCHAIN_TRACING_V2=true\nLANGCHAIN_PROJECT=default\n\nOPENAI_API_KEY=sk-...\n\nDATAHUB_GMS_URL=http://localhost:8080\nDATAHUB_TOKEN=your_token_here\n```\n\n2. **Run Basic Example**\n\n```python\nfrom langchain_openai import ChatOpenAI\nfrom src.platforms.langchain import LangChainObserver\nfrom src.emitters.datahub import DataHubEmitter\nfrom src.config import ObservabilityConfig\n\n# Set up observation\nconfig = ObservabilityConfig(langchain_verbose=True)\nemitter = DataHubEmitter(gms_server=\"http://localhost:8080\")\nobserver = LangChainObserver(config=config, emitter=emitter)\n\n# Initialize LLM with observer\nllm = ChatOpenAI(callbacks=[observer])\n\n# Run with automatic observation\nresponse = llm.invoke(\"Tell me a joke\")\n```\n\n## Architecture\n\nThe integration consists of three main components:\n\n1. **Observers** (`src/platforms/`)\n   - Real-time monitoring of LLM operations\n   - Metric collection and event tracking\n   - Platform-specific adapters\n\n2. **Emitters** (`src/emitters/`)\n   - DataHub metadata emission\n   - Console debugging output\n   - JSON file export\n\n3. 
**Collectors** (`src/collectors/`)\n   - Historical data collection\n   - Batch processing\n   - Aggregated metrics\n\n## Usage Examples\n\n### Basic LangChain Integration\n\n```python\n# examples/langchain_basic.py\nfrom langchain_openai import ChatOpenAI\nfrom src.config import ObservabilityConfig\nfrom src.emitters.datahub import DataHubEmitter\nfrom src.platforms.langchain import LangChainObserver\n\n# Same config and emitter as in Quick Start\nconfig = ObservabilityConfig(langchain_verbose=True)\nemitter = DataHubEmitter(gms_server=\"http://localhost:8080\")\nobserver = LangChainObserver(config=config, emitter=emitter)\nllm = ChatOpenAI(callbacks=[observer])\n```\n\n### RAG Pipeline Integration\n\n```python\n# examples/langchain_rag.py\n# Assumes `llm` and `observer` from the basic example and an existing `vectorstore`\nfrom langchain.chains import RetrievalQA\nfrom src.utils.metrics import MetricsAggregator\n\nchain = RetrievalQA.from_chain_type(\n    llm=llm,\n    retriever=vectorstore.as_retriever(),\n    callbacks=[observer]\n)\n```\n\n### Historical Data Ingestion\n\n```python\n# examples/langsmith_ingest.py\nfrom src.cli.ingest import ingest_logic\n\ningest_logic(\n    days=7,\n    platform='langsmith',\n    debug=True,\n    save_debug_data=True\n)\n```\n\n## Customization\n\nThe integration is highly customizable through:\n\n- **Configuration** (`src/config.py`): Environment and platform settings\n- **Custom Emitters**: Implement `LLMMetadataEmitter` for new destinations\n- **Platform Extensions**: Add new platforms by implementing `LLMPlatformConnector`\n- **Metrics Collection**: Extend `MetricsAggregator` for custom metrics\n\n## Contributing\n\n1. Fork the repository\n2. Create a feature branch\n3. Run tests and linting:\n   ```bash\n   make test\n   make lint\n   ```\n4. 
Submit a pull request\n\n## License\n\nThis project is licensed under the GNU General Public License v3.0 - see the [LICENSE](LICENSE) file for details.\n\n---\n\n\u003cp align=\"center\"\u003e\n  Made with ❤️ by \u003ca href=\"https://github.com/vincentkoc\"\u003eVincent Koc\u003c/a\u003e\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvincentkoc%2Fdatahub-langchain","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvincentkoc%2Fdatahub-langchain","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvincentkoc%2Fdatahub-langchain/lists"}