Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/whylabs/langkit
🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring safety & security. 🛡️ Features include text quality, relevance metrics, & sentiment analysis. 📊 A comprehensive tool for LLM observability. 👀
large-language-models machine-learning nlg nlp observability prompt-engineering prompt-injection
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/whylabs/langkit
- Owner: whylabs
- License: apache-2.0
- Created: 2023-04-26T21:46:58.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-29T18:29:30.000Z (2 months ago)
- Last Synced: 2024-10-29T20:36:17.095Z (2 months ago)
- Topics: large-language-models, machine-learning, nlg, nlp, observability, prompt-engineering, prompt-injection
- Language: Jupyter Notebook
- Homepage: https://whylabs.ai
- Size: 4.37 MB
- Stars: 835
- Watchers: 15
- Forks: 66
- Open Issues: 32
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-ai-security - langkit - _LangKit is an open-source text metrics toolkit for monitoring language models. The toolkit provides various security-related metrics that can be used to detect attacks_ (Defensive tools and frameworks / Detection)
- awesome-ai-cybersecurity - langkit
- awesome-llm-eval - LangKit
- awesome-open-data-centric-ai - langkit - _An open-source toolkit for monitoring Large Language Models (LLMs)._ | ![GitHub stars](https://img.shields.io/github/stars/whylabs/langkit?style=social) | <a href="https://github.com/whylabs/langkit/blob/main/LICENSE"><img src="https://img.shields.io/github/license/whylabs/langkit" height="15"/></a> | (Observability and Monitoring)
- Awesome-LLMSecOps - WhyLabs LangKit - 06-12 | (PINT Benchmark scores (by lakera))
- StarryDivineSky - whylabs/langkit - counts of strings matching user-defined regex pattern groups; jailbreaks - similarity scores against known jailbreak attempts; prompt injection - similarity scores against known prompt injection attacks; hallucinations - consistency checks between responses; refusals - similarity scores against known LLM refusal-of-service responses; sentiment and toxicity (sentiment analysis, toxicity analysis) (A01_Text Generation_Text Dialogue / Large language dialogue models and data)
- awesome-llmops - LangKit - of-the-box LLM telemetry collection library that extracts features and profiles prompts, responses and metadata about how your LLM is performing over time to find problems at scale. | ![GitHub Badge](https://img.shields.io/github/stars/whylabs/langkit.svg?style=flat-square) | (LLMOps / Observability)
- awesome_ai_agents - Langkit - 🔍 LangKit - An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring safe… (Building / Tools)
README
# LangKit
![LangKit graphic](static/img/LangKit-2024-readme.png)
LangKit is an open-source text metrics toolkit for monitoring language models. It offers an array of methods for extracting relevant signals from the input and/or output text, which are compatible with the open-source data logging library [whylogs](https://whylogs.readthedocs.io/en/latest).
> 💡 Want to experience LangKit? Go to this [notebook](https://github.com/whylabs/langkit/blob/main/langkit/examples/Intro_to_Langkit.ipynb)!
## Table of Contents 📖
- [Motivation](#motivation-)
- [Features](#features-)
- [Installation](#installation-)
- [Usage](#usage-)
- [Modules](#modules-)

## Motivation 🎯
Productionizing language models, including LLMs, comes with a range of risks: the effectively unbounded space of input combinations can elicit an equally unbounded space of outputs. The unstructured nature of text poses a challenge in the ML observability space - one worth solving, since a lack of visibility into a model's behavior can have serious consequences.
## Features 🛠️
The out-of-the-box metrics include:
- [Text Quality](https://github.com/whylabs/langkit/blob/main/langkit/docs/features/quality.md)
  - readability score
  - complexity and grade scores
- [Text Relevance](https://github.com/whylabs/langkit/blob/main/langkit/docs/features/relevance.md)
  - similarity scores between prompt/response pairs
  - similarity scores against user-defined themes
- [Security and Privacy](https://github.com/whylabs/langkit/blob/main/langkit/docs/features/security.md)
  - patterns - count of strings matching a user-defined regex pattern group
  - jailbreaks - similarity scores with respect to known jailbreak attempts
  - prompt injection - similarity scores with respect to known prompt injection attacks
  - hallucinations - consistency check between responses
  - refusals - similarity scores with respect to known LLM refusal-of-service responses
- [Sentiment and Toxicity](https://github.com/whylabs/langkit/blob/main/langkit/docs/features/sentiment.md)
  - sentiment analysis
  - toxicity analysis

## Installation 💻
To install LangKit, use the Python Package Index (PyPI) as follows:
```
pip install langkit[all]
```

## Usage 🚀
LangKit modules contain UDFs that automatically wire into the collection of UDFs on string features provided by whylogs by default. All we have to do is import the LangKit modules and then instantiate a custom schema, as shown in the example below.
```python
import whylogs as why
from langkit import llm_metrics

results = why.log({"prompt": "Hello!", "response": "World!"}, schema=llm_metrics.init())
```

The code above produces a set of metrics comprising the default whylogs metrics for text features and all the metrics defined in the imported modules. The resulting profile can be visualized and monitored in the [WhyLabs platform](https://whylabs.ai/safeguard-large-language-models?utm_source=github&utm_medium=referral&utm_campaign=langkit), or it can be further analyzed by the user on their own accord.
More examples are available [here](https://github.com/whylabs/langkit/tree/main/langkit/examples).
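The text-relevance metrics described above are based on embedding similarity. As a rough, stdlib-only sketch of the underlying idea (the toy vectors below are hypothetical; LangKit uses real sentence-embedding models under the hood):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" for a prompt and a response
prompt_vec = [0.2, 0.7, 0.1]
response_vec = [0.25, 0.6, 0.15]

print(round(cosine_similarity(prompt_vec, response_vec), 3))  # → 0.989
```

A score near 1.0 indicates the response is semantically close to the prompt; in LangKit, similar scores are computed against user-defined themes as well.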
## Modules 📦
You can find more information about the different modules and their metrics [here](https://github.com/whylabs/langkit/blob/main/langkit/docs/modules.md).
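As an illustration of the kind of computation a metric module performs, here is a minimal, stdlib-only sketch of the "patterns" metric from the Security and Privacy section (the regex groups and function name below are hypothetical examples, not LangKit's actual implementation):

```python
import re

# Hypothetical pattern groups, in the spirit of LangKit's user-defined regex groups
PATTERN_GROUPS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def count_pattern_matches(text: str) -> dict:
    """Count matches of each pattern group in the given text."""
    return {name: len(rx.findall(text)) for name, rx in PATTERN_GROUPS.items()}

print(count_pattern_matches("Reach me at jane@example.com or 555-123-4567."))
# → {'email': 1, 'phone': 1}
```

In LangKit proper, such counts are registered as whylogs UDFs so they are profiled automatically for every logged prompt and response.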
## Benchmarks
| AWS Instance Type | Metric Module | Throughput |
| ----------------- | :-----------: | -------------: |
| c5.xlarge | Light metrics | 2335 chats/sec |
| | LLM metrics | 8.2 chats/sec |
| | All metrics | 0.28 chats/sec |
| g4dn.xlarge | Light metrics | 2492 chats/sec |
| | LLM metrics | 23.3 chats/sec |
|                   | All metrics   | 1.79 chats/sec |

## Frequently Asked Questions
You can find answers to frequently asked questions in our [FAQ section](https://github.com/whylabs/langkit/blob/main/langkit/docs/faq.md).