Awesome-LLM-Productization

Awesome-LLM-Productization: a curated list of tools/tricks/news/regulations about AI and Large Language Model (LLM) productization
https://github.com/oscinis-com/Awesome-LLM-Productization


Models and Tools
- General MLOps Tools
  - ml-ops - Some good articles on machine learning operations
  - dbt - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
  - Awesome MLOps - A curated list of awesome MLOps tools
  - MLflow - A Machine Learning Lifecycle Platform
  - dvc - data and model versioning tool
- Open LLM Models
  - OpenLLM Leaderboard - https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard. (Note from the repo: a good place to find a list of available open LLMs; be careful about their commercial terms)
  - LLaVA - Visual instruction tuning towards large language and vision models with GPT-4 level capabilities
  - ChatGLM-6B - an open bilingual language model based on the General Language Model (GLM) framework, with 6.2 billion parameters. (Note from the repo: a small LM to start with, so that you can get a taste of prompting and finetuning. You can fine-tune it on a commercial-grade graphics card with only 8GB of memory, without any other financial commitment. You can use it much as you would use BERT.)
  - MiniGPT-4 - Enhancing Vision-language Understanding with Advanced Large Language Models
  - VisualGLM-6B - VisualGLM-6B is an open-source, multi-modal dialog language model that supports images, Chinese, and English.
- LLM Finetuning
  - Prompt2Model - Generate Deployable Models from Instructions
  - trl - a full stack library where we provide a set of tools to train transformer language models and stable diffusion models with Reinforcement Learning;
  - P-tuning v2 - An optimized prompt tuning strategy achieving comparable performance to fine-tuning on small/medium-sized models and sequence tagging challenges;
  - QLoRA - An efficient finetuning approach that reduces memory usage (Note from the repo: good for smaller dataset finetuning);
  - LLM QLoRA - Fine-tuning LLMs using QLoRA
- LLM Boilerplate
  - LlamaGPT - A self-hosted, offline, ChatGPT-like chatbot.
  - Zep - fast, scalable building blocks for production LLM apps
  - Ollama - Get up and running with Llama 2 and other large language models locally
- LLM Deployment
  - mlc-llm - Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
  - text-generation-inference - Large Language Model Text Generation Inference
  - Ray Serve - Ray Serve is a scalable model serving library for building online inference APIs (Note from the repo: from the Ray project)
  - OpenLLM from BentoML - an open-source platform designed to facilitate the deployment and operation of large language models (LLMs) in real-world applications.
  - Langfuse - Open source observability and analytics for LLM applications
  - vLLM - A high-throughput and memory-efficient inference and serving engine for LLMs
  - llm-awq - Efficient and accurate low-bit weight quantization (INT3/4) for LLMs, supporting instruction-tuned models and multi-modal LMs.
  - streaming-llm - deploy LLMs for infinite-length inputs without sacrificing efficiency and performance.
  - llama2.c - run LLMs on minimum hardware
  - TensorRT-LLM - an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
- Full LLM Lifecycle
  - EasyLM - EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax. (Note from the repo: see [JAX](https://github.com/google/jax) and [Flax](https://github.com/google/flax) for details)
  - Jina - Jina lets you build multimodal AI services and pipelines that communicate via gRPC, HTTP and WebSockets, then scale them up and deploy to production
- LLM Prompt Management
  - Pezzo - Open-source, developer-first LLMOps platform designed to streamline prompt design, version management, instant delivery, collaboration, troubleshooting, observability and more.
- Embeddings
  - clip-as-service - a low-latency high-scalability service for embedding images and text. It can be easily integrated as a microservice into neural search solutions (Python based, Apache 2);
  - text-embeddings-inference - a toolkit for deploying and serving open source text embeddings and sequence classification models, enabling high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5 (Rust based; Apache 2);
  - infinity - a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks (Python based, MIT);
- Vector Store
  - ElasticSearch - a distributed, RESTful search engine optimized for speed and relevance on production-scale workloads (Java based)
  - pgvector - Open-source vector similarity search for Postgres (C based)
  - Weaviate - an open source vector database that stores both objects and vectors (Go based)
  - Milvus - an open-source vector database built to power embedding similarity search and AI applications (Go based)
  - gensim - a Python library for topic modelling, document indexing and similarity retrieval with large corpora (Python based)
  - txtai - All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows (Python based)
  - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. (Rust based)
  - Marqo - Vector search for humans based on OpenSearch. (Python based)
  - Vald - A Highly Scalable Distributed Vector Search Engine (Go based)
  - - search, recommendation and personalization need to select a subset of data in a large corpus (Java based)
  - OpenSearch - Open source distributed and RESTful search engine (Java based)
  - ChromaDB - open-source embedding database (Python based - in-memory only at the moment)
- LLM Monitoring
  - OpenObserve - OpenObserve is a cloud native observability platform built specifically for logs, metrics, traces and analytics designed to work at petabyte scale.
  - AuditNLG - an open-source library that can help reduce the risks associated with using generative AI systems for language. The library supports three aspects of trust detection and improvement: Factualness, Safety, and Constraint.
- Use Cases
  - MetaGPT - The Multi-Agent Framework: Given one line Requirement, return PRD, Design, Tasks, Repo;
  - Doctor Dignity - a Large Language Model that can pass the US Medical Licensing Exam
The Survey Paper
- Anti-hype LLM reading list

