Awesome-LLM-Productization

Awesome-LLM-Productization: a curated list of tools/tricks/news/regulations about AI and Large Language Model (LLM) productization
https://github.com/oscinis-com/Awesome-LLM-Productization


Models and Tools
- Open LLM Models
  - ChatGLM-6B - an open bilingual language model based on the General Language Model (GLM) framework, with 6.2 billion parameters. (Note from the repo: a small LM to start with, so that you can get a taste of prompting and finetuning. You can fine-tune it on a commercial-grade graphics card with only 8GB, without any other financial commitment. You can use it as you would BERT.)
  - MiniGPT-4 - Enhancing Vision-language Understanding with Advanced Large Language Models
  - LLaVA - Visual instruction tuning towards large language and vision models with GPT-4 level capabilities
  - VisualGLM-6B - an open-source, multi-modal dialog language model that supports images, Chinese, and English.
  - OpenLLM Leaderboard - https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard. (Note from the repo: a good place to find a list of available open LLMs; be careful about their commercial terms)
- Full LLM Lifecycle
  - EasyLM - a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax. (Note from the repo: see the details of [JAX](https://github.com/google/jax) and [Flax](https://github.com/google/flax))
  - Jina - Jina lets you build multimodal AI services and pipelines that communicate via gRPC, HTTP and WebSockets, then scale them up and deploy to production
- LLM Prompt Management
  - Pezzo - Open-source, developer-first LLMOps platform designed to streamline prompt design, version management, instant delivery, collaboration, troubleshooting, observability and more.
- LLM Finetuning
  - trl - a full stack library providing a set of tools to train transformer language models and stable diffusion models with Reinforcement Learning;
  - P-tuning v2 - An optimized prompt tuning strategy achieving comparable performance to fine-tuning on small/medium-sized models and sequence tagging challenges;
  - QLoRA - An efficient finetuning approach that reduces memory usage (Note from the repo: good for smaller dataset finetuning);
  - LLM QLoRA - Fine-tuning LLMs using QLoRA
  - Prompt2Model - Generate Deployable Models from Instructions
- Embeddings
  - clip-as-service - a low-latency high-scalability service for embedding images and text. It can be easily integrated as a microservice into neural search solutions (Python based, Apache 2);
  - text-embeddings-inference - a toolkit for deploying and serving open source text embeddings and sequence classification models, enabling high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5 (Rust based; Apache 2);
  - infinity - a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks (Python based, MIT);
- Vector Store
  - ElasticSearch - a distributed, RESTful search engine optimized for speed and relevance on production-scale workloads (Java based)
  - pgvector - Open-source vector similarity search for Postgres (C based)
  - Weaviate - an open source vector database that stores both objects and vectors (Go based)
  - Milvus - an open-source vector database built to power embedding similarity search and AI applications (Go based)
  - gensim - a Python library for topic modelling, document indexing and similarity retrieval with large corpora (Python based)
  - txtai - All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows (Python based)
  - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI (Rust based)
  - Marqo - Vector search for humans based on OpenSearch (Python based)
  - Vald - A Highly Scalable Distributed Vector Search Engine (Go based)
  - - search, recommendation and personalization need to select a subset of data in a large corpus (Java based)
  - OpenSearch - Open source distributed and RESTful search engine (Java based)
  - ChromaDB - open-source embedding database (Python based - in-memory only at the moment)
- LLM Deployment
  - Ray Serve - a scalable model serving library for building online inference APIs (Note from the repo: from the Ray project)
  - OpenLLM from BentoML - an open-source platform designed to facilitate the deployment and operation of large language models (LLMs) in real-world applications.
  - Langfuse - Open source observability and analytics for LLM applications
  - vLLM - A high-throughput and memory-efficient inference and serving engine for LLMs
  - mlc-llm - Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
  - llm-awq - Efficient and accurate low-bit weight quantization (INT3/4) for LLMs, supporting instruction-tuned models and multi-modal LMs.
  - streaming-llm - deploy LLMs for infinite-length inputs without sacrificing efficiency and performance.
  - llama2.c - run LLMs on minimum hardware
  - TensorRT-LLM - an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
  - text-generation-inference - Large Language Model Text Generation Inference
- LLM Monitoring
  - AuditNLG - an open-source library that can help reduce the risks associated with using generative AI systems for language. The library supports three aspects of trust detection and improvement: Factualness, Safety, and Constraint.
  - OpenObserve - a cloud native observability platform built specifically for logs, metrics, traces and analytics, designed to work at petabyte scale.
- LLM Boilerplate
  - Zep - fast, scalable building blocks for production LLM apps
  - LlamaGPT - A self-hosted, offline, ChatGPT-like chatbot.
  - Ollama - Get up and running with Llama 2 and other large language models locally
- Use Cases
  - MetaGPT - The Multi-Agent Framework: Given one line Requirement, return PRD, Design, Tasks, Repo;
  - Doctor Dignity - a Large Language Model that can pass the US Medical Licensing Exam
- General MLOps Tools
  - Awesome MLOps - A curated list of awesome MLOps tools
  - MLflow - A Machine Learning Lifecycle Platform
  - dvc - data and model versioning tool
  - dbt - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
  - ml-ops - Some good articles on machine learning operations
The Survey Paper
- Anti-hype LLM reading list
