Awesome-LLM-Productization
  
  
    Awesome-LLM-Productization: a curated list of tools/tricks/news/regulations about AI and Large Language Model (LLM) productization 
    https://github.com/oscinis-com/Awesome-LLM-Productization
  
    
- 
            
Models and Tools
- 
                    
Open LLM Models
- ChatGLM-6B - an open bilingual language model based on the General Language Model (GLM) framework, with 6.2 billion parameters. (Note from the repo: a small LM to start with, so that you can get a taste of prompting and finetuning. You can fine-tune it on a commercial-grade graphics card with only 8GB of memory, without any other financial commitment. You can use it much as you would use BERT.)
 - MiniGPT-4 - Enhancing Vision-language Understanding with Advanced Large Language Models
 - LLaVA - Visual instruction tuning towards large language and vision models with GPT-4 level capabilities
 - VisualGLM-6B - VisualGLM-6B is an open-source, multi-modal dialog language model that supports images, Chinese, and English.
 - OpenLLM Leaderboard - https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard. (Note from the repo: a good place to find a list of available open LLMs; be careful about their commercial terms.)
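
To put the 8GB note on ChatGLM-6B in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone occupy at different precisions. It assumes the 6.2 billion parameter count quoted above; the helper name `weight_memory_gb` and the choice of bit widths are illustrative assumptions, and the estimate ignores activations, gradients and optimizer state, so real fine-tuning needs more than this.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Parameter count quoted in the ChatGLM-6B entry above.
CHATGLM_PARAMS = 6.2e9

# Bit widths are illustrative: 16-bit floats, 8-bit and 4-bit quantization.
for bits in (16, 8, 4):
    gb = weight_memory_gb(CHATGLM_PARAMS, bits)
    print(f"{bits}-bit weights: {gb:.1f} GB")
```

Under these assumptions, 16-bit weights do not fit into 8GB on their own, which suggests why quantized weights combined with parameter-efficient finetuning are the usual route on small cards.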
 
 - 
                    
Full LLM Lifecycle
- EasyLM - EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax. (Note from the repo: for details, see [Jax](https://github.com/google/jax) and [Flax](https://github.com/google/flax).)
 - Jina - Jina lets you build multimodal AI services and pipelines that communicate via gRPC, HTTP and WebSockets, then scale them up and deploy to production
 
 - 
                    
LLM Prompt Management
- Pezzo - Open-source, developer-first LLMOps platform designed to streamline prompt design, version management, instant delivery, collaboration, troubleshooting, observability and more.
 
 - 
                    
LLM Finetuning
 - P-tuning v2 - An optimized prompt tuning strategy achieving comparable performance to fine-tuning on small/medium-sized models and sequence tagging challenges;
 - QLoRA - An efficient finetuning approach that reduces memory usage (Note from the repo: good for smaller dataset finetuning);
 - LLM QLoRA - Fine-tuning LLMs using QLoRA
 - Prompt2Model - Generate Deployable Models from Instructions
 - trl - a full-stack library providing a set of tools to train transformer language models and stable diffusion models with Reinforcement Learning
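
As a rough illustration of why QLoRA-style approaches reduce memory usage: they keep the (quantized) base weights frozen and train only small low-rank adapter matrices. The sketch below counts trainable parameters for a single weight matrix. The layer size (4096 x 4096) and rank (8) are hypothetical values chosen for illustration, not figures from any of the listed projects.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a low-rank update B @ A,
    where B is d_out x rank and A is rank x d_in."""
    return rank * (d_out + d_in)


# Hypothetical layer size and adapter rank, for illustration only.
D_OUT, D_IN, RANK = 4096, 4096, 8

full = full_params(D_OUT, D_IN)
adapter = lora_params(D_OUT, D_IN, RANK)
print(f"full finetuning: {full:,} trainable parameters")
print(f"low-rank adapter: {adapter:,} trainable parameters")
print(f"adapter / full: {adapter / full:.2%}")
```

With these assumed sizes the adapter trains well under 1% of the parameters of the full matrix, so gradients and optimizer state shrink accordingly.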
 
 - 
                    
Embeddings
- clip-as-service - a low-latency high-scalability service for embedding images and text. It can be easily integrated as a microservice into neural search solutions (Python based, Apache 2);
 - text-embeddings-inference - a toolkit for deploying and serving open source text embeddings and sequence classification models, enabling high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5 (Rust based; Apache 2);
 - infinity - a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks (Python based, MIT);
 
 - 
                    
Vector Store
- ElasticSearch - a distributed, RESTful search engine optimized for speed and relevance on production-scale workloads (Java based)
 - pgvector - Open-source vector similarity search for Postgres (C based)
 - Weaviate - an open source vector database that stores both objects and vectors (Go based)
 - Milvus - an open-source vector database built to power embedding similarity search and AI applications (Go based)
 - txtai - All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows (Python based)
 - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI (Rust based)
 - Marqo - Vector search for humans based on Opensearch. (Python based)
 - Vald - A Highly Scalable Distributed Vector Search Engine (Go based)
 - - search, recommendation and personalization need to select a subset of data in a large corpus (Java based)
 - OpenSearch - Open source distributed and RESTful search engine (Java based)
 - ChromaDB - open-source embedding database (Python based - in-memory only at the moment)
 - gensim - a Python library for topic modelling, document indexing and similarity retrieval with large corpora (Python based)
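
The core operation behind the vector stores above is similarity search over embeddings. A minimal brute-force sketch in plain Python follows; the function names, document IDs and toy 3-dimensional vectors are all made up for illustration, and none of this reflects the API of any listed project. Production engines typically replace the linear scan with an approximate nearest-neighbor index.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 2):
    """Score every stored vector against the query and return the k best
    (doc_id, score) pairs, highest similarity first."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy "store" with hypothetical document IDs and embeddings.
store = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.0, 0.0], store, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.4f}")
```

The linear scan costs one similarity computation per stored vector, which is fine for small corpora but is the reason dedicated vector databases build indexes for large ones.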
 
 - 
                    
LLM Deployment
- Ray Serve - Ray Serve is a scalable model serving library for building online inference APIs (Note from the repo: from the Ray project)
 - OpenLLM from BentoML - an open-source platform designed to facilitate the deployment and operation of large language models (LLMs) in real-world applications.
 - Langfuse - Open source observability and analytics for LLM applications
 - vLLM - A high-throughput and memory-efficient inference and serving engine for LLMs
 - mlc-llm - Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
 - llm-awq - Efficient and accurate low-bit weight quantization (INT3/4) for LLMs, supporting instruction-tuned models and multi-modal LMs.
 - streaming-llm - deploy LLMs for infinite-length inputs without sacrificing efficiency and performance.
 - llama2.c - run LLMs on minimum hardware
 - TensorRT-LLM - an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
 - text-generation-inference - Large Language Model Text Generation Inference
 
 - 
                    
LLM Monitoring
- AuditNLG - an open-source library that can help reduce the risks associated with using generative AI systems for language. The library supports three aspects of trust detection and improvement: Factualness, Safety, and Constraint.
 - OpenObserve - OpenObserve is a cloud native observability platform built specifically for logs, metrics, traces and analytics designed to work at petabyte scale.
 
 - 
                    
LLM Boilerplate
 - 
                    
Use Cases
- MetaGPT - The Multi-Agent Framework: Given one line Requirement, return PRD, Design, Tasks, Repo;
 - Doctor Dignity - a Large Language Model that can pass the US Medical Licensing Exam
 
 - 
                    
General MLOps Tools
- Awesome MLOps - A curated list of awesome MLOps tools
 - MLflow - A Machine Learning Lifecycle Platform
 - dvc - data and model versioning tool
 - dbt - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
 - ml-ops - Some good articles on machine learning operations
 
 
 - 
                    
 - 
            
The Survey Paper
 