Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Awesome-LLM-Productization

Awesome-LLM-Productization: a curated list of tools/tricks/news/regulations about AI and Large Language Model (LLM) productization
https://github.com/oscinis-com/Awesome-LLM-Productization


  • Models and Tools

    • Open LLM Models

      • ChatGLM-6B - an open bilingual language model based on the General Language Model (GLM) framework, with 6.2 billion parameters. (Note from the repo: a small LM to start with, so that you can get a taste of prompting and finetuning. You can fine-tune it successfully on a commercial-grade graphics card with only 8GB of memory, without any other financial commitment. You can use it as you would use BERT.)
      • MiniGPT-4 - Enhancing Vision-language Understanding with Advanced Large Language Models
      • LLaVA - Visual instruction tuning towards large language and vision models with GPT-4 level capabilities
      • VisualGLM-6B - VisualGLM-6B is an open-source, multi-modal dialog language model that supports images, Chinese, and English.
      • OpenLLM Leaderboard - https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard. (Note from the repo: a good place to find a list of available open LLMs; be careful about their commercial terms.)
    • Full LLM Lifecycle

      • EasyLM - EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax. (Note from the repo: for details, see [JAX](https://github.com/google/jax) and [Flax](https://github.com/google/flax).)
      • Jina - Jina lets you build multimodal AI services and pipelines that communicate via gRPC, HTTP and WebSockets, then scale them up and deploy to production
    • LLM Prompt Management

      • Pezzo - Open-source, developer-first LLMOps platform designed to streamline prompt design, version management, instant delivery, collaboration, troubleshooting, observability and more.
    • LLM Finetuning

      • trl - a full-stack library providing a set of tools to train transformer language models and stable diffusion models with reinforcement learning
      • P-tuning v2 - an optimized prompt tuning strategy achieving performance comparable to fine-tuning on small and medium-sized models and on sequence tagging challenges
      • QLoRA - an efficient finetuning approach that reduces memory usage (Note from the repo: good for finetuning on smaller datasets)
      • LLM QLoRA - Fine-tuning LLMs using QLoRA
      • Prompt2Model - Generate Deployable Models from Instructions
    • Embeddings

      • clip-as-service - a low-latency high-scalability service for embedding images and text. It can be easily integrated as a microservice into neural search solutions (Python based, Apache 2);
      • text-embeddings-inference - a toolkit for deploying and serving open source text embeddings and sequence classification models, enabling high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5 (Rust based; Apache 2);
      • infinity - a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks (Python based, MIT);
    • Vector Store

      • Elasticsearch - a distributed, RESTful search engine optimized for speed and relevance on production-scale workloads (Java based)
      • pgvector - Open-source vector similarity search for Postgres (C based)
      • Weaviate - an open source vector database that stores both objects and vectors (Go based)
      • Milvus - an open-source vector database built to power embedding similarity search and AI applications (Go based)
      • gensim - a Python library for topic modelling, document indexing and similarity retrieval with large corpora (Python based)
      • txtai - All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows (Python based)
      • Qdrant - High-performance, massive-scale vector database for the next generation of AI (Rust based)
      • Marqo - Vector search for humans, based on OpenSearch (Python based)
      • Vald - A Highly Scalable Distributed Vector Search Engine (Go based)
      • - search, recommendation and personalization need to select a subset of data in a large corpus (Java based)
      • OpenSearch - Open source distributed and RESTful search engine (Java based)
      • ChromaDB - open-source embedding database (Python based - in-memory only at the moment)
    • LLM Deployment

      • Ray Serve - Ray Serve is a scalable model serving library for building online inference APIs (Note from the repo: from the Ray project)
      • OpenLLM from BentoML - an open-source platform designed to facilitate the deployment and operation of large language models (LLMs) in real-world applications.
      • Langfuse - Open source observability and analytics for LLM applications
      • vLLM - A high-throughput and memory-efficient inference and serving engine for LLMs
      • mlc-llm - Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
      • llm-awq - Efficient and accurate low-bit weight quantization (INT3/4) for LLMs, supporting instruction-tuned models and multi-modal LMs.
      • streaming-llm - deploy LLMs for infinite-length inputs without sacrificing efficiency and performance.
      • llama2.c - run LLMs on minimum hardware
      • TensorRT-LLM - an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
      • text-generation-inference - Large Language Model Text Generation Inference
    • LLM Boilerplate

      • Zep - fast, scalable building blocks for production LLM apps
      • LlamaGPT - A self-hosted, offline, ChatGPT-like chatbot.
      • Ollama - Get up and running with Llama 2 and other large language models locally
    • LLM Monitoring

      • OpenObserve - OpenObserve is a cloud native observability platform built specifically for logs, metrics, traces and analytics designed to work at petabyte scale.
      • AuditNLG - an open-source library that can help reduce the risks associated with using generative AI systems for language. The library supports three aspects of trust detection and improvement: Factualness, Safety, and Constraint.
    • Use Cases

      • MetaGPT - The Multi-Agent Framework: Given one line Requirement, return PRD, Design, Tasks, Repo;
      • Doctor Dignity - a Large Language Model that can pass the US Medical Licensing Exam
    • General MLOps Tools

      • Awesome MLOps - A curated list of awesome MLOps tools
      • MLflow - A Machine Learning Lifecycle Platform
      • dvc - data and model versioning tool
      • dbt - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
      • ml-ops - Some good articles on machine learning operations
  • The Survey Paper