Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Barnacle-ai/awesome-llm-list
An overview of Large Language Model (LLM) options
List: awesome-llm-list
- Host: GitHub
- URL: https://github.com/Barnacle-ai/awesome-llm-list
- Owner: Barnacle-ai
- License: apache-2.0
- Created: 2023-04-20T08:09:18.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-25T17:51:03.000Z (8 months ago)
- Last Synced: 2024-05-23T01:06:26.012Z (5 months ago)
- Size: 46.9 KB
- Stars: 64
- Watchers: 4
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- ultimate-awesome - awesome-llm-list - An overview of Large Language Model (LLM) options. (Other Lists / PowerShell Lists)
README
# Large Language Models (LLMs)
The world of Large Language Models (LLMs) is complex and varied. This resource collates the things that matter, helping to make sense of this increasingly important topic.
## CONTENTS
- [LLM Chat](#LLM-CHAT)
- [Custom GPTs](#CUSTOM-GPTs)
- [Papers](#RESEARCH-PAPERS)
- [Education](#EDUCATION)
- [Benchmarks](#BENCHMARKS)
- [Leaderboards](#LEADERBOARDS)
- [Gen-AI for developers](#GEN-AI-FOR-DEVELOPERS)
- [Inferencing Frameworks](#INFERENCING-FRAMEWORKS)
- [GPT4-V Alternatives](#GPT4V-ALTERNATIVES)
- [Cloud GPUs](#CLOUD-GPUs)
- [Open Source Models](#OPEN-SOURCE-MODELS)
- [Commercial Models](#COMMERCIAL-MODELS)
## LLM CHAT
Everyone knows ChatGPT, but do you know these others?
- [OpenAI ChatGPT](https://chat.openai.com)
- [Google Bard](https://bard.google.com)
- [Anthropic Claude](https://claude.ai/)
- [Inflection Pi](https://pi.ai/)
- [Hugging Chat](https://huggingface.co/chat/)
- [Microsoft Bing](https://www.bing.com/new)
- [You](https://you.com)
- [Perplexity](https://www.perplexity.ai)
- [Chatsonic](https://writesonic.com/chat)
## CUSTOM GPTs
OpenAI's custom GPTs are on fire - check out what people are developing:
- [Awesome GPT Store](https://github.com/Anil-matcha/Awesome-GPT-Store)
- [Google site search](https://www.google.com/search?client=safari&rls=en&q=site%3Achat.openai.com%2Fg%2F&ie=UTF-8&oe=UTF-8#ip=1)
- [GPT Builders](https://www.skool.com/gpt-builders-9568/about)
- [GPT Index](https://gptsdex.com)
- [GPTs-base](https://gpts-base.com)
## RESEARCH PAPERS
A selection of interesting and noteworthy research papers related to LLMs; a minimal LoRA fine-tuning sketch follows the list.
- 2023: [Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4](https://arxiv.org/abs/2312.16171)
- 2023: [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/abs/2305.18290)
- 2023: [AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration](https://arxiv.org/abs/2306.00978)
- 2023: [Memory Augmented Large Language Models are Computationally Universal](https://arxiv.org/abs/2301.04589)
- 2023: [S-LoRA: Serving Thousands of Concurrent LoRA Adapters](https://arxiv.org/abs/2311.03285)
- 2023: [A Survey of Large Language Models](https://arxiv.org/abs/2303.18223)
- 2023: [A Comprehensive Overview of Large Language Models](https://arxiv.org/abs/2307.06435)
- 2023: [Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback](https://arxiv.org/abs/2307.15217)
- 2023: [Aligning Large Language Models with Human: A Survey](https://arxiv.org/abs/2307.12966?ref=txt.cohere.com)
- 2023: [ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs](https://arxiv.org/abs/2307.16789v1?ref=txt.cohere.com)
- 2023: [Generative Agents: Interactive Simulacra of Human Behavior](https://arxiv.org/pdf/2304.03442.pdf)
- 2023: [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)
- 2022: [Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned](https://arxiv.org/pdf/2209.07858.pdf)
- 2022: [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)
- 2022: [Constitutional AI: Harmlessness from AI Feedback](https://arxiv.org/abs/2212.08073)
- 2022: [What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?](https://arxiv.org/abs/2204.05832)
- 2022: [Training Compute-Optimal Large Language Models](https://doi.org/10.48550/arXiv.2203.15556)
- 2022: [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
- 2022: [Emergent Abilities of Large Language Models](https://arxiv.org/abs/2206.07682)
- 2022: [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903)
- 2021: [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)
- 2021: [The Power of Scale for Parameter-Efficient Prompt Tuning](https://aclanthology.org/2021.emnlp-main.243/)
- 2021: [On the Opportunities and Risks of Foundation Models](https://arxiv.org/abs/2108.07258)
- 2020: [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
- 2020: [Scaling Laws for Neural Language Models](https://arxiv.org/abs/2001.08361)
- 2018: [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
- 2017: [Attention is all you need](https://arxiv.org/abs/1706.03762)
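Several of the papers above (LoRA, QLoRA, DPO) describe parameter-efficient fine-tuning. As a minimal, non-authoritative sketch of the LoRA idea, here is a setup using Hugging Face's `transformers` and `peft` libraries; the base model id and hyperparameters are illustrative assumptions, not values prescribed by the papers.

```python
# Minimal LoRA fine-tuning setup; model id and hyperparameters are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "mistralai/Mistral-7B-v0.1"  # assumed example; any causal LM could be used
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# LoRA freezes the base weights and learns small low-rank update matrices,
# typically injected into the attention projections (see the LoRA paper above).
config = LoraConfig(
    r=8,                                  # rank of the update matrices
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # module names vary by architecture
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # usually well under 1% of the base parameters
```

A normal training loop can then be run on the wrapped model, and only the small adapter weights need to be saved.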
## EDUCATION
Get skilled up with these free and paid-for courses; a minimal prompt-engineering sketch follows the list.
- [OpenAI Prompt Engineering Guide](https://platform.openai.com/docs/guides/prompt-engineering)
- [Generative AI for Beginners - Microsoft](https://github.com/microsoft/generative-ai-for-beginners)
- [Prompt Engineering Guide](https://github.com/dair-ai/Prompt-Engineering-Guide)
- [LLM Bootcamp](https://fullstackdeeplearning.com/llm-bootcamp/)
- [Best practices for prompt engineering with OpenAI API](https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api)
- [Lil'Log: Prompt Engineering](https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/)
- [Prompt Engineering Guide](https://learnprompting.org/docs/intro)
- [Cohere LLM University](https://docs.cohere.com/docs/llmu)
- [Deep Learning: ChatGPT Prompt Engineering for Developers](https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/)
- [Deep Learning: Learn the fundamentals of generative AI for real-world applications](https://www.deeplearning.ai/courses/generative-ai-with-llms/)
- [Deep Learning: LangChain for LLM Application Development](https://www.deeplearning.ai/short-courses/langchain-for-llm-application-development/)
- [Princeton: COS 597G - Understanding Large Language Models](https://www.cs.princeton.edu/courses/archive/fall22/cos597G/)
- [Stanford: CS324 - Large Language Models](https://stanford-cs324.github.io/winter2022/)
- [Machine Learning Engineering Online Book](https://github.com/stas00/ml-engineering/tree/master)
- [Large Language Model Course](https://github.com/mlabonne/llm-course)
- [Brex's Prompt Engineering Guide](https://github.com/brexhq/prompt-engineering)
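As a small, hedged illustration of the advice in the prompt-engineering guides above (give the model a role, be explicit about the task, delimit the input), here is a sketch using the official OpenAI Python SDK; the model name and prompt text are assumptions, and an API key must be configured.

```python
# Minimal prompt-engineering sketch with the OpenAI Python SDK (v1.x);
# model name and prompt are illustrative, and OPENAI_API_KEY must be set.
from openai import OpenAI

client = OpenAI()

article = "Large Language Models are neural networks trained on vast text corpora..."
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    temperature=0.2,
    messages=[
        # A role in the system message and clear delimiters around the input
        # are common recommendations in the guides listed above.
        {"role": "system", "content": "You are a concise technical editor."},
        {"role": "user", "content": f'Summarise the text between triple quotes in one sentence.\n"""{article}"""'},
    ],
)
print(response.choices[0].message.content)
```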
## BENCHMARKS
These benchmarks are commonly used to compare LLM performance; a minimal evaluation sketch follows the list.
- [GAIA](https://arxiv.org/abs/2311.12983)
- [ARC](https://arxiv.org/abs/2305.18354)
- [HellaSwag](https://arxiv.org/abs/1905.07830)
- [MMLU](https://arxiv.org/abs/2009.03300)
- [TruthfulQA](https://arxiv.org/abs/2109.07958)
- [Winogrande](https://winogrande.allenai.org)
- [GSM8K](https://arxiv.org/abs/2110.14168)
- [DROP](https://arxiv.org/abs/1903.00161)
- [IDEFICS](https://huggingface.co/blog/idefics)
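As a rough sketch of how such benchmarks are typically run, the snippet below uses EleutherAI's lm-evaluation-harness (the tooling behind the Hugging Face Open LLM Leaderboard listed in the next section); the harness itself is not part of this list, and the model id and task names are illustrative assumptions.

```python
# Minimal benchmark run, assuming EleutherAI's lm-evaluation-harness
# (`pip install lm-eval`); model id and tasks are illustrative only.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                               # Hugging Face transformers backend
    model_args="pretrained=microsoft/phi-2",  # any open checkpoint could be used
    tasks=["hellaswag", "arc_challenge"],     # correspond to benchmarks listed above
    num_fewshot=0,
)
print(results["results"])  # per-task accuracy and related metrics
```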
## LEADERBOARDS
These leaderboards show how LLMs compare relative to each other.
- [Hallucination Leaderboard](https://github.com/vectara/hallucination-leaderboard)
- [Hugging Face GAIA Leaderboard](https://huggingface.co/spaces/gaia-benchmark/leaderboard)
- [Hugging Face Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
- [Chatbot Arena Leaderboard](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)
- [AlpacaEval Leaderboard](https://tatsu-lab.github.io/alpaca_eval/)
- [AllenAI CommonGen-Eval](https://inklab.usc.edu/CommonGen/leaderboard.html)
- [OpenCompass](https://opencompass.org.cn/leaderboard-llm)
## GEN-AI FOR DEVELOPERS
Coding assistants and the like can have a major positive impact on development productivity. There is now a burgeoning market of such tools, with integrations into popular IDEs.
- [Code LLaMA](https://about.fb.com/news/2023/08/code-llama-ai-for-coding/)
- [GitHub Copilot](https://github.com/features/copilot)
- [Replit Code](https://blog.replit.com/ai4all)
- [Amazon CodeWhisperer](https://aws.amazon.com/codewhisperer/)
- [IBM watsonx Code Assistant](https://www.ibm.com/products/watsonx-code-assistant)
- [Tabnine](https://www.tabnine.com)
- [mutable.ai](https://mutable.ai)
- [phind](https://www.phind.com)
## INFERENCING FRAMEWORKS
If you want to host an LLM yourself, you're going to need one of these frameworks; a minimal vLLM sketch follows the list.
- [vLLM](https://github.com/vllm-project/vllm)
- [Hugging Face's Text Generation Inference](https://github.com/huggingface/text-generation-inference)
- [CTranslate2](https://github.com/OpenNMT/CTranslate2)
- [OpenLLM](https://github.com/bentoml/OpenLLM)
- [Microsoft's DeepSpeed MII](https://github.com/microsoft/DeepSpeed-MII)
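As a minimal sketch of what self-hosting looks like, here is offline batch inference with vLLM; the model id is an illustrative assumption and a suitable GPU is required.

```python
# Minimal self-hosted inference with vLLM; the model id is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1")  # any supported HF model
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain what an inferencing framework does."], params)
for output in outputs:
    print(output.outputs[0].text)
```

Most of these frameworks can also expose an HTTP server (vLLM ships an OpenAI-compatible one), so existing client code can be pointed at your own deployment.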
## GPT4V ALTERNATIVES
Turn images into text just like GPT-4V with these models; a minimal sketch follows the list.
- [LLaVA](https://llava-vl.github.io)
- [BakLLaVA](https://github.com/SkunkworksAI/BakLLaVA)
- [CogVLM](https://github.com/THUDM/CogVLM)
- [Fuyu-8B](https://www.adept.ai/blog/fuyu-8b)
- [Qwen-VL](https://github.com/QwenLM/Qwen-VL)
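A minimal sketch of image-to-text with LLaVA via Hugging Face transformers follows; the checkpoint name, prompt template, and example image URL are assumptions based on the community llava-hf releases rather than anything specified by this list.

```python
# Minimal image-to-text sketch with LLaVA through transformers;
# checkpoint, prompt template, and image URL are illustrative assumptions.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

url = "https://llava-vl.github.io/static/images/view.jpg"  # example image
image = Image.open(requests.get(url, stream=True).raw)
prompt = "USER: <image>\nDescribe this image in one sentence. ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=60)
print(processor.decode(generated[0], skip_special_tokens=True))
```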
## CLOUD GPUs
Training and running inference on your own model requires GPUs. You can get these from any cloud provider, but there are some specialist providers worth considering.
- [RunPod](https://www.runpod.io)
- [LambdaLabs](https://lambdalabs.com)
- [Vast.ai](https://vast.ai)
## OPEN SOURCE MODELS
Open source models are generally understood to be free to use, but some have restrictive licenses that prohibit commercial use or limit usage in other ways. Check the exact license for the model you want to use and make sure you understand exactly what is permissible.
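As a minimal sketch of getting started with a permissively licensed model from the list below (Phi-2, MIT-licensed, is used here as an example), the following runs local text generation with Hugging Face transformers; even so, always verify the license on the model card before using a model commercially.

```python
# Minimal local generation with an MIT-licensed open model (Phi-2, listed below);
# still check the license on the model card before any commercial use.
from transformers import pipeline

generator = pipeline("text-generation", model="microsoft/phi-2")
result = generator("Open source model licenses matter because", max_new_tokens=40)
print(result[0]["generated_text"])
```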
### G [Gemma](https://blog.google/technology/developers/gemma-open-models/)
Parameters: 2B, 7B
Origin: [Google](https://ai.google.dev/gemma/docs)
License: [Gemma](https://ai.google.dev/gemma/terms)
Release date: February 2024
Paper:
Commercial use possible: YES
GitHub: https://huggingface.co/models?search=google/gemma
Training cost:
### φ [Phi-2](https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/)
Parameters: 2.7B
Origin: [Microsoft](https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/)
License: [MIT](https://choosealicense.com/licenses/mit/)
Release date: December 2023
Paper:
Commercial use possible: YES
GitHub: https://huggingface.co/microsoft/phi-2
Training cost:
### 🌬️ [DeciLM-7B-Instruct](https://deci.ai/model-zoo/decilm-7b-instruct/)
Parameters: 7B
Origin: [Deci.ai](https://deci.ai/)
License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
Release date: December 2023
Paper:
Commercial use possible: YES
GitHub: https://huggingface.co/Deci/DeciLM-7B-instruct
Training cost:
### 🌬️ [Mistral 8x7B](https://mistral.ai)
Parameters: 8x7B Mixture of Experts
Origin: [Mistral](https://mistral.ai/news/mixtral-of-experts/)
License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
Release date: December 2023
Paper: https://arxiv.org/abs/2401.04088
Commercial use possible: YES
GitHub: https://huggingface.co/mistralai
Training cost:
Comment: Seems to rival GPT-3.5 in benchmarks at a fraction of the size.
### 🌬️ [Notus](https://argilla.io/blog/notus7b/)
Parameters: 7B
Origin: [Argilla](https://argilla.io/), fine-tuned from Mistral
License: [MIT](https://choosealicense.com/licenses/mit/)
Release date: December 2023
Paper:
Commercial use possible: No - uses synthetic data from OpenAI GPT models
GitHub: https://huggingface.co/argilla/notus-7b-v1
Training cost:
Comment: Strong performance for small size, uses DPO fine-tuning.
### 🍃 [Zephyr](https://huggingface.co/collections/HuggingFaceH4/zephyr-7b-6538c6d6d5ddd1cbb1744a66)
Parameters: 7B
Origin: [HuggingFace](https://huggingface.co), fine-tuned from Mistral
License: [MIT](https://choosealicense.com/licenses/mit/)
Release date: November 2023
Paper:
Commercial use possible: No - uses synthetic data from OpenAI GPT models
GitHub: https://huggingface.co/collections/HuggingFaceH4/zephyr-7b-6538c6d6d5ddd1cbb1744a66
Training cost:
Comment: Strong performance for small size, uses DPO fine-tuning.
### 🐦⬛ [Starling](https://starling.cs.berkeley.edu)
Parameters: 7B
Origin: [Berkeley, based on LLaMA2](https://starling.cs.berkeley.edu)
License: [LLaMA2 Community License](https://github.com/facebookresearch/llama/blob/main/LICENSE)
Release date: November 2023
Paper:
Commercial use possible: No - uses synthetic data from OpenAI GPT models
GitHub: https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha
Training cost:
Comment: Strong reasoning performance for small size.
### 1️⃣ [Yi](https://01.ai)
Parameters: 7B, 34B
Origin: [01.AI](https://01.ai)
License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
Release date: November 2023
Paper:
Commercial use possible: Via [request](https://www.lingyiwanwu.com/yi-license)
GitHub: https://github.com/01-ai/Yi
Training cost:
Comment: Strong performance for small size.
### 🐳 [Orca 2](https://www.microsoft.com/en-us/research/blog/orca-2-teaching-small-language-models-how-to-reason/)
Parameters: 7B, 13B
Origin: [MS](https://www.microsoft.com/), fine-tuned LLaMA2
License: [MS Research License](https://huggingface.co/microsoft/Orca-2-7b/blob/main/LICENSE)
Release date: November 2023
Paper: https://arxiv.org/abs/2311.11045
Commercial use possible: NO
GitHub: 7B: https://huggingface.co/microsoft/Orca-2-7b, 13B: https://huggingface.co/microsoft/Orca-2-13b
Training cost: Orca 2 trained on 32 NVIDIA A100 GPUs with 80GB memory. For the 13B checkpoint, it took ~17 hours to train Orca 2 on FLAN dataset for one epoch, ~40 hours to train on 5 million ChatGPT data for 3 epochs and ~23 hours to continue training on ~1.8 million GPT-4 data for 4 epochs.
Comment: Strong reasoning abilities for a small model.
### 🌬️ [Mistral](https://mistral.ai)
Parameters: 7B
Origin: [Mistral](https://mistral.ai)
License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
Release date: October 2023
Paper: https://arxiv.org/abs/2310.06825
Commercial use possible: YES
GitHub: https://huggingface.co/mistralai
Training cost:
Comment: Outperforms LLaMA2 13B.
### 📏 [LongChat](https://lmsys.org/blog/2023-06-29-longchat/)
Parameters: 7B
Origin: [UC Berkeley, CMU, Stanford, and UC San Diego](https://vicuna.lmsys.org/)
License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
Release date: August 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/DachengLi1/LongChat
Training cost:
Comment: 32k context length!
### 🏯 [Qwen](https://huggingface.co/Qwen)
Parameters: 7B, 14B, 72B
Origin: Alibaba
License: [Tongyi Qianwen](https://github.com/QwenLM/Qwen-7B/blob/main/LICENSE)
Release date: August 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/QwenLM/Qwen-7B
Training cost:
### 🦙 [Vicuna 1.5](https://vicuna.lmsys.org/)
Parameters: 13B
Origin: [UC Berkeley, CMU, Stanford, and UC San Diego](https://vicuna.lmsys.org/)
License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
Release date: August 2023 (v1.5 uses LLaMA2 instead of LLaMA of prior releases)
Paper:
Commercial use possible: NO (trained on https://sharegpt.com conversations that potentially breach the OpenAI license)
GitHub: https://github.com/lm-sys/FastChat
Training cost:
### 🐋 [Stable Beluga](https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models)
Parameters: 7B, 40B
Origin: Stability AI.
License: [CC BY-NC-4.0](https://creativecommons.org/licenses/by-nc/4.0/)
Release date: July 2023
Paper:
Commercial use possible: NO
GitHub: https://huggingface.co/stabilityai/StableBeluga2
Training cost:
### 🦙 [LLaMA2](https://ai.meta.com/llama/)
Parameters: 7B, 13B, 70B
Origin: Meta.
License: [Llama 2 Community License Agreement](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)
Release date: July 2023
Paper: https://arxiv.org/abs/2307.09288
Commercial use possible: YES
GitHub: https://huggingface.co/meta-llama
Training cost: A cumulative of 3.3M GPU hours of computation was performed on hardware of type A100-80GB (TDP of 400W or 350W). We estimate the total emissions for training to be 539 tCO2eq, of which 100% were directly offset by Meta’s sustainability program.
### 🦅 [Falcon](https://falconllm.tii.ae)
Parameters: 7B, 40B
Origin: UAE Technology Innovation Institute.
License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
Release date: May 2023
Paper: https://arxiv.org/abs/2311.16867
Commercial use possible: YES
GitHub: https://huggingface.co/tiiuae/falcon-7b
GitHub: https://huggingface.co/tiiuae/falcon-7b-instruct
GitHub: https://huggingface.co/tiiuae/falcon-40b
GitHub: https://huggingface.co/tiiuae/falcon-40b-instruct
Training cost: Falcon-40B was trained on AWS SageMaker, on 384 A100 40GB GPUs in P4d instances.
### 🧩 [MosaicML MPT-30B](https://www.mosaicml.com/blog/mpt-30b)
Parameters: 30B
Origin: Open source from MosaicML.
License (MPT-30B Base, Instruct): [Attribution-ShareAlike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/)
License (MPT-30B Chat): [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
Release date: June 2023
Paper:
Commercial use possible: YES (Base & Instruct), NO (Chat)
GitHub: Base: https://huggingface.co/mosaicml/mpt-30b
GitHub: Instruct: https://huggingface.co/mosaicml/mpt-30b-instruct
GitHub: Chat: https://huggingface.co/mosaicml/mpt-30b-chat
Training cost: From Scratch: 512xA100-40GB, 28.3 Days, ~ $871,000.
Training cost: Finetune 30B Base model: 16xA100-40GB, 21.8 Hours, $871
### 🧩 [MosaicML MPT-7B](https://www.mosaicml.com/blog/mpt-7b)
Parameters: 7B
Origin: Open source from MosaicML. Claimed to be competitive with LLaMA-7B. Base, storywriter, instruct, chat fine-tunings available.
License (MPT-7B Base): [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
License (MPT-7B-StoryWriter-65k+): [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
License (MPT-7B-Instruct): [CC-By-SA-3.0](https://creativecommons.org/licenses/by-sa/3.0/)
License (MPT-7B-Chat): [CC-By-NC-SA-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)
Release date: May 2023
Paper:
Commercial use possible: YES
GitHub: https://huggingface.co/mosaicml/mpt-7b
Training cost: Nearly all of the training budget was spent on the base MPT-7B model, which took ~9.5 days to train on 440xA100-40GB GPUs, and cost ~$200k
### 🦙 [Together RedPajama-INCITE](https://www.together.xyz/blog/redpajama-models-v1)
Parameters: 3B, 7B
Origin: "Official" version of the Open Source recreation of LLaMA + chat/instruction-tuned versions
License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
Release date: May 2023
Paper:
Commercial use possible: YES
GitHub: https://huggingface.co/togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
Training cost: The training of the first collection of RedPajama-INCITE models is performed on 3,072 V100 GPUs provided as part of the INCITE compute grant on Summit supercomputer at the Oak Ridge Leadership Computing Facility (OLCF).
### 🦙 [OpenAlpaca](https://github.com/yxuansu/OpenAlpaca)
Parameters: 7B
Origin: An instruction tuned version of OpenLLaMA
License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
Release date: May 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/yxuansu/OpenAlpaca
### 🦙 [OpenLLaMA](https://github.com/openlm-research/open_llama)
Parameters: 7B
Origin: A claimed recreation of Meta's LLaMA without the licensing restrictions
License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
Release date: May 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/openlm-research/open_llama
### 🐪 [Camel](https://huggingface.co/Writer/camel-5b-hf)
Parameters: 5B, (20B coming)
Origin: [Writer](https://writer.com)
License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
Release date: April 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/basetenlabs/camel-5b-truss
### 🏛️ [Palmyra](https://huggingface.co/Writer/palmyra-base)
Parameters: 5B, (20B coming)
Origin: [Writer](https://writer.com)
License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
Release date: April 2023
Paper:
Commercial use possible: YES
GitHub:
### 🐎 [StableLM](https://stability.ai/blog/stability-ai-launches-the-first-of-its-stablelm-suite-of-language-models)
Parameters: 3B, 7B, (15B, 65B coming)
Origin: [Stability.ai](https://stability.ai)
License: [CC BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)
Release date: April 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/Stability-AI/StableLM
### 🧱 [Databricks Dolly 2](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm)
Parameters: 12B
Origin: [Databricks](https://www.databricks.com), an instruction tuned version of EleutherAI pythia
License: [CC BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)
Release date: April 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/databrickslabs/dolly
Training cost: Databricks cite "for thousands of dollars and in a few hours, Dolly 2.0 was built by fine tuning a 12B parameter open-source model (EleutherAI's Pythia) on a human-generated dataset of 15K Q&A pairs". This, of course, is just for the fine-tuning and the cost of training the underlying Pythia model also needs to be taken into account when estimating total training cost.
### 🦙 [Vicuna](https://vicuna.lmsys.org/)
Parameters: 13B
Origin: [UC Berkeley, CMU, Stanford, and UC San Diego](https://vicuna.lmsys.org/)
License: Requires access to LLaMA; trained on https://sharegpt.com conversations that potentially breach the OpenAI license
Release date: April 2023
Paper:
Commercial use possible: NO
GitHub: https://github.com/lm-sys/FastChat
### 🧠 [Cerebras-GPT](https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models/)
Parameters: 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, and 13B
Origin: [Cerebras](https://www.cerebras.net)
License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
Release date: March 2023
Paper: [https://arxiv.org/abs/2304.03208](https://arxiv.org/abs/2304.03208)
Commercial use possible: YES
### 🦙 [Stanford Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html)
Parameters: 7B
Origin: [Stanford](https://crfm.stanford.edu/2023/03/13/alpaca.html), based on Meta's LLaMA
License: Requires access to LLaMA; trained on GPT conversations, potentially against the OpenAI license
Release date: March 2023
Paper:
Commercial use possible: NO
GitHub: https://github.com/tatsu-lab/stanford_alpaca
Training cost: Replicate posted a [blog post](https://replicate.com/blog/replicate-alpaca) where they replicated the Alpaca fine-tuning process. They used 4x A100 80GB GPUs for 1.5 hours. For total training cost, the cost of training the underlying LLaMA model also needs to be taken into account.
### 🔮 [EleutherAI pythia](https://github.com/EleutherAI/pythia)
Parameters: 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, 12B
Origin: [EleutherAI](https://www.eleuther.ai)
License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
Release date: February 2023
Paper: https://arxiv.org/pdf/2304.01373.pdf
Commercial use possible: YES
### 🦙 [LLaMA](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
Parameters: 7B, 33B, 65B
Origin: [Meta](https://ai.facebook.com/tools/)
License: Model weights available for non-commercial use by application to Meta
Release date: February 2023
Paper: https://arxiv.org/abs/2302.13971
Commercial use possible: NO
Training cost: Meta cite "When training a 65B-parameter model, our code processes around 380 tokens/sec/GPU on 2048 A100 GPU with 80GB of RAM. This means that training over our dataset containing 1.4T tokens takes approximately 21 days... Finally, we estimate that we used 2048 A100-80GB for a period of approximately 5 months to develop our models." However, that cost is for all the different model sizes combined. Separately in the LLaMA paper Meta cite 1,022,362 GPU hours on A100-80GB GPUs.
### 🌸 [Bloom](https://bigscience.huggingface.co/blog/bloom)
Parameters: 176B
Origin: [BigScience](https://bigscience.huggingface.co)
License: [BigScience Rail License](https://bigscience.huggingface.co/blog/the-bigscience-rail-license)
Release date: July 2022
Paper: https://arxiv.org/abs/2211.05100
Commercial use possible: YES
### 🌴 [Google PaLM](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html)
Parameters: 540B
Origin: [Google](https://www.google.com)
License: Unknown - only announcement of intent to open
Release date: April 2022
Paper: https://arxiv.org/abs/2204.02311
Commercial use possible: Awaiting more information
### 🤖 [GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b)
Parameters: 20B
Origin: [EleutherAI](https://www.eleuther.ai)
License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
Release date: January 2022
Paper: https://aclanthology.org/2022.bigscience-1.9/
Commercial use possible: YES
GitHub: https://github.com/EleutherAI/gpt-neox
### 🤖 [GPT-J](https://huggingface.co/EleutherAI/gpt-j-6b)
Parameters: 6B
Origin: [EleutherAI](https://www.eleuther.ai)
License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
Release date: June 2021
Paper:
Commercial use possible: YES
### 🍮 [Google FLAN-T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html)
Parameters: 80M, 250M, 780M, 3B, 11B
Origin: [Google](https://www.google.com)
License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
Release date: October 2022
Paper: https://arxiv.org/pdf/2210.11416.pdf
Commercial use possible: YES
GitHub: https://github.com/google-research/t5x
### 🦙 [IBM Dromedary](https://github.com/IBM/Dromedary)
Parameters: 7B, 13B, 33B and 65B
Origin: [IBM](https://research.ibm.com/artificial-intelligence), based on Meta's LLaMA
License: GNU General Public License v3.0
Release date:
Paper: https://arxiv.org/abs/2305.03047
Commercial use possible: NO
GitHub: https://github.com/IBM/Dromedary
## COMMERCIAL MODELS
These commercial models are generally available through some form of usage-based payment model - you use more, you pay more.
### [OpenAI](https://openai.com)
**GPT-4**
Parameters: undeclared
Availability: Wait-list https://openai.com/waitlist/gpt-4-api
Fine-tuning: No fine-tuning yet available or announced.
Paper: https://arxiv.org/abs/2303.08774
Pricing: https://openai.com/pricing
Endpoints: Chat API endpoint, which also serves as a completions endpoint.
Privacy: Data from API calls not collected or used to train models https://openai.com/policies/api-data-usage-policies
**GPT-3.5**
Parameters: undeclared (GPT-3 had 175B)
Availability: GA
Fine-tuning: Yes, fine-tuning available through APIs.
Paper: https://arxiv.org/pdf/2005.14165.pdf
Pricing: https://openai.com/pricing
Endpoints: A variety of endpoints available, including: chat, embeddings, fine-tuning, moderation, completions.
Privacy: Data from API calls not collected or used to train models.
**ChatGPT**
Parameters: undeclared (uses GPT-3.5 model)
Availability: GA
Fine-tuning: N/A - consumer web-based solution.
Paper:
Pricing: https://openai.com/pricing
Endpoints: N/A - consumer web-based solution.
Privacy: Data submitted on the web-based ChatGPT service is collected and used to train models https://openai.com/policies/api-data-usage-policies
### [AI21Labs](https://www.ai21.com)
**Jurassic-2**
Parameters: undeclared (jurassic-1 had 178B)
Availability: GA
Fine-tuning: Yes, fine-tuning available through APIs.
Paper:
Pricing: https://www.ai21.com/studio/pricing
Endpoints: A variety of task-specific endpoints available, including paraphrase, grammatical error correction, text improvement, summarisation, text segmentation, and contextual answers.
Privacy:
### [Anthropic](https://www.anthropic.com)
**Claude**
Parameters: undeclared
Availability: Waitlist https://www.anthropic.com/product
Fine-tuning: Not standard; large enterprises may contact via https://www.anthropic.com/earlyaccess to discuss.
Paper: https://arxiv.org/abs/2204.05862
Pricing: https://cdn2.assets-servd.host/anthropic-website/production/images/apr-pricing-tokens.pdf
Endpoints: Completions endpoint.
Privacy: Data sent to/from is not used to train models unless feedback is given - https://vault.pactsafe.io/s/9f502c93-cb5c-4571-b205-1e479da61794/legal.html#terms
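As a minimal sketch of the completions endpoint mentioned above, here is a call using Anthropic's official Python SDK (the legacy Text Completions API); the model name and prompt are illustrative, and an API key must be configured.

```python
# Minimal sketch of Anthropic's legacy Text Completions endpoint; model name
# and prompt are illustrative, and ANTHROPIC_API_KEY must be set.
import anthropic

client = anthropic.Anthropic()
response = client.completions.create(
    model="claude-2",
    max_tokens_to_sample=200,
    # The completions endpoint expects Human/Assistant turn markers in the prompt.
    prompt=f"{anthropic.HUMAN_PROMPT} Summarise the Constitutional AI idea in one sentence.{anthropic.AI_PROMPT}",
)
print(response.completion)
```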
### [Google](https://google.com)
**Google Bard**
Parameters: 770M
Availability: Waitlist https://bard.google.com
Fine-tuning: No
Paper:
Pricing:
Endpoints: Consumer UI only, API via PaLM
Privacy:**Google PaLM API**
Parameters: Up to 540B
Availability: Announced but not yet available – https://blog.google/technology/ai/ai-developers-google-cloud-workspace/
Fine-tuning: unknown
Paper: https://arxiv.org/abs/2204.02311
Pricing: unknown
Endpoints: unknown
Privacy: unknown
### [Amazon](https://aws.amazon.com)
**Amazon Titan**
Parameters: unknown
Availability: Announced but not yet available – https://aws.amazon.com/bedrock/titan/
Fine-tuning: unknown
Paper:
Pricing: unknown
Endpoints: unknown
Privacy: unknown
### [Cohere](https://cohere.ai)
**Cohere**
Parameters: 52B
Availability: GA
Fine-tuning:
Paper:
Pricing: https://cohere.com/pricing
Endpoints: A variety of endpoints, including embedding, text completion, classification, summarisation, tokenisation, and language detection.
Privacy: Data submitted is used to train models - https://cohere.com/terms-of-use
### [IBM](https://www.ibm.com/products/watsonx-ai)
**Granite**
Parameters: 13B, 20B
Availability: GA (granite.13b - .instruct, .chat; granite.20b.code - .ansible, .cobol; other variants on roadmap)
Fine-tuning: Currently Prompt Tuning via APIs
Paper: [https://www.ibm.com/downloads/cas/X9W4O6BM](https://www.ibm.com/downloads/cas/X9W4O6BM)
Pricing: [https://www.ibm.com/products/watsonx-ai/pricing](https://www.ibm.com/products/watsonx-ai/pricing)
Endpoints: Various endpoints - Q&A; Generate; Extract; Summarise; Classify
Privacy: IBM curated 6.48 TB of data before pre-processing, 2.07 TB after pre-processing. 1T tokens generated from a total of 14 datasets. Detail in [paper](https://www.ibm.com/downloads/cas/X9W4O6BM). Prompt data is not saved by IBM for other training purposes. Users have complete control in their storage area of any saved prompts, prompt sessions, or prompt tuned models.
Training cost: granite.13b trained on 256 A100 for 1056 GPU hours.
Legal: IBM indemnifies customer use of these models on the watsonx platform