The release of the LLaMA base model in February 2023 marked a new era in language modelling. Since then, several fine-tuned LLaMA models have been released, including Alpaca, Vicuna, Koala, GPT4-x-Alpaca, WizardLM, and OpenAssistant. In this blog, we will provide an overview of these models and briefly touch on the tools used to run them.

| Model | Size | Training Data |
|---|---|---|
| LLaMA (base model) | 7B, 13B, 33B, 65B | Various |
| Alpaca | 7B, 13B | 52k GPT-3 instructions |
| Vicuna | 7B, 13B | 70k ChatGPT conversations |
| Koala-distill | 7B, 13B | 117k cleaned ChatGPT conversations |
| GPT4-x-Alpaca | 13B | 20k GPT-4 instructions |
| WizardLM | 7B | 70k instructions synthesized with ChatGPT/GPT-3 |
| OpenAssistant LLaMA | 13B, 30B | 600k human interactions (OpenAssistant Conversations) |

LLaMA, which stands for Large Language Model Meta AI, is an open-source language model released by Meta (Facebook). It is designed to be a general-purpose foundational model suitable for further fine-tuning. LLaMA comes in four sizes, with 7B, 13B, 33B, and 65B parameters. The more parameters, the more capable the model, but also the more memory and compute it needs to run.

Unlike GPT, LLaMA is an open-source model that you can download, study, and run locally. Its pre-training data includes English CommonCrawl, C4, GitHub, Wikipedia, Gutenberg and Books3, ArXiv, and StackExchange. The tokenizer uses byte-pair encoding (BPE) implemented with SentencePiece, and the training set totals roughly 1.4T tokens.
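Because the weights can be run locally, a converted LLaMA checkpoint can be loaded with standard tooling. Below is a minimal sketch using the Hugging Face transformers library; the checkpoint path is a placeholder for whatever converted weights you have available, not an official release.

```python
# Minimal sketch: loading a converted LLaMA checkpoint with Hugging Face transformers.
# The model path below is a placeholder; point it at your own converted weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/converted-llama-7b"  # placeholder, not an official repo

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```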

LLaMA is a transformer model similar to GPT, with a few modifications: it normalizes the input of each transformer sub-layer (pre-normalization with RMSNorm) to improve training stability, uses the SwiGLU activation instead of ReLU to improve performance, and uses rotary positional embeddings (RoPE) instead of absolute positional embeddings.
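To make these changes concrete, here is a rough PyTorch sketch of an RMSNorm pre-normalization layer and a SwiGLU feed-forward block as described in the LLaMA paper; the class names and dimensions are illustrative, not Meta's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square norm applied to the input of each sub-layer (pre-normalization)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Scale by the inverse RMS of the features, then apply a learned gain.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLUFeedForward(nn.Module):
    """Feed-forward block using the SwiGLU activation (SiLU-gated linear unit) instead of ReLU."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```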

The table below summarizes the model architecture parameters:

| Parameters | Layers | Attention heads | Embedding dimension |
|---|---|---|---|
| 6.7B | 32 | 32 | 4,096 |
| 13B | 40 | 40 | 5,120 |
| 33B | 60 | 52 | 6,656 |
| 65B | 80 | 64 | 8,192 |

For reference, GPT-3 has 175B parameters; even the largest LLaMA model is well under half that size.
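As a rough sanity check, the parameter counts in the table can be approximated from the layer count and embedding dimension using the standard dense-transformer estimate of about 12 × layers × d², plus the token embeddings (assuming LLaMA's roughly 32k-token vocabulary). The short Python sketch below is only an approximation, not Meta's exact accounting.

```python
# Rough parameter-count estimate for a dense transformer:
# ~4*d^2 per layer for attention + ~8*d^2 per layer for the (SwiGLU) MLP,
# plus input embeddings and output projection over a ~32k vocabulary (assumed).
VOCAB = 32_000

def estimate_params(layers: int, d_model: int, vocab: int = VOCAB) -> float:
    block = 12 * layers * d_model ** 2   # attention + feed-forward weights
    embeddings = 2 * vocab * d_model     # input embedding + output projection
    return (block + embeddings) / 1e9    # in billions

for layers, d_model in [(32, 4096), (40, 5120), (60, 6656), (80, 8192)]:
    print(f"{layers} layers, d={d_model}: ~{estimate_params(layers, d_model):.1f}B params")
# Prints roughly 6.7B, 12.9B, 32.3B, 64.9B -- close to the table above.
```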

LLaMA models have been evaluated on tasks such as common-sense reasoning, reading comprehension, and code generation. Larger models generally perform better, and supplying more examples in the prompt (few-shot prompting) also helps. Smaller models can still perform well if trained on enough data: the LLaMA 13B model performs comparably to GPT-3 despite being more than ten times smaller (13B vs. 175B parameters).

LLaMA is not very good at quantitative reasoning, especially the smaller 7B and 13B models. It is also not tuned for instruction following the way ChatGPT is, although the 65B model can follow basic instructions.

In summary, LLaMA is designed to be a base model for further fine-tuning. Its advantages are its relatively small size and strong performance for that size, thanks to extensive pre-training. Its open availability makes it possible to run a "local ChatGPT" on a PC. However, the LLaMA base model was not trained to follow instructions; that is left to the fine-tuned models built on top of it.

Read more:

A brief history of LLaMA models - AGI Sphere
