The release of the LLaMA base model in February 2023 marked a new era in language modelling. Since then, several fine-tuned LLaMA models have been released, including Alpaca, Vicuna, Koala, GPT4-x-Alpaca, WizardLM, and OpenAssistant. In this blog, we will provide an overview of these models and briefly touch on the tools used to run them.

| Model | Size | Training Data |
|---|---|---|
| LLaMA (base model) | 7B, 13B, 33B, 65B | Various |
| Alpaca | 7B, 13B | 52k GPT-3 instructions |
| Vicuna | 7B, 13B | 70k ChatGPT conversations |
| Koala-distill | 7B, 13B | 117k cleaned ChatGPT conversations |
| GPT4-x-Alpaca | 13B | 20k GPT-4 instructions |
| WizardLM | 7B | 70k instructions synthesized with ChatGPT/GPT-3 |
| OpenAssistant LLaMA | 13B, 30B | 600k human interactions (OpenAssistant Conversations) |

LLaMA, which stands for Large Language Model Meta AI, is an open-source language model released by Meta (Facebook). It is designed to be a general-purpose foundational model suitable for further fine-tuning. LLaMA comes in four sizes, with 7B, 13B, 33B, and 65B parameters. The more parameters, the more capable the model, but also the more memory and compute it needs to run.

Unlike GPT, LLaMA is an open-source model that you can download, study, and run locally. Its pre-training data includes English CommonCrawl, C4, GitHub, Wikipedia, Gutenberg and Books3, ArXiv, and StackExchange. The tokenizer uses byte-pair encoding (BPE) implemented with SentencePiece, and the training set totals roughly 1.4T tokens.
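Because the weights can be run locally, a converted LLaMA checkpoint can be loaded with standard tooling. Below is a minimal sketch using the Hugging Face transformers library; the checkpoint path is a placeholder for whatever converted weights you have available, not an official release.

```python
# Minimal sketch: loading a converted LLaMA checkpoint with Hugging Face transformers.
# The model path below is a placeholder; point it at your own converted weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/converted-llama-7b"  # placeholder, not an official repo

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```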

LLaMA is a transformer model similar to GPT, with a few modifications: it normalizes the input of each transformer sub-layer (pre-normalization with RMSNorm) to improve training stability, uses the SwiGLU activation instead of ReLU to improve performance, and uses rotary positional embeddings (RoPE) instead of absolute positional embeddings.
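To make these changes concrete, here is a rough PyTorch sketch of an RMSNorm pre-normalization layer and a SwiGLU feed-forward block as described in the LLaMA paper; the class names and dimensions are illustrative, not Meta's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square norm applied to the input of each sub-layer (pre-normalization)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Scale by the inverse RMS of the features, then apply a learned gain.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLUFeedForward(nn.Module):
    """Feed-forward block using the SwiGLU activation (SiLU-gated linear unit) instead of ReLU."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```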

The table below summarizes the model architecture parameters:

| Parameters | Layers | Attention heads | Embedding dimension |
|---|---|---|---|
| 6.7B | 32 | 32 | 4,096 |
| 13B | 40 | 40 | 5,120 |
| 33B | 60 | 52 | 6,656 |
| 65B | 80 | 64 | 8,192 |

For reference, GPT-3 has 175B parameters; even the largest LLaMA model is well under half that size.
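As a rough sanity check, the parameter counts in the table can be approximated from the layer count and embedding dimension using the standard dense-transformer estimate of about 12 × layers × d², plus the token embeddings (assuming LLaMA's roughly 32k-token vocabulary). The short Python sketch below is only an approximation, not Meta's exact accounting.

```python
# Rough parameter-count estimate for a dense transformer:
# ~4*d^2 per layer for attention + ~8*d^2 per layer for the (SwiGLU) MLP,
# plus input embeddings and output projection over a ~32k vocabulary (assumed).
VOCAB = 32_000

def estimate_params(layers: int, d_model: int, vocab: int = VOCAB) -> float:
    block = 12 * layers * d_model ** 2   # attention + feed-forward weights
    embeddings = 2 * vocab * d_model     # input embedding + output projection
    return (block + embeddings) / 1e9    # in billions

for layers, d_model in [(32, 4096), (40, 5120), (60, 6656), (80, 8192)]:
    print(f"{layers} layers, d={d_model}: ~{estimate_params(layers, d_model):.1f}B params")
# Prints roughly 6.7B, 12.9B, 32.3B, 64.9B -- close to the table above.
```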

LLaMA models have been evaluated on tasks such as common-sense reasoning, reading comprehension, and code generation. Larger models generally perform better, and supplying more examples in the prompt (few-shot prompting) also helps. Smaller models can still perform well if trained on enough data: the LLaMA 13B model performs comparably to GPT-3 despite being more than ten times smaller (13B vs. 175B parameters).

LLaMA is not very good at quantitative reasoning, especially the smaller 7B and 13B models. It is also not tuned for instruction following the way ChatGPT is, although the 65B model can follow basic instructions.

In summary, LLaMA is designed to be a base model for further fine-tuning. Its advantages are its relatively small size and strong performance for that size, thanks to extensive pre-training. Its open availability makes it possible to run a "local ChatGPT" on a PC. However, the LLaMA base model was not trained to follow instructions; that is left to the fine-tuned models built on top of it.

Read more:

A brief history of LLaMA models - AGI Sphere
