The release of the LLaMA base model in February 2023 marked a new era in language modelling. Since then, several fine-tuned LLaMA models have been released, including Alpaca, Vicuna, Koala, GPT4-x-Alpaca, WizardLM, and OpenAssistant. In this blog, we will provide an overview of these models and briefly touch on the tools used to run them.
| Model | Size | Training Data |
|---|---|---|
| LLaMA (base model) | 7B, 13B, 33B, 65B | Various |
| Alpaca | 7B, 13B | 52k GPT-3 instructions |
| Vicuna | 7B, 13B | 70k ChatGPT conversations |
| Koala-distill | 7B, 13B | 117k cleaned ChatGPT conversations |
| GPT4-x-Alpaca | 13B | 20k GPT4 instructions |
| WizardLM | 7B | 70k instructions synthesized with ChatGPT/GPT-3 |
| OpenAssistant LLaMA | 13B, 30B | 600k human interactions (OpenAssistant Conversations) |
LLaMA, which stands for Large Language Model Meta AI, is an open-source language model released by Meta (Facebook). It is designed to be a general-purpose foundational model suitable for further fine-tuning. LLaMA models come in four sizes, with 7B, 13B, 33B, and 65B parameters. More parameters generally mean a more capable model, but also one that requires more resources to run.
Unlike GPT, LLaMA is an open-source model that you can download, study, and run locally. The pre-training data includes English CommonCrawl, C4, GitHub, Wikipedia, Gutenberg and Books3, ArXiv, and StackExchange. The tokenizer uses byte-pair encoding (BPE) implemented with SentencePiece, and the training data comprises roughly 1.4T tokens.
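As a concrete (if simplified) illustration of what running LLaMA locally can look like, here is a minimal sketch using the Hugging Face transformers library. It assumes you have already obtained the weights and converted them to the Hugging Face format; the directory name `./llama-7b-hf` is a placeholder, not an official path.

```python
# Minimal sketch: load a locally converted LLaMA checkpoint and generate text.
# Assumes the weights were converted to Hugging Face format under ./llama-7b-hf
# (a hypothetical local directory, not an official download).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_dir = "./llama-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,   # half precision to fit the 7B model on a single GPU
    device_map="auto",           # let accelerate place layers on available devices
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```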
LLaMA is a transformer model similar to GPT, with a few modifications: the input of each transformer sub-layer is normalized (with RMSNorm) to improve training stability, the ReLU activation is replaced with SwiGLU, and rotary positional embeddings (RoPE) are used instead of absolute positional embeddings, both of which improve performance.
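To make two of these modifications concrete, here is a rough PyTorch sketch (not Meta's actual implementation) of RMS pre-normalization and a SwiGLU feed-forward block; rotary embeddings are omitted for brevity, and the dimensions follow the 7B configuration.

```python
# Illustrative sketch of RMS pre-normalization and a SwiGLU feed-forward block.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Normalize sub-layer inputs by their root mean square (no mean subtraction)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLUFeedForward(nn.Module):
    """Feed-forward block with SwiGLU: silu(x W1) * (x W3), projected back with W2."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

# Usage: normalize the *input* of the sub-layer (pre-norm), then add a residual.
x = torch.randn(1, 8, 4096)            # (batch, sequence, 7B embedding dimension)
block = SwiGLUFeedForward(4096, 11008)  # hidden size roughly 8/3 of the embedding dim
y = x + block(RMSNorm(4096)(x))
```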
The table below summarizes the model parameters:

| Parameters | Layers | Attention heads | Embedding dimension |
|---|---|---|---|
| 6.7B | 32 | 32 | 4,096 |
| 13B | 40 | 40 | 5,120 |
| 33B | 60 | 52 | 6,656 |
| 65B | 80 | 64 | 8,192 |
For reference, GPT-3 has 175B parameters; by that standard, LLaMA models are small.
LLaMA models have been evaluated on tasks such as common-sense reasoning, reading comprehension, and code generation. Larger models generally perform better, and providing more few-shot examples in the prompt also helps. Smaller models can still perform well if trained on enough data: the LLaMA 13B model performs comparably to GPT-3 despite being more than 10 times smaller (13B vs. 175B parameters).
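Because the base model is not instruction-tuned, prompting it well usually means showing it a few examples of the pattern you want it to continue. A minimal few-shot prompt might look like this (illustrative only):

```python
# Sketch of a few-shot prompt: the base model has no instruction tuning,
# so worked examples in the prompt steer its completions.
few_shot_prompt = """\
Translate English to French.

English: cheese
French: fromage

English: good morning
French: bonjour

English: thank you
French:"""

# Feed few_shot_prompt to model.generate() as in the earlier snippet; the model
# is expected to continue the pattern (here, with "merci").
```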
LLaMA is not very good at quantitative reasoning, especially the smaller 7B and 13B models. It is also not tuned for instruction following the way ChatGPT is, although the 65B model can follow basic instructions.
In summary, LLaMA is designed to be a base model for further fine-tuning. Its advantages are its small size and strong performance, thanks to extensive pre-training, and its open availability makes running a "local ChatGPT" on a PC possible. However, the LLaMA base model was not trained to follow instructions; that is left to the fine-tuned models built on top of it.