Falcon LLM is a foundational large language model (LLM) with 40 billion parameters, trained on one trillion tokens, and TII has now released it publicly. The model used only 75 percent of GPT-3’s training compute, 40 percent of Chinchilla’s, and 80 percent of PaLM-62B’s.
What makes Falcon Unique?
Falcon was built using custom tooling and leverages a unique data pipeline that can extract high-quality content from web data and use it for training. Its development relied on a custom codebase, independent from the works of NVIDIA, Microsoft, or HuggingFace.
A particular focus was put on data quality at scale. LLMs are notoriously sensitive to the quality of their training data, so significant care was taken in building a data pipeline that would both scale to tens of thousands of CPU cores for fast processing, and that would extract high-quality content from the web using extensive filtering and deduplication.
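The deduplication stage described above can be illustrated with a minimal sketch. The function names and the exact-hash approach here are illustrative assumptions (Falcon's actual pipeline also uses fuzzy and substring deduplication at much larger scale):

```python
import hashlib

def normalize(text):
    """Cheap normalization before hashing: lowercase, collapse whitespace."""
    return " ".join(text.lower().split())

def exact_dedup(docs):
    """Drop documents whose normalized content hashes to one already seen.
    This shows only the exact-match stage of a web-scale dedup pipeline."""
    seen, kept = set(), []
    for doc in docs:
        h = hashlib.sha256(normalize(doc).encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            kept.append(doc)
    return kept

docs = ["The quick brown fox.", "the  quick brown fox.", "A different page."]
print(len(exact_dedup(docs)))  # 2: the first two normalize to the same text
```

At production scale, hashing is embarrassingly parallel, which is why a pipeline like this can spread across tens of thousands of CPU cores.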
The architecture of Falcon was optimized for performance and efficiency. Combining high-quality data with these optimizations, Falcon significantly outperforms GPT-3 for only 75% of the training compute budget—and requires a fifth of the compute at inference time.
Falcon matches the performance of state-of-the-art LLMs from DeepMind, Google, and Anthropic.
How was Falcon Trained?
Falcon is a 40 billion parameters autoregressive decoder-only model trained on 1 trillion tokens. It was trained on 384 GPUs on AWS over the course of two months.
Falcon's pretraining dataset was built from public crawls of the web.
Using dumps from CommonCrawl, after significant filtering (to remove machine-generated text and adult content) and deduplication, a pretraining dataset of nearly five trillion tokens was assembled.
To broaden Falcon's abilities, this dataset was then extended with a few curated sources such as research papers and conversations from social media.
Finally, Falcon’s performance was validated against open-source benchmarks such as EAI Harness, HELM, and BigBench.
What can it be used for?
- Generating creative text and solving complex problems.
- Powering chatbots, customer service operations, virtual assistants, language translation, content generation, and sentiment analysis.
- TII foresees broad use cases for Falcon, with a particular focus on reducing and automating repetitive work.
- Falcon will help Emirati companies and startups become more efficient, streamlining internal processes and freeing up time for employees to focus on important tasks.
- At an individual level, chatbots embedding Falcon will be able to assist users in their daily lives.
Falcon 40B is open source: the Technology Innovation Institute has publicly released the source code and the model’s weights for research and commercial use. This makes Falcon 40B far more accessible to researchers and developers, who can study the model’s behavior within hours rather than weeks. Before using Falcon 40B, see the TII Falcon LLM License Version 1.0.
💥 outperforms comparable open-source models (e.g., MPT-7B, StableLM, RedPajama, etc.), thanks to being trained on 1,500B tokens of RefinedWeb enhanced with curated corpora.
🏎 uses FlashAttention and multi-query attention
🔠 has a 2,048-token context window
💰 comes with a license allowing commercial use, but with limitations. Make sure to check the license‼
🧠 was trained on Amazon SageMaker, on 384 A100 40GB GPUs in P4d instances.
🌍 40B was trained on a multilingual dataset, including German, Spanish, and French
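The multi-query attention mentioned in the list above shares a single key/value projection across all query heads, which shrinks the KV cache at inference time. A minimal NumPy sketch (shapes and names are illustrative, not Falcon's actual implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_query_attention(x, wq, wk, wv, n_heads):
    """n_heads query projections, but ONE shared key/value head."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    q = (x @ wq).reshape(seq, n_heads, d_head)  # per-head queries
    k = x @ wk                                  # shared keys   (seq, d_head)
    v = x @ wv                                  # shared values (seq, d_head)
    # each head attends over the same keys/values
    scores = np.einsum("shd,td->hst", q, k) / np.sqrt(d_head)
    probs = softmax(scores, axis=-1)
    out = np.einsum("hst,td->shd", probs, v)    # (seq, n_heads, d_head)
    return out.reshape(seq, d_model)

rng = np.random.default_rng(0)
d_model, n_heads, seq = 64, 8, 10
x = rng.normal(size=(seq, d_model))
wq = rng.normal(size=(d_model, d_model))
wk = rng.normal(size=(d_model, d_model // n_heads))  # KV weights n_heads× smaller
wv = rng.normal(size=(d_model, d_model // n_heads))
y = multi_query_attention(x, wq, wk, wv, n_heads)
print(y.shape)  # (10, 64)
```

Compared with standard multi-head attention, only the K/V projections change size; the output shape is identical, but the per-token cache is n_heads times smaller.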
Models are available on Hugging Face.
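Loading the released weights follows the standard `transformers` text-generation pattern. A hedged sketch, based on the `tiiuae/falcon-40b` Hugging Face repo (the sampling parameters below are illustrative; the heavy model download is guarded so the helper can be inspected without it):

```python
MODEL_ID = "tiiuae/falcon-40b"  # public repo on Hugging Face

def build_generation_kwargs(max_new_tokens=200, top_k=10):
    """Sampling settings for the demo generation call (illustrative values)."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "top_k": top_k,
        "num_return_sequences": 1,
    }

if __name__ == "__main__":
    # Requires substantial GPU memory; bfloat16 + device_map spreads the
    # 40B weights across available devices.
    import torch
    import transformers

    tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_ID)
    pipe = transformers.pipeline(
        "text-generation",
        model=MODEL_ID,
        tokenizer=tokenizer,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    out = pipe(
        "Falcon is",
        eos_token_id=tokenizer.eos_token_id,
        **build_generation_kwargs(),
    )
    print(out[0]["generated_text"])
```

The smaller `tiiuae/falcon-7b` checkpoint follows the same pattern and fits on a single consumer GPU.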