FasterTransformer

Transformer-related optimization, including BERT, GPT

Today we are starting a section on open-source libraries and repositories used to build applications around artificial intelligence. We will try to cover at least one library every week. These posts will be shorter than usual, and each one will link to the GitHub repository so that readers can learn more about the library at its source. We also encourage you to download the library and build products with it.

Today's library: FasterTransformer

License: Apache-2.0

About:

Encoders and decoders are two key components of NLP models, and the transformer layer has become a popular architecture for both. FasterTransformer implements a highly optimized transformer layer for both the encoder and the decoder. On Volta, Turing, and Ampere GPUs, it automatically uses Tensor Cores when the data and weights are in FP16 precision.
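The Tensor Core rule above can be written down as a small check. This is only a sketch of the rule as stated in this post, not FasterTransformer's own dispatch logic: the function names and the "fp16" string labels are hypothetical, and the compute-capability table lists only common values for the three architectures named here.

```python
# Illustrative only: encodes the rule stated in the post, not library code.
# CUDA compute capability (major, minor) -> architecture name.
TENSOR_CORE_ARCHS = {
    (7, 0): "Volta",
    (7, 5): "Turing",
    (8, 0): "Ampere",
    (8, 6): "Ampere",
}


def tensor_core_arch(compute_capability):
    """Return the architecture name if it is one named in the post, else None."""
    return TENSOR_CORE_ARCHS.get(tuple(compute_capability))


def uses_tensor_cores(compute_capability, data_dtype, weight_dtype):
    """Apply the post's rule: a listed architecture plus FP16 data and weights."""
    arch = tensor_core_arch(compute_capability)
    return arch is not None and data_dtype == "fp16" and weight_dtype == "fp16"
```

For example, `uses_tensor_cores((7, 5), "fp16", "fp16")` checks a Turing GPU with FP16 data and weights, while passing "fp32" for either argument, or a compute capability outside the table, fails the check.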

FasterTransformer is built on top of CUDA, cuBLAS, cuBLASLt, and C++. It provides at least one API for the following frameworks: TensorFlow, PyTorch, and the Triton backend. Users can integrate FasterTransformer directly into these frameworks.

Directory Structure:

/src/fastertransformer: source code of FasterTransformer
    |--/models: Implementation of different models, like BERT, GPT.
    |--/layers: Implementation of layer modules, like attention layer, ffn layer.
    |--/kernels: CUDA kernels for different models/layers and operations, like addBiasResidual.
    |--/tensorrt_plugin: encapsulates FasterTransformer into a TensorRT plugin.
    |--/tf_op: custom TensorFlow OP implementation
    |--/th_op: custom PyTorch OP implementation
    |--/triton_backend: custom triton backend implementation
    |--/utils: Contains common CUDA utilities, like cublasMMWrapper, memory_utils
/examples: C++, TensorFlow and PyTorch interface examples
    |--/cpp: C++ interface examples
    |--/pytorch: PyTorch OP examples
    |--/tensorflow: TensorFlow OP examples
    |--/tensorrt: TensorRT examples
/docs: Documents that explain the implementation details of different models and show benchmark results
/benchmark: Contains the scripts to run the benchmarks of different models
/tests: Unit tests
/templates: Documents to explain how to add a new model/example into FasterTransformer repo
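
After cloning the repository, a quick way to get oriented is to check that the directories listed above are present. The sketch below assumes the layout exactly as described in this post, which may differ between releases; `missing_dirs` and the clone path in the usage note are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Directories as listed in this post; the real repository may differ by release.
EXPECTED_DIRS = [
    "src/fastertransformer/models",
    "src/fastertransformer/layers",
    "src/fastertransformer/kernels",
    "src/fastertransformer/tensorrt_plugin",
    "src/fastertransformer/tf_op",
    "src/fastertransformer/th_op",
    "src/fastertransformer/triton_backend",
    "src/fastertransformer/utils",
    "examples/cpp",
    "examples/pytorch",
    "examples/tensorflow",
    "examples/tensorrt",
    "docs",
    "benchmark",
    "tests",
    "templates",
]


def missing_dirs(repo_root):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Calling `missing_dirs("FasterTransformer")` on a local clone (the path is an assumption) returns an empty list when every directory from the listing is found, and otherwise the entries that are absent.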

Link:

GitHub - NVIDIA/FasterTransformer: Transformer related optimization, including BERT, GPT