Environment variables

Some environment variables can be configured to customize the execution. When using Python, these variables should be set before importing the ctranslate2 module, e.g.:

import os
os.environ["CT2_VERBOSE"] = "1"

import ctranslate2

Note

Boolean environment variables can be enabled with "1" or "true".

CT2_CUDA_ALLOCATOR

Allocating memory on the GPU with cudaMalloc is costly and is best avoided in high-performance code. For this reason CTranslate2 integrates caching allocators which enable a fast reuse of previously allocated buffers. The following allocators are integrated:

CT2_CUDA_ALLOW_BF16

Allow using BF16 computation on GPU even if the device does not have efficient BF16 support.

CT2_CUDA_ALLOW_FP16

Allow using FP16 computation on GPU even if the device does not have efficient FP16 support.

CT2_CUDA_TRUE_FP16_GEMM

Allow using true FP16 computation in GEMM operations. When disabled, the computation or accumulation may use FP32 instead.

This flag is enabled by default, but some models may automatically disable it when they are known to work better with the increased precision.

CT2_CUDA_CACHING_ALLOCATOR_CONFIG

The cub_caching allocator can be configured to tradeoff memory usage and speed. By default, CTranslate2 uses the following values which have been selected experimentally:

  • bin_growth = 4

  • min_bin = 3

  • max_bin = 12

  • max_cached_bytes = 209715200 (200MB)

You can override these parameters with comma-separated values in the same order as the list above:

export CT2_CUDA_CACHING_ALLOCATOR_CONFIG=8,3,7,6291455

See the description of each parameter in the allocator implementation.

CT2_FORCE_CPU_ISA

Force CTranslate2 to select a specific instruction set architecture (ISA). Possible values are:

  • GENERIC

  • AVX

  • AVX2

  • AVX512

Attention

This does not impact backend libraries (such as Intel MKL) which usually have their own environment variables to configure ISA dispatching.

CT2_USE_EXPERIMENTAL_PACKED_GEMM

Enable the packed GEMM API for Intel MKL which can improve performance for single-core decoding. See Intel’s article to learn more about packed GEMM.

CT2_USE_MKL

Force CTranslate2 to use (or not) Intel MKL. By default, the runtime automatically decides whether to use Intel MKL or not based on the CPU vendor.

CT2_VERBOSE

Configure the default logs verbosity:

  • -3 = off

  • -2 = critical

  • -1 = error

  • 0 = warning (default)

  • 1 = info

  • 2 = debug

  • 3 = trace

Tip

The log level can also be controlled by API. See for example the Python function ctranslate2.set_log_level.