Some environment variables can be configured to customize the execution. When using Python, these variables should be set before importing the
ctranslate2 module, e.g.:
import os
os.environ["CT2_VERBOSE"] = "1"
import ctranslate2
Boolean environment variables can be enabled with
Allocating memory on the GPU with cudaMalloc is costly and is best avoided in high-performance code. For this reason, CTranslate2 integrates caching allocators that enable fast reuse of previously allocated buffers. The following allocators are integrated:
Allow using FP16 computation on GPU even if the device does not have efficient FP16 support.
The cub_caching allocator can be configured to trade off memory usage and speed. By default, CTranslate2 uses the following values, which have been selected experimentally:
bin_growth = 4
min_bin = 3
max_bin = 12
max_cached_bytes = 209715200 (200MB)
You can override these parameters with comma-separated values in the same order as the list above:
See the description of each parameter in the allocator implementation.
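As a rough sketch of the override format, the snippet below builds the comma-separated string in the documented order (bin_growth, min_bin, max_bin, max_cached_bytes) and exports it from Python. The helper `allocator_config` is hypothetical, and the variable name `CT2_CUDA_CACHING_ALLOCATOR_CONFIG` is not stated in this section; it is an assumption here and should be checked against the CTranslate2 documentation.

```python
import os

# Default values from the list above, kept in the documented order.
DEFAULTS = {
    "bin_growth": 4,
    "min_bin": 3,
    "max_bin": 12,
    "max_cached_bytes": 209715200,  # 200MB
}


def allocator_config(**overrides):
    """Return the comma-separated config string (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    values = {**DEFAULTS, **overrides}
    # Iterate over DEFAULTS so the output order always matches the list.
    return ",".join(str(values[name]) for name in DEFAULTS)


# Assumed variable name (not given in this section). Like the other
# variables, it should be set before importing ctranslate2.
os.environ["CT2_CUDA_CACHING_ALLOCATOR_CONFIG"] = allocator_config(
    max_cached_bytes=400 * 1024 * 1024  # raise the cache limit to 400MB
)
```

Building the string from a single ordered mapping keeps the values aligned with the parameter list, which matters because the format is positional.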
Force CTranslate2 to select a specific instruction set architecture (ISA). Possible values are:
This does not impact backend libraries (such as Intel MKL) which usually have their own environment variables to configure ISA dispatching.
If set to a non-negative value, parallel translators are pinned to CPU cores in the range [offset, offset + inter_threads].
Requires compiling with
Enable the packed GEMM API for Intel MKL which can improve performance for single-core decoding. See Intel’s article to learn more about packed GEMM.
Force CTranslate2 to use (or not) Intel MKL. By default, the runtime automatically decides whether to use Intel MKL or not based on the CPU vendor.
Configure the log verbosity:
-3 = off
-2 = critical
-1 = error
0 = warning (default)
1 = info
2 = debug
3 = trace
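The levels above can be selected by name with a small wrapper, sketched here under the assumption that they apply to `CT2_VERBOSE`, the variable used in the example at the start of this section. `LOG_LEVELS` and `set_verbosity` are hypothetical names, not part of CTranslate2.

```python
import os

# Mapping of level names to the numeric values listed above.
LOG_LEVELS = {
    "off": -3,
    "critical": -2,
    "error": -1,
    "warning": 0,  # default
    "info": 1,
    "debug": 2,
    "trace": 3,
}


def set_verbosity(name):
    """Set CT2_VERBOSE from a level name (hypothetical helper)."""
    if name not in LOG_LEVELS:
        raise ValueError(f"unknown log level: {name!r}")
    # Environment variable values must be strings.
    os.environ["CT2_VERBOSE"] = str(LOG_LEVELS[name])


set_verbosity("debug")
# Import ctranslate2 only after this point so the setting is picked up.
```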