Wav2Vec2Bert
- class ctranslate2.models.Wav2Vec2Bert
Implements the Wav2Vec2Bert speech recognition model published by Facebook.
Inherits from:
pybind11_builtins.pybind11_object
Attributes:
Methods:
- __init__(model_path: str, device: str = 'cpu', *, device_index: Union[int, List[int]] = 0, compute_type: Union[str, Dict[str, str]] = 'default', inter_threads: int = 1, intra_threads: int = 0, max_queued_batches: int = 0, flash_attention: bool = False, tensor_parallel: bool = False, files: object = None) → None
Initializes a Wav2Vec2Bert model from a converted model.
- Parameters
model_path – Path to the CTranslate2 model directory.
device – Device to use (possible values are: cpu, cuda, auto).
device_index – Device IDs on which to place this model.
compute_type – Model computation type or a dictionary mapping a device name to the computation type (possible values are: default, auto, int8, int8_float32, int8_float16, int8_bfloat16, int16, float16, bfloat16, float32).
inter_threads – Number of workers to allow executing multiple batches in parallel.
intra_threads – Number of OpenMP threads per worker (0 to use a default value).
max_queued_batches – Maximum number of batches in the worker queue (-1 for unlimited, 0 for an automatic value). When the queue is full, future requests will block until a free slot is available.
flash_attention – Run the model with Flash Attention 2 for the self-attention layers.
tensor_parallel – Run the model in tensor parallel mode.
files – Load model files from memory. This argument is a dictionary mapping file names to file contents as file-like or bytes objects. If this is set, model_path acts as an identifier for this model.
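The constructor parameters above can be gathered in one place before a model is opened. The following is a minimal sketch, not the library's recommended setup: `wav2vec2bert_kwargs` and `open_model` are hypothetical helper names, and the choice of `float16` on GPU and `default` on CPU is an assumption made only for illustration. `ctranslate2` is imported lazily so the argument-building part can be used without the package installed.

```python
from typing import Any, Dict


def wav2vec2bert_kwargs(use_gpu: bool = False, workers: int = 1) -> Dict[str, Any]:
    """Build keyword arguments for ctranslate2.models.Wav2Vec2Bert.

    Hypothetical helper: the values chosen here are illustrative defaults.
    """
    return {
        "device": "cuda" if use_gpu else "cpu",
        "device_index": 0,
        # Assumption: float16 on GPU, library default on CPU.
        "compute_type": "float16" if use_gpu else "default",
        "inter_threads": workers,   # number of parallel workers
        "intra_threads": 0,         # 0 lets the library pick a default
        "max_queued_batches": 0,    # 0 selects an automatic queue size
    }


def open_model(model_path: str, **overrides: Any):
    """Open a converted model directory, applying any caller overrides."""
    import ctranslate2  # imported lazily; requires the ctranslate2 package

    kwargs = wav2vec2bert_kwargs()
    kwargs.update(overrides)
    return ctranslate2.models.Wav2Vec2Bert(model_path, **kwargs)
```

Calling `open_model(path, device="cuda", compute_type="float16")` would then override the CPU defaults for a single instance.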
- encode(features: StorageView, to_cpu: bool = False) → StorageView
Encodes the input features.
- Parameters
features – Mel spectrogram of the audio, as a float array with shape [batch_size, 80, 3000].
to_cpu – Copy the encoder output to the CPU before returning the value.
- Returns
The encoder output.
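As a rough illustration of calling `encode`, the sketch below validates the documented shape and wraps a NumPy array in a `StorageView` with `ctranslate2.StorageView.from_array`. The helper names `check_features_shape` and `encode_features` are hypothetical, and it is an assumption that the features arrive as a NumPy array; how the spectrogram itself is computed is outside the scope of this page.

```python
from typing import Sequence

# Trailing dimensions of the shape documented for `features`.
EXPECTED_TAIL = (80, 3000)


def check_features_shape(shape: Sequence[int]) -> int:
    """Return the batch size if `shape` is [batch_size, 80, 3000], else raise."""
    if len(shape) != 3 or tuple(shape[1:]) != EXPECTED_TAIL or shape[0] < 1:
        raise ValueError(
            f"expected [batch_size, 80, 3000], got {list(shape)}"
        )
    return shape[0]


def encode_features(model, features):
    """Encode a NumPy array of features with an already opened model.

    Hypothetical helper; `model` is a ctranslate2.models.Wav2Vec2Bert instance.
    """
    import numpy as np   # imported lazily; requires numpy
    import ctranslate2   # imported lazily; requires ctranslate2

    check_features_shape(features.shape)
    # StorageView expects a contiguous buffer; float32 matches "float array".
    features = np.ascontiguousarray(features, dtype=np.float32)
    storage = ctranslate2.StorageView.from_array(features)
    # to_cpu=True copies the encoder output to the CPU before returning it.
    return model.encode(storage, to_cpu=True)
```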
- load_model(keep_cache: bool = False) → None
Loads the model back to the initial device.
- Parameters
keep_cache – If True, the model cache in the CPU memory is not deleted if it exists.
- unload_model(to_cpu: bool = False) → None
Unloads the model attached to this Wav2Vec2Bert instance but keeps enough runtime context to quickly resume it on the initial device.
- Parameters
to_cpu – If True, the model is moved to the CPU memory and not fully unloaded.
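One way to pair `unload_model` with `load_model` is a small context manager that frees the device for the duration of a block and restores the model afterwards. This is a sketch under stated assumptions: `temporarily_unloaded` is a hypothetical name, and it relies only on the `unload_model`, `load_model`, and `model_is_loaded` members documented on this page.

```python
from contextlib import contextmanager


@contextmanager
def temporarily_unloaded(model, to_cpu: bool = False):
    """Unload `model` for the duration of the block, then load it back.

    Hypothetical helper; with to_cpu=True the weights stay in CPU memory,
    which should make the reload cheaper at the cost of host RAM.
    """
    model.unload_model(to_cpu=to_cpu)
    try:
        yield model
    finally:
        # Reload only if nothing inside the block already did so.
        if not model.model_is_loaded:
            model.load_model()
```

A typical use would be `with temporarily_unloaded(model, to_cpu=True): run_other_gpu_job()`, where `run_other_gpu_job` stands in for whatever else needs the device memory.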
- property compute_type
Computation type used by the model.
- property device
Device this model is running on.
- property device_index
List of device IDs on which this model is running.
- property model_is_loaded
Whether the model is loaded on the initial device and ready to be used.
- property num_active_batches
Number of batches waiting to be processed or currently processed.
- property num_queued_batches
Number of batches waiting to be processed.
- property num_workers
Number of model workers backing this instance.
- property tensor_parallel
Whether the model runs in tensor parallel mode.
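The read-only properties listed above can be collected into a single snapshot for logging or health checks. The sketch below assumes nothing beyond the property names on this page; `describe_model` and `is_idle` are hypothetical helpers, and treating "zero active batches" as idle is an interpretation of the `num_active_batches` description.

```python
from typing import Any, Dict

# Property names as documented on this page.
PROPERTY_NAMES = (
    "device",
    "device_index",
    "compute_type",
    "model_is_loaded",
    "num_workers",
    "num_queued_batches",
    "num_active_batches",
    "tensor_parallel",
)


def describe_model(model) -> Dict[str, Any]:
    """Return a dictionary snapshot of the model's documented properties."""
    return {name: getattr(model, name) for name in PROPERTY_NAMES}


def is_idle(model) -> bool:
    """True when no batch is waiting or currently being processed."""
    return model.num_active_batches == 0
```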