Translator
- class ctranslate2.Translator
A text translator.
Example
>>> translator = ctranslate2.Translator("model/", device="cpu")
>>> translator.translate_batch([["▁Hello", "▁world", "!"]])
Inherits from:
pybind11_builtins.pybind11_object
Attributes:
Methods:
- __init__(model_path: str, device: str = 'cpu', *, device_index: Union[int, List[int]] = 0, compute_type: Union[str, Dict[str, str]] = 'default', inter_threads: int = 1, intra_threads: int = 0, max_queued_batches: int = 0, flash_attention: bool = False, tensor_parallel: bool = False, files: object = None) None
Initializes the translator.
- Parameters
model_path – Path to the CTranslate2 model directory.
device – Device to use (possible values are: cpu, cuda, auto).
device_index – Device ID(s) on which to place this translator.
compute_type – Model computation type or a dictionary mapping a device name to the computation type (possible values are: default, auto, int8, int8_float32, int8_float16, int8_bfloat16, int16, float16, bfloat16, float32).
inter_threads – Maximum number of parallel translations.
intra_threads – Number of OpenMP threads per translator (0 to use a default value).
max_queued_batches – Maximum numbers of batches in the queue (-1 for unlimited, 0 for an automatic value). When the queue is full, future requests will block until a free slot is available.
flash_attention – Run the model with flash attention 2 for the self-attention layers.
tensor_parallel – Run the model in tensor parallel mode.
files – Load model files from memory. This argument is a dictionary mapping file names to file contents as file-like or bytes objects. If this is set, model_path acts as an identifier for this model.
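For example, a translator could be created on GPUs with an explicit quantization type (a minimal sketch; the model directory, device indexes, and thread count are placeholders to adapt to your setup):
import ctranslate2

translator = ctranslate2.Translator(
    "ende_ctranslate2/",          # hypothetical converted model directory
    device="cuda",
    device_index=[0, 1],          # replicate the model on two GPUs
    compute_type="int8_float16",  # int8 weights with float16 computation
    inter_threads=2,              # up to two batches translated in parallel
)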
- generate_tokens(source: List[str], target_prefix: Optional[List[str]] = None, *, max_decoding_length: int = 256, min_decoding_length: int = 1, sampling_topk: int = 1, sampling_topp: float = 1, sampling_temperature: float = 1, return_log_prob: bool = False, repetition_penalty: float = 1, no_repeat_ngram_size: int = 0, disable_unk: bool = False, suppress_sequences: Optional[List[List[str]]] = None, end_token: Optional[Union[str, List[str], List[int]]] = None, max_input_length: int = 1024, use_vmap: bool = False) Iterable[GenerationStepResult]
Yields tokens as they are generated by the model.
- Parameters
source – Source tokens.
target_prefix – Optional target prefix tokens.
max_decoding_length – Maximum prediction length.
min_decoding_length – Minimum prediction length.
sampling_topk – Randomly sample predictions from the top K candidates.
sampling_topp – Keep the most probable tokens whose cumulative probability exceeds this value.
sampling_temperature – Sampling temperature to generate more random samples.
return_log_prob – Include the token log probability in the result.
repetition_penalty – Penalty applied to the score of previously generated tokens (set > 1 to penalize).
no_repeat_ngram_size – Prevent repetitions of ngrams with this size (set 0 to disable).
disable_unk – Disable the generation of the unknown token.
suppress_sequences – Disable the generation of some sequences of tokens.
end_token – Stop the decoding on one of these tokens (defaults to the model EOS token).
max_input_length – Truncate inputs after this many tokens (set 0 to disable).
use_vmap – Use the vocabulary mapping file saved in this model.
- Returns
A generator iterator over ctranslate2.GenerationStepResult instances.
Note
This generation method is not compatible with beam search, which requires a complete decoding.
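A minimal streaming sketch, assuming a translator created as shown earlier (the source tokens are placeholders):
source = ["▁Hello", "▁world", "!"]
for step in translator.generate_tokens(source, sampling_topk=10, sampling_temperature=0.8):
    # step is a ctranslate2.GenerationStepResult
    print(step.token, end=" ", flush=True)
print()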
- load_model(keep_cache: bool = False) None
Loads the model back to the initial device.
- Parameters
keep_cache – If True, the model cache in the CPU memory is not deleted if it exists.
- score_batch(source: List[List[str]], target: List[List[str]], *, max_batch_size: int = 0, batch_type: str = 'examples', max_input_length: int = 1024, offset: int = 0, asynchronous: bool = False) Union[List[ScoringResult], List[AsyncScoringResult]]
Scores a batch of parallel tokens.
- Parameters
source – Batch of source tokens.
target – Batch of target tokens.
max_batch_size – The maximum batch size. If the number of inputs is greater than max_batch_size, the inputs are sorted by length and split by chunks of max_batch_size examples so that the number of padding positions is minimized.
batch_type – Whether max_batch_size is the number of “examples” or “tokens”.
max_input_length – Truncate inputs after this many tokens (0 to disable).
offset – Ignore the first n tokens of the target in the score calculation.
asynchronous – Run the scoring asynchronously.
- Returns
A list of scoring results.
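For example, a single sentence pair could be scored as follows (a sketch with placeholder tokens, reusing a previously created translator):
results = translator.score_batch(
    [["▁Hello", "▁world", "!"]],   # source tokens
    [["▁Hallo", "▁Welt", "!"]],    # target tokens to score
)
result = results[0]
print(result.tokens)      # scored target tokens, including the end of sentence token
print(result.log_probs)   # log probability of each target token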
- score_file(source_path: str, target_path: str, output_path: str, *, max_batch_size: int = 32, read_batch_size: int = 0, batch_type: str = 'examples', max_input_length: int = 1024, offset: int = 0, with_tokens_score: bool = False, source_tokenize_fn: Callable[[str], List[str]] = None, target_tokenize_fn: Callable[[str], List[str]] = None, target_detokenize_fn: Callable[[List[str]], str] = None) ExecutionStats
Scores a parallel tokenized text file.
Each line in output_path will have the format:
<score> ||| <target> [||| <score_token_0> <score_token_1> ...]
The score is normalized by the target length, which includes the end of sentence token </s>.
- Parameters
source_path – Path to the source file.
target_path – Path to the target file.
output_path – Path to the output file.
max_batch_size – The maximum batch size.
read_batch_size – The number of examples to read from the file before sorting by length and splitting by chunks of max_batch_size examples (set 0 for an automatic value).
batch_type – Whether max_batch_size and read_batch_size are the number of “examples” or “tokens”.
max_input_length – Truncate inputs after this many tokens (0 to disable).
offset – Ignore the first n tokens of the target in the score calculation.
with_tokens_score – Include the token-level scores in the output file.
source_tokenize_fn – Function to tokenize source lines.
target_tokenize_fn – Function to tokenize target lines.
target_detokenize_fn – Function to detokenize target outputs.
- Returns
A statistics object.
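A minimal sketch, assuming the source and target files contain pre-tokenized parallel lines (the file names are hypothetical):
stats = translator.score_file(
    "source.tok.txt",
    "target.tok.txt",
    "scores.txt",
    with_tokens_score=True,  # also write the token-level scores
)
print(stats.num_examples, stats.num_tokens, stats.total_time_in_ms)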
- score_iterable(source: Iterable[List[str]], target: Iterable[List[str]], max_batch_size: int = 64, batch_type: str = 'examples', **kwargs) Iterable[ScoringResult]
Scores an iterable of tokenized examples.
This method is built on top of ctranslate2.Translator.score_batch() to efficiently score an arbitrarily large stream of data. It enables the following optimizations:
- stream processing (the iterable is not fully materialized in memory)
- parallel scoring (if the translator has multiple workers)
- asynchronous batch prefetching
- local sorting by length
- Parameters
source – An iterable of tokenized source examples.
target – An iterable of tokenized target examples.
max_batch_size – The maximum batch size.
batch_type – Whether max_batch_size is the number of “examples” or “tokens”.
**kwargs – Any scoring options accepted by ctranslate2.Translator.score_batch().
- Returns
A generator iterator over ctranslate2.ScoringResult instances.
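For example, two parallel files could be scored in a streaming fashion (a sketch; the file names and whitespace tokenization are placeholders):
tokenize_fn = lambda line: line.strip().split()

with open("source.txt") as src_file, open("target.txt") as tgt_file:
    source = map(tokenize_fn, src_file)
    target = map(tokenize_fn, tgt_file)

    for result in translator.score_iterable(source, target, max_batch_size=64):
        # Average log probability of the target tokens.
        print(sum(result.log_probs) / len(result.log_probs))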
- translate_batch(source: List[List[str]], target_prefix: Optional[List[Optional[List[str]]]] = None, *, max_batch_size: int = 0, batch_type: str = 'examples', asynchronous: bool = False, beam_size: int = 2, patience: float = 1, num_hypotheses: int = 1, length_penalty: float = 1, coverage_penalty: float = 0, repetition_penalty: float = 1, no_repeat_ngram_size: int = 0, disable_unk: bool = False, suppress_sequences: Optional[List[List[str]]] = None, end_token: Optional[Union[str, List[str], List[int]]] = None, return_end_token: bool = False, prefix_bias_beta: float = 0, max_input_length: int = 1024, max_decoding_length: int = 256, min_decoding_length: int = 1, use_vmap: bool = False, return_scores: bool = False, return_logits_vocab: bool = False, return_attention: bool = False, return_alternatives: bool = False, min_alternative_expansion_prob: float = 0, sampling_topk: int = 1, sampling_topp: float = 1, sampling_temperature: float = 1, replace_unknowns: bool = False, callback: Callable[[GenerationStepResult], bool] = None) Union[List[TranslationResult], List[AsyncTranslationResult]]
Translates a batch of tokens.
- Parameters
source – Batch of source tokens.
target_prefix – Optional batch of target prefix tokens.
max_batch_size – The maximum batch size. If the number of inputs is greater than max_batch_size, the inputs are sorted by length and split by chunks of max_batch_size examples so that the number of padding positions is minimized.
batch_type – Whether max_batch_size is the number of “examples” or “tokens”.
asynchronous – Run the translation asynchronously.
beam_size – Beam size (1 for greedy search).
patience – Beam search patience factor, as described in https://arxiv.org/abs/2204.05424. The decoding will continue until beam_size*patience hypotheses are finished.
num_hypotheses – Number of hypotheses to return.
length_penalty – Exponential penalty applied to the length during beam search.
coverage_penalty – Coverage penalty weight applied during beam search.
repetition_penalty – Penalty applied to the score of previously generated tokens (set > 1 to penalize).
no_repeat_ngram_size – Prevent repetitions of ngrams with this size (set 0 to disable).
disable_unk – Disable the generation of the unknown token.
suppress_sequences – Disable the generation of some sequences of tokens.
end_token – Stop the decoding on one of these tokens (defaults to the model EOS token).
return_end_token – Include the end token in the results.
prefix_bias_beta – Parameter for biasing translations towards the given prefix.
max_input_length – Truncate inputs after this many tokens (set 0 to disable).
max_decoding_length – Maximum prediction length.
min_decoding_length – Minimum prediction length.
use_vmap – Use the vocabulary mapping file saved in this model.
return_scores – Include the scores in the output.
return_logits_vocab – Include the log probs of each token in the output.
return_attention – Include the attention vectors in the output.
return_alternatives – Return alternatives at the first unconstrained decoding position.
min_alternative_expansion_prob – Minimum initial probability to expand an alternative.
sampling_topk – Randomly sample predictions from the top K candidates.
sampling_topp – Keep the most probable tokens whose cumulative probability exceeds this value.
sampling_temperature – Sampling temperature to generate more random samples.
replace_unknowns – Replace unknown target tokens by the source token with the highest attention.
callback – Optional function that is called for each generated token when beam_size is 1. If the callback function returns True, the decoding will stop for this batch.
- Returns
A list of translation results.
See also
TranslationOptions structure in the C++ library.
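A minimal sketch using beam search and returning scores (the tokens are placeholders, reusing a previously created translator):
results = translator.translate_batch(
    [["▁Hello", "▁world", "!"], ["▁How", "▁are", "▁you", "?"]],
    beam_size=4,
    num_hypotheses=2,
    return_scores=True,
)
for result in results:
    # Best hypothesis and its score for each input example.
    print(result.hypotheses[0], result.scores[0])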
- translate_file(source_path: str, output_path: str, target_path: Optional[str] = None, *, max_batch_size: int = 32, read_batch_size: int = 0, batch_type: str = 'examples', beam_size: int = 2, patience: float = 1, num_hypotheses: int = 1, length_penalty: float = 1, coverage_penalty: float = 0, repetition_penalty: float = 1, no_repeat_ngram_size: int = 0, disable_unk: bool = False, suppress_sequences: Optional[List[List[str]]] = None, end_token: Optional[Union[str, List[str], List[int]]] = None, prefix_bias_beta: float = 0, max_input_length: int = 1024, max_decoding_length: int = 256, min_decoding_length: int = 1, use_vmap: bool = False, with_scores: bool = False, sampling_topk: int = 1, sampling_topp: float = 1, sampling_temperature: float = 1, replace_unknowns: bool = False, source_tokenize_fn: Callable[[str], List[str]] = None, target_tokenize_fn: Callable[[str], List[str]] = None, target_detokenize_fn: Callable[[List[str]], str] = None) ExecutionStats
Translates a tokenized text file.
- Parameters
source_path – Path to the source file.
output_path – Path to the output file.
target_path – Path to the target prefix file.
max_batch_size – The maximum batch size.
read_batch_size – The number of examples to read from the file before sorting by length and splitting by chunks of max_batch_size examples (set 0 for an automatic value).
batch_type – Whether max_batch_size and read_batch_size are the numbers of “examples” or “tokens”.
beam_size – Beam size (1 for greedy search).
patience – Beam search patience factor, as described in https://arxiv.org/abs/2204.05424. The decoding will continue until beam_size*patience hypotheses are finished.
num_hypotheses – Number of hypotheses to return.
length_penalty – Exponential penalty applied to the length during beam search.
coverage_penalty – Coverage penalty weight applied during beam search.
repetition_penalty – Penalty applied to the score of previously generated tokens (set > 1 to penalize).
no_repeat_ngram_size – Prevent repetitions of ngrams with this size (set 0 to disable).
disable_unk – Disable the generation of the unknown token.
suppress_sequences – Disable the generation of some sequences of tokens.
end_token – Stop the decoding on one of these tokens (defaults to the model EOS token).
prefix_bias_beta – Parameter for biasing translations towards the given prefix.
max_input_length – Truncate inputs after this many tokens (set 0 to disable).
max_decoding_length – Maximum prediction length.
min_decoding_length – Minimum prediction length.
use_vmap – Use the vocabulary mapping file saved in this model.
with_scores – Include the scores in the output.
sampling_topk – Randomly sample predictions from the top K candidates.
sampling_topp – Keep the most probable tokens whose cumulative probability exceeds this value.
sampling_temperature – Sampling temperature to generate more random samples.
replace_unknowns – Replace unknown target tokens by the source token with the highest attention.
source_tokenize_fn – Function to tokenize source lines.
target_tokenize_fn – Function to tokenize target lines.
target_detokenize_fn – Function to detokenize target outputs.
- Returns
A statistics object.
See also
TranslationOptions structure in the C++ library.
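A minimal sketch, assuming the input file is already tokenized (the file paths are hypothetical):
stats = translator.translate_file(
    "input.tok.txt",
    "output.tok.txt",
    beam_size=5,
    with_scores=True,  # include the scores in the output file
)
print(stats.num_tokens, stats.total_time_in_ms)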
- translate_iterable(source: Iterable[List[str]], target_prefix: Optional[Iterable[List[str]]] = None, max_batch_size: int = 32, batch_type: str = 'examples', **kwargs) Iterable[TranslationResult]
Translates an iterable of tokenized examples.
This method is built on top of ctranslate2.Translator.translate_batch() to efficiently translate an arbitrarily large stream of data. It enables the following optimizations:
- stream processing (the iterable is not fully materialized in memory)
- parallel translations (if the translator has multiple workers)
- asynchronous batch prefetching
- local sorting by length
- Parameters
source – An iterable of tokenized source examples.
target_prefix – An optional iterable of tokenized target prefixes.
max_batch_size – The maximum batch size.
batch_type – Whether max_batch_size is the number of “examples” or “tokens”.
**kwargs – Any translation options accepted by ctranslate2.Translator.translate_batch().
- Returns
A generator iterator over ctranslate2.TranslationResult instances.
Example
This method can be used to efficiently translate text files:
# Replace by your own tokenization and detokenization functions.
tokenize_fn = lambda line: line.strip().split()
detokenize_fn = lambda tokens: " ".join(tokens)

with open("input.txt") as input_file:
    source = map(tokenize_fn, input_file)
    results = translator.translate_iterable(source, max_batch_size=64)

    for result in results:
        tokens = result.hypotheses[0]
        target = detokenize_fn(tokens)
        print(target)
- unload_model(to_cpu: bool = False) None
Unloads the model attached to this translator but keeps enough runtime context to quickly resume translation on the initial device. The model is not guaranteed to be unloaded if translations are running concurrently.
- Parameters
to_cpu – If True, the model is moved to the CPU memory and not fully unloaded.
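For example, device memory could be released around other GPU-heavy work and the model restored afterwards (a sketch; the benefit mainly applies to CUDA devices):
translator.unload_model(to_cpu=True)    # keep the weights cached in CPU memory
# ... run other work on the device here ...
translator.load_model(keep_cache=True)  # restore the model on the initial device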
- property compute_type
Computation type used by the model.
- property device
Device this translator is running on.
- property device_index
List of device IDs this translator is running on.
- property model_is_loaded
Whether the model is loaded on the initial device and ready to be used.
- property num_active_batches
Number of batches waiting to be processed or currently being processed.
- property num_queued_batches
Number of batches waiting to be processed.
- property num_translators
Number of translators backing this instance.
- property tensor_parallel
Whether the model is run in tensor parallel mode.