By default, translation is done using beam search. The
-beam_size option can be used to trade off translation time against search accuracy, with
-beam_size 1 giving greedy search. The small default beam size is often enough in practice.
Beam search can also be used to provide an approximate n-best list of translations by setting
-n_best greater than 1. For analysis, the translation command also takes an oracle/gold
-tgt file and will output a comparison of scores.
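To make the effect of the beam size and of the n-best list concrete, here is a minimal sketch of beam search in Python. It is not the toolkit's implementation: `TOY_MODEL`, `beam_search` and the parameter `max_len` are hypothetical names, and the "model" is a hand-written table of next-token probabilities that depend only on the previous token.

```python
import math

# Hypothetical toy model (an assumption for illustration): next-token
# log-probabilities depend only on the last token. "</s>" ends a hypothesis.
TOY_MODEL = {
    "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
    "a": {"a": math.log(0.1), "b": math.log(0.5), "</s>": math.log(0.4)},
    "b": {"a": math.log(0.3), "b": math.log(0.1), "</s>": math.log(0.6)},
}


def beam_search(beam_size, n_best=1, max_len=5):
    """Return up to n_best finished hypotheses as (tokens, log_prob) pairs."""
    beam = [(["<s>"], 0.0)]
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for tokens, score in beam:
            for tok, logp in TOY_MODEL[tokens[-1]].items():
                candidates.append((tokens + [tok], score + logp))
        # Keep only the beam_size best expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == "</s>":
                finished.append((tokens, score))
            else:
                beam.append((tokens, score))
        if not beam:
            break
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:n_best]


greedy = beam_search(beam_size=1)           # greedy search
wide = beam_search(beam_size=3, n_best=2)   # approximate 2-best list
```

On this toy table, greedy search commits to the locally best first token and ends with a hypothesis of probability 0.18, while a beam of 3 finds hypotheses of probability 0.24. The n-best list is only approximate because hypotheses pruned from the beam are never revisited.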
The beam search provides a built-in filter based on unknown words:
-max_num_unks. Hypotheses with more unknown words than this value are dropped.
As dropped hypotheses temporarily reduce the beam size, the
-pre_filter_factor option can be used to increase the number of hypotheses considered before the filters are applied.
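The interaction between the two options can be sketched as follows. This is an illustration under assumptions, not the toolkit's code: the function `filter_hypotheses`, the `<unk>` token string, and the choice to consider `beam_size * pre_filter_factor` candidates before filtering are all assumptions made for the example.

```python
def filter_hypotheses(candidates, beam_size, max_num_unks,
                      pre_filter_factor=1, unk="<unk>"):
    """Keep up to beam_size candidates with at most max_num_unks unknown tokens.

    candidates is a list of (tokens, score) pairs; a higher score is better.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    # Assumption: the pre-filter factor widens the pool that is inspected
    # before the filter runs.
    considered = ranked[: beam_size * pre_filter_factor]
    kept = [c for c in considered if c[0].count(unk) <= max_num_unks]
    return kept[:beam_size]


candidates = [
    (["a", "<unk>"], -0.1),
    (["<unk>", "<unk>"], -0.2),
    (["a", "b"], -0.3),
    (["b", "a"], -0.4),
]

# Without pre-filtering, both top candidates are dropped and the beam is empty.
narrow = filter_hypotheses(candidates, beam_size=2, max_num_unks=0)
# With a factor of 2, four candidates are inspected and the beam stays full.
widened = filter_hypotheses(candidates, beam_size=2, max_num_unks=0,
                            pre_filter_factor=2)
```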
The beam search also supports various normalization techniques that are disabled by default and can be used to bias the scores generated by the model:
$$s(Y, X) = \frac{\log P(Y \mid X)}{lp(Y)} + cp(X, Y)$$
where $X$ is the source, $Y$ is the current target, and the functions $lp$ and $cp$ are defined below. An additional penalty on end of sentence tokens can also be added to prioritize longer sentences.
Scores are normalized by the following formula as defined in Wu et al. (2016):
$$lp(Y) = \frac{(5 + |Y|)^{\alpha}}{(5 + 1)^{\alpha}}$$
where $|Y|$ is the current target length and $\alpha$ is the length normalization coefficient.
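As a worked example, the length penalty of Wu et al. (2016), $lp(Y) = (5 + |Y|)^{\alpha} / (5 + 1)^{\alpha}$, can be computed as below. The function names are hypothetical and the sketch only illustrates the arithmetic; it assumes the model score is a log-probability that is divided by the penalty.

```python
def length_penalty(target_length, alpha):
    """Length penalty lp(Y) from Wu et al. (2016)."""
    return ((5 + target_length) ** alpha) / ((5 + 1) ** alpha)


def length_normalized_score(log_prob, target_length, alpha):
    """Divide the (negative) log-probability by the length penalty."""
    return log_prob / length_penalty(target_length, alpha)


# alpha = 0 disables the normalization: the penalty is always 1.
no_norm = length_penalty(target_length=7, alpha=0.0)
# With alpha = 1 and 7 target tokens the penalty is 12 / 6 = 2,
# so a log-probability of -6 becomes -3.
normalized = length_normalized_score(log_prob=-6.0, target_length=7, alpha=1.0)
```

Because log-probabilities are negative, dividing by a penalty greater than 1 moves the score toward zero, which reduces the usual bias against longer hypotheses.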
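Scores are penalized by the following formula as defined in Wu et al. (2016):
$$cp(X, Y) = \beta \sum_{i=1}^{|X|} \log\left(\min\left(\sum_{j=1}^{|Y|} p_{i,j},\ 1.0\right)\right)$$
where $p_{i,j}$ is the attention probability of the $j$-th target word $y_j$ on the $i$-th source word $x_i$, $|X|$ is the source length, $|Y|$ is the current target length and $\beta$ is the coverage normalization coefficient.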
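The coverage penalty of Wu et al. (2016) sums, over source words, the log of the total attention each word has received, capped at 1. A minimal sketch follows; the function name and the layout of the `attention` matrix (one row per target word, one column per source word) are assumptions made for the example.

```python
import math


def coverage_penalty(attention, beta):
    """Coverage penalty cp(X, Y) from Wu et al. (2016).

    attention[j][i] is the attention probability of target word j on
    source word i. Every source word is assumed to receive some attention,
    otherwise log(0) is undefined.
    """
    source_length = len(attention[0])
    total = 0.0
    for i in range(source_length):
        coverage = sum(row[i] for row in attention)
        total += math.log(min(coverage, 1.0))
    return beta * total


# Both source words are fully covered: no penalty.
covered = coverage_penalty([[0.5, 0.5], [0.5, 0.5]], beta=0.5)
# The second source word only receives 0.2 attention in total,
# so the penalty is 0.5 * log(0.2).
neglected = coverage_penalty([[0.9, 0.1], [0.9, 0.1]], beta=0.5)
```

The penalty is zero when every source word has accumulated attention of at least 1, and becomes increasingly negative for hypotheses that leave source words unattended.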
End of sentence normalization
The score of the end of sentence token is penalized by the following formula:
$$ep(X, Y) = \gamma \frac{|X|}{|Y|}$$
where $|X|$ is the source length, $|Y|$ is the current target length and $\gamma$ is the end of sentence normalization coefficient