Translation¶
Translations¶
- class onmt.translate.Translation(src, srclen, pred_sents, attn, pred_scores, tgt_sent, gold_score, word_aligns, ind_in_bucket)[source]¶
Bases:
object
Container for a translated sentence.
- Variables:
src (LongTensor) – Source word IDs.
srclen (List[int]) – Source lengths.
pred_sents (List[List[str]]) – Words from the n-best translations.
pred_scores (List[List[float]]) – Log-probs of n-best translations.
attns (List[FloatTensor]) – Attention distribution for each translation.
gold_sent (List[str]) – Words from gold translation.
gold_score (List[float]) – Log-prob of gold translation.
word_aligns (List[FloatTensor]) – Word alignment distribution for each translation.
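As a consumer-side illustration (not part of the library's API), the fields above might be read back roughly as follows; trans and log_n_best are hypothetical names used only for this sketch.

```python
# Illustrative sketch: `trans` stands in for a Translation object produced by
# the translation pipeline; the attribute names follow the variables above.
def log_n_best(trans, n_best=3):
    """Print the n-best hypotheses held by a single Translation."""
    for rank, (words, score) in enumerate(
        zip(trans.pred_sents[:n_best], trans.pred_scores[:n_best]), start=1
    ):
        print(f"#{rank} (log-prob {score}): {' '.join(words)}")
    if trans.gold_sent:
        print(f"gold (log-prob {trans.gold_score}): {' '.join(trans.gold_sent)}")
```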
Translator Class¶
- class onmt.translate.Translator(model, vocabs, gpu=-1, n_best=1, min_length=0, max_length=100, max_length_ratio=1.5, ratio=0.0, beam_size=30, random_sampling_topk=0, random_sampling_topp=0.0, random_sampling_temp=1.0, stepwise_penalty=None, dump_beam=False, block_ngram_repeat=0, ignore_when_blocking=frozenset({}), replace_unk=False, ban_unk_token=False, tgt_file_prefix=False, phrase_table='', data_type='text', verbose=False, report_time=False, copy_attn=False, global_scorer=None, out_file=None, report_align=False, gold_align=False, report_score=True, logger=None, seed=-1, with_score=False, return_gold_log_probs=False)[source]¶
Bases:
Inference
- class onmt.translate.TranslationBuilder(vocabs, n_best=1, replace_unk=False, phrase_table='')[source]¶
Bases:
object
Build a word-based translation from the batch output of translator and the underlying dictionaries.
Replacement based on “Addressing the Rare Word Problem in Neural Machine Translation” [LSL+15]
- Parameters:
vocabs – the underlying dictionaries used to build word-based translations.
n_best (int) – number of translations produced
replace_unk (bool) – replace unknown words using attention
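The attention-based replacement mentioned above can be sketched independently of the library's code: whenever the prediction contains an unknown token, copy the source word that received the most attention at that step, or its phrase-table translation if one exists. UNK_TOKEN and replace_unk_sketch are assumed names used only for this illustration.

```python
import torch

UNK_TOKEN = "<unk>"  # assumed marker for the target-side unknown token

def replace_unk_sketch(pred_tokens, src_tokens, attn, phrase_table=None):
    """Illustrative unk replacement (Luong et al., 2015): substitute each
    <unk> with the source token that got the most attention at that step.

    attn: FloatTensor of shape (pred_len, src_len).
    phrase_table: optional dict mapping source tokens to target translations.
    """
    out = []
    for i, tok in enumerate(pred_tokens):
        if tok != UNK_TOKEN:
            out.append(tok)
            continue
        # source position with maximal attention at decoding step i
        _, src_pos = attn[i][: len(src_tokens)].max(0)
        src_word = src_tokens[src_pos.item()]
        # fall back to copying the source word when no phrase-table entry exists
        out.append((phrase_table or {}).get(src_word, src_word))
    return out
```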
Decoding Strategies¶
- class onmt.translate.DecodeStrategy(pad, bos, eos, unk, start, batch_size, parallel_paths, global_scorer, min_length, block_ngram_repeat, exclusion_tokens, return_attention, max_length, ban_unk_token)[source]¶
Bases:
object
Base class for generation strategies.
- Parameters:
pad (int) – Magic integer in output vocab.
bos (int) – Magic integer in output vocab.
eos (int) – Magic integer in output vocab.
unk (int) – Magic integer in output vocab.
start (int) – Magic integer in output vocab.
batch_size (int) – Current batch size.
parallel_paths (int) – Decoding strategies like beam search use parallel paths. Each batch is repeated parallel_paths times in relevant state tensors.
min_length (int) – Shortest acceptable generation, not counting begin-of-sentence or end-of-sentence.
max_length (int) – Longest acceptable sequence, not counting begin-of-sentence (presumably there has been no EOS yet if max_length is used as a cutoff).
ban_unk_token (Boolean) – Whether the unk token is forbidden.
block_ngram_repeat (int) – Block beams where block_ngram_repeat-grams repeat.
exclusion_tokens (set[int]) – If a gram contains any of these tokens, it may repeat.
return_attention (bool) – Whether to work with attention too. If this is true, it is assumed that the decoder is attentional.
- Variables:
pad (int) – See above.
bos (int) – See above.
eos (int) – See above.
unk (int) – See above.
start (int) – See above.
predictions (list[list[LongTensor]]) – For each batch, holds a list of beam prediction sequences.
scores (list[list[FloatTensor]]) – For each batch, holds a list of scores.
attention (list[list[FloatTensor or list[]]]) – For each batch, holds a list of attention sequence tensors (or empty lists) having shape (step, inp_seq_len), where inp_seq_len is the length of the sample (not the max length of all input sequences).
alive_seq (LongTensor) – Shape (B x parallel_paths, step). This sequence grows in the step axis on each call to advance().
is_finished (ByteTensor or NoneType) – Shape (B, parallel_paths). Initialized to None.
alive_attn (FloatTensor or NoneType) – If tensor, shape is (B x parallel_paths, step, inp_seq_len), where inp_seq_len is the (max) length of the input sequence.
target_prefix (LongTensor or NoneType) – If tensor, shape is (B x parallel_paths, prefix_seq_len), where prefix_seq_len is the (max) length of the pre-fixed prediction.
min_length (int) – See above.
max_length (int) – See above.
ban_unk_token (Boolean) – See above.
block_ngram_repeat (int) – See above.
exclusion_tokens (set[int]) – See above.
return_attention (bool) – See above.
done (bool) – Whether decoding has finished for every sequence in the batch.
- advance(log_probs, attn)[source]¶
DecodeStrategy subclasses should override advance().
Advance is used to update self.alive_seq, self.is_finished, and, when appropriate, self.alive_attn.
- block_ngram_repeats(log_probs)[source]¶
We prevent the beam from going in any direction that would repeat any ngram of size <block_ngram_repeat> more than once.
The way we do it: we maintain a list of all ngrams of size <block_ngram_repeat> that is updated each time the beam advances, and we manually set the score of any token that would lead to a repeated ngram to 0.
This improves on the previous version’s complexity:
- previous version: batch_size * beam_size * len(self)
- current version: batch_size * beam_size
It also improves on the previous version’s accuracy:
- The previous version blocks the whole beam, whereas here we only block specific tokens.
- Previously, translation could fail when all beams contained repeated ngrams; that can no longer happen here.
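For intuition, the blocking idea can be sketched as standalone code (this is an illustrative re-implementation, not the library's block_ngram_repeats; block_repeated_ngrams_sketch is a made-up name):

```python
import torch

def block_repeated_ngrams_sketch(log_probs, alive_seq, n, exclusion_tokens=frozenset()):
    """Forbid any next token that would complete an n-gram already present
    in its path, by pushing its score to -inf (i.e. probability 0).

    log_probs: (num_paths, vocab_size); alive_seq: (num_paths, step).
    """
    num_paths, step = alive_seq.shape
    if step < n - 1:
        return log_probs
    for path in range(num_paths):
        tokens = alive_seq[path].tolist()
        seen = set()
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            # n-grams containing an excluded token are allowed to repeat
            if not exclusion_tokens & set(gram):
                seen.add(gram)
        prefix = tuple(tokens[-(n - 1):]) if n > 1 else tuple()
        for gram in seen:
            if gram[:-1] == prefix:
                log_probs[path, gram[-1]] = float("-inf")
    return log_probs
```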
- initialize(device=None, target_prefix=None)[source]¶
DecodeStrategy subclasses should override initialize().
initialize() should be called before all actions; it prepares the necessary ingredients for decoding.
- target_prefixing(log_probs)[source]¶
Fix the first part of predictions with self.target_prefix.
- Parameters:
log_probs (FloatTensor) – logits of size (B, vocab_size).
- Returns:
log_probs (FloatTensor) – modified logits in (B, vocab_size).
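One way to realize such prefix forcing, sketched under the assumption that prefixed steps simply mask out every other vocabulary entry (apply_target_prefix_sketch is an illustrative name, not the library function):

```python
import torch

def apply_target_prefix_sketch(log_probs, target_prefix, step):
    """While `step` is still inside the prefix, keep only the prefixed
    token's score for each sequence and block everything else.

    log_probs: (B, vocab_size); target_prefix: (B, prefix_seq_len).
    """
    if target_prefix is None or step >= target_prefix.size(1):
        return log_probs
    forced = target_prefix[:, step].unsqueeze(1)      # (B, 1) token forced now
    masked = torch.full_like(log_probs, float("-inf"))
    # keep the original score of the forced token, forbid all other tokens
    masked.scatter_(1, forced, log_probs.gather(1, forced))
    return masked
```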
- update_finished()[source]¶
DecodeStrategy subclasses should override update_finished().
update_finished is used to update self.predictions, self.scores, and other “output” attributes.
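Taken together, initialize(), advance(), and update_finished() support a driver loop of roughly the following shape; strategy stands for a concrete DecodeStrategy and decode_step is an assumed callable that runs one decoder step and returns (log_probs, attn). This is a hedged sketch, not the Translator's actual loop.

```python
def run_decoding_sketch(strategy, decode_step, device=None):
    """Illustrative decoding driver built on the DecodeStrategy interface."""
    strategy.initialize(device=device)
    for _ in range(strategy.max_length):
        # score the next token for every alive path given its last token
        log_probs, attn = decode_step(strategy.alive_seq[:, -1:])
        strategy.advance(log_probs, attn)
        if strategy.is_finished is not None and strategy.is_finished.any():
            strategy.update_finished()
            if strategy.done:
                break
    return strategy.predictions, strategy.scores
```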
- class onmt.translate.BeamSearch(beam_size, batch_size, pad, bos, eos, unk, start, n_best, global_scorer, min_length, max_length, return_attention, block_ngram_repeat, exclusion_tokens, stepwise_penalty, ratio, ban_unk_token)[source]¶
Bases:
BeamSearchBase
Beam search for seq2seq/encoder-decoder models
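For intuition, the core of one beam-search step (independent of BeamSearchBase's bookkeeping) adds the current path scores to the next-token log-probs and takes the top beam_size candidates over the flattened (beam x vocab) scores; beam_step_sketch below is an illustrative name.

```python
import torch

def beam_step_sketch(log_probs, path_scores, beam_size):
    """One illustrative beam-search step for a single sentence.

    log_probs: (beam_size, vocab_size) next-token log-probs per beam.
    path_scores: (beam_size,) cumulative scores of the current beams.
    Returns the beam each pick extends, the token it appends, and its score.
    """
    vocab_size = log_probs.size(-1)
    scores = log_probs + path_scores.unsqueeze(1)          # (beam, vocab)
    top_scores, top_ids = scores.view(-1).topk(beam_size)  # flattened top-k
    beam_origin = torch.div(top_ids, vocab_size, rounding_mode="floor")
    token_ids = top_ids % vocab_size
    return beam_origin, token_ids, top_scores
```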
- onmt.translate.greedy_search.sample_with_temperature(logits, sampling_temp, keep_topk, keep_topp)[source]¶
Select next tokens randomly from the top k possible next tokens.
Samples from a categorical distribution over the keep_topk words using the category probabilities logits / sampling_temp.
- Parameters:
logits (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits ((-inf, inf)) or log-probs ((-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equals the logits if they are log-probabilities summing to 1.)
sampling_temp (float) – Used to scale down logits. The higher the value, the more likely it is that a non-max word will be sampled.
keep_topk (int) – This many words could potentially be chosen. The other logits are set to have probability 0.
keep_topp (float) – Keep the most likely words until their cumulative probability exceeds p. If used together with keep_topk, both conditions are applied.
- Returns:
topk_ids: Shaped (batch_size, 1). These are the sampled word indices in the output vocab.
topk_scores: Shaped (batch_size, 1). These are essentially (logits / sampling_temp)[topk_ids].
- Return type:
(LongTensor, FloatTensor)
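To make the shapes concrete, here is a simplified re-implementation of the scheme described above (temperature scaling plus top-k filtering; top-p is omitted for brevity). It is a hedged sketch, not the library function.

```python
import torch

def sample_with_temperature_sketch(logits, sampling_temp=1.0, keep_topk=-1):
    """Illustrative temperature + top-k sampling.

    logits: (batch_size, vocab_size) logits or log-probs.
    Returns (topk_ids, topk_scores), each shaped (batch_size, 1).
    """
    if sampling_temp > 0:
        logits = logits / sampling_temp
    if keep_topk > 0:
        # keep the k best scores per row, push everything else to -inf
        kth_best = logits.topk(keep_topk, dim=-1).values[:, -1, None]
        logits = logits.masked_fill(logits < kth_best, float("-inf"))
    dist = torch.distributions.Categorical(logits=logits)  # normalizes via logsumexp
    topk_ids = dist.sample().unsqueeze(1)                   # (batch_size, 1)
    topk_scores = logits.gather(1, topk_ids)                # (batch_size, 1)
    return topk_ids, topk_scores
```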
- class onmt.translate.GreedySearch(pad, bos, eos, unk, start, n_best, batch_size, global_scorer, min_length, block_ngram_repeat, exclusion_tokens, return_attention, max_length, sampling_temp, keep_topk, keep_topp, beam_size, ban_unk_token)[source]¶
Bases:
DecodeStrategy
Select next tokens randomly from the top k possible next tokens.
The scores attribute’s lists are the score, after applying temperature, of the final prediction (either EOS or the final token in the event that max_length is reached).
- Parameters:
pad (int) – See base.
bos (int) – See base.
eos (int) – See base.
unk (int) – See base.
start (int) – See base.
n_best (int) – Don’t stop until at least this many beams have reached EOS.
batch_size (int) – See base.
global_scorer (onmt.translate.GNMTGlobalScorer) – Scorer instance.
min_length (int) – See base.
max_length (int) – See base.
ban_unk_token (Boolean) – See base.
block_ngram_repeat (int) – See base.
exclusion_tokens (set[int]) – See base.
return_attention (bool) – See base.
sampling_temp (float) – See sample_with_temperature().
keep_topk (int) – See sample_with_temperature().
keep_topp (float) – See sample_with_temperature().
beam_size (int) – Number of beams to use.
- advance(log_probs, attn)[source]¶
Select next tokens randomly from the top k possible next tokens.
- Parameters:
log_probs (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits ((-inf, inf)) or log-probs ((-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equals the logits if they are log-probabilities summing to 1.)
attn (FloatTensor) – Shaped (1, B, inp_seq_len).
Scoring¶
- class onmt.translate.penalties.PenaltyBuilder(cov_pen, length_pen)[source]¶
Bases:
object
Returns the Length and Coverage Penalty function for Beam Search.
- Parameters:
length_pen (str) – option name of the length penalty
cov_pen (str) – option name of the coverage penalty
- Variables:
has_cov_pen (bool) – Whether a coverage penalty is set (if not, applying it is a no-op). Note that the converse isn’t true: setting beta to 0 should also force the coverage penalty to be a no-op.
has_len_pen (bool) – Whether a length penalty is set (if not, applying it is a no-op). Note that the converse isn’t true: setting alpha to 1 should also force the length penalty to be a no-op.
coverage_penalty (callable[[FloatTensor, float], FloatTensor]) – Calculates the coverage penalty.
length_penalty (callable[[int, float], float]) – Calculates the length penalty.
- coverage_wu(cov, beta=0.0)[source]¶
GNMT coverage re-ranking score.
See “Google’s Neural Machine Translation System” [WSC+16].
cov is expected to be sized (*, seq_len), where * is probably batch_size x beam_size but could be several dimensions like (batch_size, beam_size). If cov is attention, then the seq_len axis probably sums to (almost) 1.
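In formula form, the penalty described above is beta * (-sum_i log(min(cov_i, 1))); a minimal sketch under that reading (coverage_wu_sketch is an illustrative name and may differ in detail from the library's coverage_wu):

```python
import torch

def coverage_wu_sketch(cov, beta=0.0):
    """Illustrative GNMT coverage penalty over accumulated attention.

    cov: (*, seq_len) attention mass accumulated per source position.
    """
    penalty = -torch.min(cov, torch.ones_like(cov)).log().sum(-1)
    return beta * penalty
```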
- class onmt.translate.GNMTGlobalScorer(alpha, beta, length_penalty, coverage_penalty)[source]¶
Bases:
object
NMT re-ranking.
- Parameters:
alpha (float) – Length parameter.
beta (float) – Coverage parameter.
length_penalty (str) – Length penalty strategy.
coverage_penalty (str) – Coverage penalty strategy.
- Variables:
alpha (float) – See above.
beta (float) – See above.
length_penalty (callable) – See penalties.PenaltyBuilder.
coverage_penalty (callable) – See penalties.PenaltyBuilder.
has_cov_pen (bool) – See penalties.PenaltyBuilder.
has_len_pen (bool) – See penalties.PenaltyBuilder.
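For reference, the Wu et al. (2016) length normalization that alpha controls is lp(Y) = ((5 + |Y|) / 6) ** alpha, and re-ranking divides a hypothesis's cumulative log-probability by it. The sketch below uses assumed names and omits the coverage term:

```python
def length_wu_sketch(cur_len, alpha=0.0):
    """Illustrative GNMT length penalty: ((5 + |Y|) / 6) ** alpha."""
    return ((5 + cur_len) / 6.0) ** alpha

def rescore_sketch(cum_log_prob, cur_len, alpha=0.0):
    """Illustrative length-normalized score used for n-best re-ranking."""
    return cum_log_prob / length_wu_sketch(cur_len, alpha)
```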