Translation¶

Translations¶

class onmt.translate.Translation(src, srclen, pred_sents, attn, pred_scores, tgt_sent, gold_score, word_aligns, ind_in_bucket)[source]¶

Bases: object

Container for a translated sentence.

Variables:

src (LongTensor) – Source word IDs.
srclen (List[int]) – Source lengths.
pred_sents (List[List[str]]) – Words from the n-best translations.
pred_scores (List[List[float]]) – Log-probs of n-best translations.
attns (List[FloatTensor]) – Attention distribution for each translation.
gold_sent (List[str]) – Words from gold translation.
gold_score (List[float]) – Log-prob of gold translation.
word_aligns (List[FloatTensor]) – Word alignment distribution for each translation.

log(sent_number, src_raw='')[source]¶: Log translation.

Translator Class¶

class onmt.translate.Translator(model, vocabs, gpu=-1, n_best=1, min_length=0, max_length=100, max_length_ratio=1.5, ratio=0.0, beam_size=30, random_sampling_topk=0, random_sampling_topp=0.0, random_sampling_temp=1.0, stepwise_penalty=None, dump_beam=False, block_ngram_repeat=0, ignore_when_blocking=frozenset({}), replace_unk=False, ban_unk_token=False, tgt_file_prefix=False, phrase_table='', data_type='text', verbose=False, report_time=False, copy_attn=False, global_scorer=None, out_file=None, report_align=False, gold_align=False, report_score=True, logger=None, seed=-1, with_score=False, return_gold_log_probs=False)[source]¶

Bases: Inference

translate_batch(batch, attn_debug)[source]¶: Translate a batch of sentences.

class onmt.translate.TranslationBuilder(vocabs, n_best=1, replace_unk=False, phrase_table='')[source]¶

Bases: object

Build a word-based translation from the batch output of translator and the underlying dictionaries.

Replacement based on “Addressing the Rare Word Problem in Neural Machine Translation” [LSL+15]

Parameters:

vocabs – The vocabularies (the “underlying dictionaries” mentioned above); type not documented.
n_best (int) – number of translations produced
replace_unk (bool) – replace unknown words using attention
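
The attention-based replacement described above can be illustrated with a minimal sketch. It assumes plain Python lists rather than tensors, a hypothetical `<unk>` marker, and a helper name (`replace_unknowns`) that is not part of the library; the phrase-table lookup is left out.

```python
from typing import List, Sequence


def replace_unknowns(
    pred_tokens: List[str],
    src_tokens: List[str],
    attn: Sequence[Sequence[float]],
    unk_token: str = "<unk>",  # assumed marker, for illustration only
) -> List[str]:
    """Replace each unknown target token with the most-attended source token.

    attn[i][j] is the attention weight of target position i on source position j.
    """
    out = []
    for i, tok in enumerate(pred_tokens):
        if tok == unk_token:
            weights = list(attn[i][: len(src_tokens)])
            # Index of the source position with the highest attention weight.
            best_j = max(range(len(weights)), key=weights.__getitem__)
            out.append(src_tokens[best_j])
        else:
            out.append(tok)
    return out


src = ["das", "Haus", "von", "Zaphod"]
pred = ["the", "house", "of", "<unk>"]
attn = [
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.70, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.05, 0.05, 0.10, 0.80],
]
result = replace_unknowns(pred, src, attn)
```

Here the unknown word in the last target position is copied from the source word it attends to most strongly.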

Decoding Strategies¶

class onmt.translate.DecodeStrategy(pad, bos, eos, unk, start, batch_size, parallel_paths, global_scorer, min_length, block_ngram_repeat, exclusion_tokens, return_attention, max_length, ban_unk_token)[source]¶

Bases: object

Base class for generation strategies.

Parameters:

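pad (int) – ID of the padding token in the output vocab.
bos (int) – ID of the begin-of-sentence token in the output vocab.
eos (int) – ID of the end-of-sentence token in the output vocab.
unk (int) – ID of the unknown-word token in the output vocab.
start (int) – ID of the token used to start decoding, in the output vocab.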
batch_size (int) – Current batch size.
parallel_paths (int) – Decoding strategies like beam search use parallel paths. Each batch is repeated parallel_paths times in relevant state tensors.
min_length (int) – Shortest acceptable generation, not counting begin-of-sentence or end-of-sentence.
max_length (int) – Longest acceptable sequence, not counting begin-of-sentence (presumably there has been no EOS yet if max_length is used as a cutoff).
ban_unk_token (Boolean) – Whether the unk token is forbidden.
block_ngram_repeat (int) – Block beams where block_ngram_repeat-grams repeat.
exclusion_tokens (set[int]) – If a gram contains any of these tokens, it may repeat.
return_attention (bool) – Whether to work with attention too. If this is true, it is assumed that the decoder is attentional.

Variables:

pad (int) – See above.
bos (int) – See above.
eos (int) – See above.
unk (int) – See above.
start (int) – See above.
predictions (list[list[LongTensor]]) – For each batch, holds a list of beam prediction sequences.
scores (list[list[FloatTensor]]) – For each batch, holds a list of scores.
attention (list[list[FloatTensor or list[]]]) – For each batch, holds a list of attention sequence tensors (or empty lists) having shape (step, inp_seq_len) where inp_seq_len is the length of the sample (not the max length of all inp seqs).
alive_seq (LongTensor) – Shape (B x parallel_paths, step). This sequence grows in the step axis on each call to advance().
is_finished (ByteTensor or NoneType) – Shape (B, parallel_paths). Initialized to None.
alive_attn (FloatTensor or NoneType) – If tensor, shape is (B x parallel_paths, step, inp_seq_len), where inp_seq_len is the (max) length of the input sequence.
target_prefix (LongTensor or NoneType) – If tensor, shape is (B x parallel_paths, prefix_seq_len), where prefix_seq_len is the (max) length of the pre-fixed prediction.
min_length (int) – See above.
max_length (int) – See above.
ban_unk_token (Boolean) – See above.
block_ngram_repeat (int) – See above.
exclusion_tokens (set[int]) – See above.
return_attention (bool) – See above.
done (bool) – Whether decoding has finished for all examples in the batch (not listed among the parameters above).

advance(log_probs, attn)[source]¶

DecodeStrategy subclasses should override advance().

Advance is used to update self.alive_seq, self.is_finished, and, when appropriate, self.alive_attn.

block_ngram_repeats(log_probs)[source]¶

We prevent the beam from going in any direction that would repeat any ngram of size <block_ngram_repeat> more than once.

The way we do it: we maintain a list of all ngrams of size <block_ngram_repeat>, updated each time the beam advances, and set the probability of any token that would complete a repeated ngram to 0.

This improves on the previous version’s complexity:

- previous version’s complexity: batch_size * beam_size * len(self)
- current version’s complexity: batch_size * beam_size

This improves on the previous version’s accuracy:

- The previous version blocks the whole beam, whereas here we only block specific tokens.
- Before, the translation would fail when all beams contained repeated ngrams. This is sure to never happen here.
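
As an illustration of token-level blocking, here is a minimal sketch for a single hypothesis. It assumes plain Python lists, a hypothetical helper name (`block_repeated_ngrams`), and `-inf` as the "probability 0" log-prob. For clarity it rebuilds the ngram set on every call instead of maintaining it incrementally, and it ignores exclusion_tokens.

```python
import math
from typing import List, Set, Tuple


def block_repeated_ngrams(seq: List[int], log_probs: List[float], n: int) -> List[float]:
    """Return a copy of log_probs where tokens completing a seen n-gram are banned."""
    if n <= 0 or len(seq) < n:
        return list(log_probs)  # nothing can repeat yet
    # All n-grams already present in the hypothesis.
    seen: Set[Tuple[int, ...]] = {
        tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)
    }
    # The last n-1 tokens: the next token would complete an n-gram with them.
    prefix = tuple(seq[len(seq) - (n - 1):]) if n > 1 else ()
    blocked = list(log_probs)
    for ngram in seen:
        if ngram[:-1] == prefix:
            blocked[ngram[-1]] = -math.inf  # log-prob of probability 0
    return blocked


uniform = [math.log(0.2)] * 5          # vocabulary of 5 tokens
history = [4, 2, 3, 4, 2]              # trigram (4, 2, 3) was already produced
blocked = block_repeated_ngrams(history, uniform, n=3)
```

Only token 3 is banned for the next step, because emitting it would repeat the trigram (4, 2, 3); every other token keeps its score.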

initialize(device=None, target_prefix=None)[source]¶

DecodeStrategy subclasses should override initialize().

initialize() should be called before any other action; it prepares what decoding needs.

maybe_update_forbidden_tokens()[source]¶: We complete and reorder the list of forbidden_tokens.

maybe_update_target_prefix(select_index)[source]¶: We update / reorder target_prefix for alive path.

target_prefixing(log_probs)[source]¶

Fix the first part of predictions with self.target_prefix.

Parameters:

log_probs (FloatTensor) – Logits of size (B, vocab_size).

Returns:

Modified logits of size (B, vocab_size).

Return type:

FloatTensor
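
The idea of fixing the start of a prediction can be sketched as a mask over one decoding step. This is a simplified illustration on Python lists, not the library code: the helper name (`apply_target_prefix`) is hypothetical, and it assumes that a pad entry means the prefix is exhausted for that row.

```python
import math
from typing import List


def apply_target_prefix(
    log_probs: List[List[float]], prefix_tokens: List[int], pad: int
) -> List[List[float]]:
    """Force row b to emit prefix_tokens[b], unless that entry is padding."""
    forced = []
    for row, tok in zip(log_probs, prefix_tokens):
        if tok == pad:
            # Assumption: pad marks "no prefix left", so the row is unchanged.
            forced.append(list(row))
        else:
            # Every token except the prefix token gets probability 0.
            forced.append([lp if v == tok else -math.inf for v, lp in enumerate(row)])
    return forced


step_log_probs = [[-1.0, -2.0, -3.0], [-1.5, -0.5, -2.5]]
forced = apply_target_prefix(step_log_probs, prefix_tokens=[2, 0], pad=0)
```

The first row can now only produce token 2, while the second row, whose prefix is already used up, is left untouched.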

update_finished()[source]¶

DecodeStrategy subclasses should override update_finished().

update_finished is used to update self.predictions, self.scores, and other “output” attributes.

class onmt.translate.BeamSearch(beam_size, batch_size, pad, bos, eos, unk, start, n_best, global_scorer, min_length, max_length, return_attention, block_ngram_repeat, exclusion_tokens, stepwise_penalty, ratio, ban_unk_token)[source]¶

Bases: BeamSearchBase

Beam search for seq2seq/encoder-decoder models.

initialize(enc_out, src_len, src_map=None, device=None, target_prefix=None)[source]¶: Initialize for decoding. Repeat src objects beam_size times.
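
For readers unfamiliar with beam search, one expansion step for a single example can be sketched as follows. This is a generic illustration on Python lists with a hypothetical helper name (`beam_step`); it omits length penalties, finished hypotheses, and batching, all of which the class handles.

```python
from typing import List, Tuple


def beam_step(
    beam_scores: List[float], log_probs: List[List[float]], beam_size: int
) -> List[Tuple[float, int, int]]:
    """Return the beam_size best (score, parent_beam, token) continuations.

    beam_scores[b] is the cumulative log-prob of beam b;
    log_probs[b][v] is the log-prob of token v given beam b.
    """
    candidates = [
        (parent_score + lp, parent, token)
        for parent, (parent_score, row) in enumerate(zip(beam_scores, log_probs))
        for token, lp in enumerate(row)
    ]
    # Rank all beam x vocab continuations jointly and keep the best ones.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:beam_size]


scores = [-0.5, -1.0]
step = [[-0.1, -2.5, -3.0], [-0.3, -1.5, -4.0]]
best = beam_step(scores, step, beam_size=2)
```

Both surviving hypotheses extend their parent with token 0, since those two continuations have the highest cumulative scores among the six candidates.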

onmt.translate.greedy_search.sample_with_temperature(logits, sampling_temp, keep_topk, keep_topp)[source]¶

Select next tokens randomly from the top k possible next tokens.

Samples from a categorical distribution over the keep_topk words using the category probabilities logits / sampling_temp.

Parameters:

logits (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits ((-inf, inf)) or log-probs ((-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equals the logits if they are log-probabilities summing to 1.)
sampling_temp (float) – Used to scale down logits. The higher the value, the more likely it is that a non-max word will be sampled.
keep_topk (int) – This many words could potentially be chosen. The other logits are set to have probability 0.
keep_topp (float) – Keep the most likely words until the cumulative probability is greater than p. If used with keep_topk, both conditions are applied.

Returns:

topk_ids: Shaped (batch_size, 1). These are the sampled word indices in the output vocab.
topk_scores: Shaped (batch_size, 1). These are essentially (logits / sampling_temp)[topk_ids].

Return type:

(LongTensor, FloatTensor)
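
The interaction of temperature, top-k, and top-p can be illustrated with a single-row sketch in plain Python. The function name (`sample_sketch`) and the explicit `rng` argument are assumptions made for this example; the order in which the two filters are applied and the treatment of a zero temperature as greedy selection are also choices of this sketch and may differ from the library.

```python
import math
import random
from typing import List, Tuple


def sample_sketch(
    logits: List[float],
    sampling_temp: float,
    keep_topk: int,
    keep_topp: float,
    rng: random.Random,
) -> Tuple[int, float]:
    """Sample one token id from a single row of logits; returns (id, score)."""
    if sampling_temp == 0.0 or keep_topk == 1:
        # Degenerate case: plain argmax.
        best = max(range(len(logits)), key=logits.__getitem__)
        return best, logits[best]

    scaled = [x / sampling_temp for x in logits]
    # Candidate ids, most likely first.
    order = sorted(range(len(scaled)), key=scaled.__getitem__, reverse=True)
    if keep_topk > 0:
        order = order[:keep_topk]

    # Softmax over the surviving candidates (max-shifted for stability).
    top = scaled[order[0]]
    weights = [math.exp(scaled[i] - top) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]

    if 0.0 < keep_topp < 1.0:
        # Keep words until the cumulative probability exceeds keep_topp.
        kept, cumulative = 0, 0.0
        for p in probs:
            kept += 1
            cumulative += p
            if cumulative > keep_topp:
                break
        order, probs = order[:kept], probs[:kept]

    choice = rng.choices(order, weights=probs, k=1)[0]
    return choice, scaled[choice]


logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(0)
token, score = sample_sketch(logits, sampling_temp=0.7, keep_topk=3, keep_topp=0.9, rng=rng)
```

With these settings, top-k keeps three candidates and top-p then narrows them to the two most likely ones, so only tokens 0 and 1 can be drawn. The returned score is the temperature-scaled logit of the sampled token.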

class onmt.translate.GreedySearch(pad, bos, eos, unk, start, n_best, batch_size, global_scorer, min_length, block_ngram_repeat, exclusion_tokens, return_attention, max_length, sampling_temp, keep_topk, keep_topp, beam_size, ban_unk_token)[source]¶

Bases: DecodeStrategy

Select next tokens randomly from the top k possible next tokens.

The lists in the scores attribute hold the score, after applying temperature, of the final prediction (either EOS or the final token if max_length is reached).

Parameters:

pad (int) – See base.
bos (int) – See base.
eos (int) – See base.
unk (int) – See base.
start (int) – See base.
n_best (int) – Don’t stop until at least this many beams have reached EOS.
batch_size (int) – See base.
global_scorer (onmt.translate.GNMTGlobalScorer) – Scorer instance.
min_length (int) – See base.
max_length (int) – See base.
ban_unk_token (Boolean) – See base.
block_ngram_repeat (int) – See base.
exclusion_tokens (set[int]) – See base.
return_attention (bool) – See base.
sampling_temp (float) – See sample_with_temperature().
keep_topk (int) – See sample_with_temperature().
keep_topp (float) – See sample_with_temperature().
beam_size (int) – Number of beams to use.

advance(log_probs, attn)[source]¶

Select next tokens randomly from the top k possible next tokens.

Parameters:

log_probs (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits ((-inf, inf)) or log-probs ((-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equals the logits if they are log-probabilities summing to 1.)
attn (FloatTensor) – Shaped (1, B, inp_seq_len).

initialize(enc_out, src_len, src_map=None, device=None, target_prefix=None)[source]¶: Initialize for decoding.

update_finished()[source]¶: Finalize scores and predictions.

Scoring¶

class onmt.translate.penalties.PenaltyBuilder(cov_pen, length_pen)[source]¶

Bases: object

Returns the Length and Coverage Penalty function for Beam Search.

Parameters:

length_pen (str) – option name of length pen
cov_pen (str) – option name of cov pen

Variables:

has_cov_pen (bool) – False when the coverage penalty is None (applying it is a no-op), True otherwise. Note that the converse isn’t true: a True value does not guarantee an effect, since setting beta to 0 should also make the coverage penalty a no-op.
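has_len_pen (bool) – False when the length penalty is None (applying it is a no-op), True otherwise. Note that the converse isn’t true: a True value does not guarantee an effect, since some alpha values make the penalty a no-op (for the GNMT penalty of [WSC+16], ((5 + len) / 6) ** alpha, that value is 0).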
coverage_penalty (callable[[FloatTensor, float], FloatTensor]) – Calculates the coverage penalty.
length_penalty (callable[[int, float], float]) – Calculates the length penalty.

coverage_none(cov, beta=0.0)[source]¶: Returns zero as penalty.

coverage_summary(cov, beta=0.0)[source]¶: Our summary penalty.

coverage_wu(cov, beta=0.0)[source]¶

GNMT coverage re-ranking score.

See “Google’s Neural Machine Translation System” [WSC+16]. cov is expected to be sized (*, seq_len), where * is probably batch_size x beam_size but could be several dimensions like (batch_size, beam_size). If cov is attention, then the seq_len axis probably sums to (almost) 1.
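
The coverage term from [WSC+16] can be written out for a single hypothesis as a short sketch. It uses plain Python lists instead of tensors, a hypothetical function name (`coverage_wu_sketch`), and assumes every coverage entry is strictly positive. The sign convention (a non-negative penalty to be subtracted from the score) is an assumption of this example.

```python
import math
from typing import List


def coverage_wu_sketch(cov: List[float], beta: float) -> float:
    """GNMT-style coverage penalty for one hypothesis.

    cov[j] is the total attention that source position j has received so far.
    Positions with coverage >= 1 contribute nothing; under-attended positions
    contribute -log(cov[j]) > 0.
    """
    return -beta * sum(math.log(min(c, 1.0)) for c in cov)


well_covered = coverage_wu_sketch([1.2, 1.0, 1.0], beta=0.2)
under_covered = coverage_wu_sketch([1.2, 1.0, 0.5], beta=0.2)
```

A hypothesis that attends to every source position at least once receives no penalty, while leaving a position half-covered costs `0.2 * log(2)`. With `beta=0` the penalty vanishes regardless of coverage.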

length_average(cur_len, alpha=1.0)[source]¶: Returns the current sequence length.

length_none(cur_len, alpha=0.0)[source]¶: Returns unmodified scores.

length_wu(cur_len, alpha=0.0)[source]¶

GNMT length re-ranking score.

See “Google’s Neural Machine Translation System” [WSC+16].
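
The length penalty from [WSC+16] is `((5 + len) / 6) ** alpha`, and the paper divides a hypothesis’s log-probability by it. The sketch below restates that formula in plain Python under a hypothetical name (`length_wu_sketch`); the example hypotheses and their scores are made up for illustration.

```python
def length_wu_sketch(cur_len: int, alpha: float = 0.0) -> float:
    """GNMT length penalty: ((5 + cur_len) / 6) ** alpha."""
    return ((5 + cur_len) / 6.0) ** alpha


# Two made-up hypotheses: (cumulative log-prob, length).
short_hyp = (-4.0, 4)
long_hyp = (-5.0, 10)

alpha = 1.0
short_score = short_hyp[0] / length_wu_sketch(short_hyp[1], alpha)
long_score = long_hyp[0] / length_wu_sketch(long_hyp[1], alpha)
```

Without normalization the shorter hypothesis wins (-4.0 versus -5.0); after dividing by the penalty the longer one ranks higher. With `alpha=0` the penalty is a constant 1 and the ranking is unchanged.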

class onmt.translate.GNMTGlobalScorer(alpha, beta, length_penalty, coverage_penalty)[source]¶

Bases: object

NMT re-ranking.

Parameters:

alpha (float) – Length parameter.
beta (float) – Coverage parameter.
length_penalty (str) – Length penalty strategy.
coverage_penalty (str) – Coverage penalty strategy.

Variables:

alpha (float) – See above.
beta (float) – See above.
length_penalty (callable) – See penalties.PenaltyBuilder.
coverage_penalty (callable) – See penalties.PenaltyBuilder.
has_cov_pen (bool) – See penalties.PenaltyBuilder.
has_len_pen (bool) – See penalties.PenaltyBuilder.