Translation

Translations

class onmt.translate.Translation(src, srclen, pred_sents, attn, pred_scores, tgt_sent, gold_score, word_aligns, ind_in_bucket)[source]

Bases: object

Container for a translated sentence.

Variables:
  • src (LongTensor) – Source word IDs.

  • srclen (List[int]) – Source lengths.

  • pred_sents (List[List[str]]) – Words from the n-best translations.

  • pred_scores (List[List[float]]) – Log-probs of n-best translations.

  • attns (List[FloatTensor]) – Attention distribution for each translation.

  • gold_sent (List[str]) – Words from gold translation.

  • gold_score (List[float]) – Log-prob of gold translation.

  • word_aligns (List[FloatTensor]) – Word alignment distribution for each translation.

log(sent_number, src_raw='')[source]

Log translation.

Translator Class

class onmt.translate.Translator(model, vocabs, gpu=-1, n_best=1, min_length=0, max_length=100, max_length_ratio=1.5, ratio=0.0, beam_size=30, random_sampling_topk=0, random_sampling_topp=0.0, random_sampling_temp=1.0, stepwise_penalty=None, dump_beam=False, block_ngram_repeat=0, ignore_when_blocking=frozenset({}), replace_unk=False, ban_unk_token=False, tgt_file_prefix=False, phrase_table='', data_type='text', verbose=False, report_time=False, copy_attn=False, global_scorer=None, out_file=None, report_align=False, gold_align=False, report_score=True, logger=None, seed=-1, with_score=False, return_gold_log_probs=False)[source]

Bases: Inference

translate_batch(batch, attn_debug)[source]

Translate a batch of sentences.

class onmt.translate.TranslationBuilder(vocabs, n_best=1, replace_unk=False, phrase_table='')[source]

Bases: object

Build a word-based translation from the batch output of translator and the underlying dictionaries.

Replacement based on “Addressing the Rare Word Problem in Neural Machine Translation” [LSL+15]

Parameters:
  • vocabs (dict) – dictionary of source and target vocabularies

  • n_best (int) – number of translations produced

  • replace_unk (bool) – replace unknown words using attention
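
The replace_unk behaviour described above can be illustrated with a small standalone sketch (not the library's implementation): each predicted <unk> is replaced by the source word that received the most attention at that decoding step. The replace_unk helper name, the token strings, and the toy attention matrix are assumptions made up for the example.

```python
import torch

def replace_unk(pred_tokens, src_tokens, attn, unk_token="<unk>"):
    """For each predicted <unk>, copy the source word with maximal attention.

    pred_tokens: list[str] of length tgt_len
    src_tokens:  list[str] of length src_len
    attn:        FloatTensor of shape (tgt_len, src_len)
    """
    out = []
    for t, tok in enumerate(pred_tokens):
        if tok == unk_token:
            src_idx = attn[t].argmax().item()   # most-attended source position
            out.append(src_tokens[src_idx])
        else:
            out.append(tok)
    return out

# Toy usage: the second target token is <unk> and attends mostly to "Schmidt".
src = ["Herr", "Schmidt", "kommt"]
pred = ["Mr.", "<unk>", "comes"]
attn = torch.tensor([[0.8, 0.1, 0.1],
                     [0.1, 0.8, 0.1],
                     [0.1, 0.1, 0.8]])
print(replace_unk(pred, src, attn))  # ['Mr.', 'Schmidt', 'comes']
```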

Decoding Strategies

class onmt.translate.DecodeStrategy(pad, bos, eos, unk, start, batch_size, parallel_paths, global_scorer, min_length, block_ngram_repeat, exclusion_tokens, return_attention, max_length, ban_unk_token)[source]

Bases: object

Base class for generation strategies.

Parameters:
  • pad (int) – Magic integer in output vocab.

  • bos (int) – Magic integer in output vocab.

  • eos (int) – Magic integer in output vocab.

  • unk (int) – Magic integer in output vocab.

  • start (int) – Magic integer in output vocab.

  • batch_size (int) – Current batch size.

  • parallel_paths (int) – Decoding strategies like beam search use parallel paths. Each batch is repeated parallel_paths times in relevant state tensors.

  • min_length (int) – Shortest acceptable generation, not counting begin-of-sentence or end-of-sentence.

  • max_length (int) – Longest acceptable sequence, not counting begin-of-sentence (presumably there has been no EOS yet if max_length is used as a cutoff).

  • ban_unk_token (Boolean) – Whether the unk token is forbidden.

  • block_ngram_repeat (int) – Block beams where block_ngram_repeat-grams repeat.

  • exclusion_tokens (set[int]) – If a gram contains any of these tokens, it may repeat.

  • return_attention (bool) – Whether to work with attention too. If this is true, it is assumed that the decoder is attentional.

Variables:
  • pad (int) – See above.

  • bos (int) – See above.

  • eos (int) – See above.

  • unk (int) – See above.

  • start (int) – See above.

  • predictions (list[list[LongTensor]]) – For each batch, holds a list of beam prediction sequences.

  • scores (list[list[FloatTensor]]) – For each batch, holds a list of scores.

  • attention (list[list[FloatTensor or list[]]]) – For each batch, holds a list of attention sequence tensors (or empty lists) having shape (step, inp_seq_len) where inp_seq_len is the length of the sample (not the max length of all inp seqs).

  • alive_seq (LongTensor) – Shape (B x parallel_paths, step). This sequence grows in the step axis on each call to advance().

  • is_finished (ByteTensor or NoneType) – Shape (B, parallel_paths). Initialized to None.

  • alive_attn (FloatTensor or NoneType) – If tensor, shape is (B x parallel_paths, step, inp_seq_len), where inp_seq_len is the (max) length of the input sequence.

  • target_prefix (LongTensor or NoneType) – If tensor, shape is (B x parallel_paths, prefix_seq_len), where prefix_seq_len is the (max) length of the prescribed target prefix.

  • min_length (int) – See above.

  • max_length (int) – See above.

  • ban_unk_token (Boolean) – See above.

  • block_ngram_repeat (int) – See above.

  • exclusion_tokens (set[int]) – See above.

  • return_attention (bool) – See above.

  • done (bool) – Whether decoding is finished for every path in the batch.

advance(log_probs, attn)[source]

DecodeStrategy subclasses should override advance().

Advance is used to update self.alive_seq, self.is_finished, and, when appropriate, self.alive_attn.

block_ngram_repeats(log_probs)[source]

We prevent the beam from going in any direction that would repeat any ngram of size <block_ngram_repeat> more than once.

The way we do it: we maintain a list of all ngrams of size <block_ngram_repeat>, updated each time the beam advances, and manually force the probability of any token that would lead to a repeated ngram to 0.

This improves on the previous version’s complexity:
  • previous version’s complexity: batch_size * beam_size * len(self)

  • current version’s complexity: batch_size * beam_size

This improves on the previous version’s accuracy:
  • The previous version blocked the whole beam, whereas here we only block specific tokens.

  • Previously, translation would fail when all beams contained repeated ngrams. That can no longer happen here.
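
A minimal standalone sketch of the idea (not the library's code): a set of already-generated ngrams is kept per hypothesis, and any token that would complete a repeat has its log-probability forced to a very negative value (probability ~0). The block_repeated_ngrams helper below and its per-token loop are illustrative assumptions; the real implementation is batched and vectorized.

```python
import torch

def block_repeated_ngrams(alive_seq, log_probs, n=2, neg_inf=-1e20):
    """Mask any token that would complete an n-gram already present in a hypothesis.

    alive_seq: LongTensor (num_hyps, step) of token ids decoded so far.
    log_probs: FloatTensor (num_hyps, vocab_size) for the next step (modified in place).
    """
    num_hyps, step = alive_seq.shape
    if step < n:                       # not enough context for a full n-gram yet
        return log_probs
    for h in range(num_hyps):
        seq = alive_seq[h].tolist()
        seen = {tuple(seq[i:i + n]) for i in range(step - n + 1)}  # n-grams so far
        prefix = tuple(seq[-(n - 1):]) if n > 1 else tuple()
        for tok in range(log_probs.size(1)):                       # toy per-token loop
            if prefix + (tok,) in seen:
                log_probs[h, tok] = neg_inf
    return log_probs

# Toy usage: hypothesis "1 2 1" with bigram blocking forbids emitting 2 next,
# because the bigram (1, 2) has already been generated.
alive = torch.tensor([[1, 2, 1]])
print(block_repeated_ngrams(alive, torch.zeros(1, 4), n=2))
```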

initialize(device=None, target_prefix=None)[source]

DecodeStrategy subclasses should override initialize().

initialize should be called before all actions; it prepares the necessary ingredients for decoding.

maybe_update_forbidden_tokens()[source]

We complete and reorder the list of forbidden_tokens.

maybe_update_target_prefix(select_index)[source]

We update / reorder target_prefix for the alive paths.

target_prefixing(log_probs)[source]

Fix the first part of predictions with self.target_prefix.

Parameters:
  • log_probs (FloatTensor) – logits of size (B, vocab_size).

Returns:

log_probs (FloatTensor) – modified logits of size (B, vocab_size).
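
A hedged sketch of the effect, assuming a simple per-step mask rather than the library's batched implementation: while the current step is still inside the supplied prefix, every token except the prescribed one is pushed to (near) -inf. The force_prefix name and the use of id 0 as "no constraint" are assumptions for the example.

```python
import torch

def force_prefix(log_probs, target_prefix, step, neg_inf=-1e20):
    """Force the prediction at `step` to the prescribed prefix token, where given.

    log_probs:     FloatTensor (batch, vocab_size), modified in place.
    target_prefix: LongTensor (batch, prefix_len); id 0 is taken to mean "no constraint".
    """
    batch, prefix_len = target_prefix.shape
    if step >= prefix_len:                       # past the prefix: nothing to force
        return log_probs
    forced = target_prefix[:, step]              # (batch,)
    rows = forced.ne(0).nonzero(as_tuple=True)[0]
    kept = log_probs[rows, forced[rows]].clone()
    log_probs[rows] = neg_inf                    # wipe everything ...
    log_probs[rows, forced[rows]] = kept         # ... except the forced token
    return log_probs

# Toy usage: force token 3 at step 0 for the first sentence only.
print(force_prefix(torch.zeros(2, 5), torch.tensor([[3], [0]]), step=0))
```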

update_finished()[source]

DecodeStrategy subclasses should override update_finished().

update_finished is used to update self.predictions, self.scores, and other “output” attributes.
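
Taken together, the methods above form the decoding loop sketched below. This is a standalone toy (random log-probs, greedy choices) meant only to illustrate the advance / update_finished / done contract; it is not the library's loop, and it keeps finished paths in the batch for simplicity, which the real class does not.

```python
import torch

# Standalone toy: a random "model", greedy choices, and a batch of 3 sentences.
vocab_size, eos_id, batch_size, max_length = 10, 2, 3, 8
torch.manual_seed(0)

alive_seq = torch.full((batch_size, 1), 1, dtype=torch.long)  # start token id = 1
finished = torch.zeros(batch_size, dtype=torch.bool)
predictions = [None] * batch_size                              # the "output" attribute

for step in range(max_length):
    # advance(): score the next token for every path and append a choice.
    log_probs = torch.randn(batch_size, vocab_size).log_softmax(dim=-1)
    next_tok = log_probs.argmax(dim=-1, keepdim=True)          # greedy stand-in
    alive_seq = torch.cat([alive_seq, next_tok], dim=1)

    is_finished = next_tok.squeeze(1).eq(eos_id)
    if step == max_length - 1:                                 # length cutoff: force-finish
        is_finished = torch.ones_like(is_finished)

    # update_finished(): copy newly finished paths into the output attributes.
    for b in torch.nonzero(is_finished & ~finished).flatten().tolist():
        predictions[b] = alive_seq[b, 1:].tolist()
    finished |= is_finished

    if bool(finished.all()):                                   # the real `done` property
        break

print(predictions)
```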

class onmt.translate.BeamSearch(beam_size, batch_size, pad, bos, eos, unk, start, n_best, global_scorer, min_length, max_length, return_attention, block_ngram_repeat, exclusion_tokens, stepwise_penalty, ratio, ban_unk_token)[source]

Bases: BeamSearchBase

Beam search for seq2seq/encoder-decoder models

initialize(enc_out, src_len, src_map=None, device=None, target_prefix=None)[source]

Initialize for decoding. Repeat src objects beam_size times.
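
For intuition, one beam-expansion step can be written as below (a standalone sketch with made-up shapes and scores, not the library's code): every live hypothesis is extended with every vocabulary item, the beam_size best cumulative scores survive, and the flat indices are split back into (originating beam, new token).

```python
import torch

beam_size, vocab_size = 3, 7
torch.manual_seed(0)

# Cumulative log-prob of each live hypothesis and the model's next-step log-probs.
beam_scores = torch.tensor([-0.5, -1.2, -2.0])                        # (beam_size,)
step_log_probs = torch.randn(beam_size, vocab_size).log_softmax(dim=-1)

# Extend every hypothesis by every token: beam_size * vocab_size candidates.
candidate_scores = (beam_scores.unsqueeze(1) + step_log_probs).view(-1)

# Keep the beam_size best candidates overall, then recover where they came from.
topk_scores, topk_ids = candidate_scores.topk(beam_size)
origin_beam = topk_ids // vocab_size          # which hypothesis each survivor extends
new_token = topk_ids % vocab_size             # the token it was extended with

print(topk_scores, origin_beam, new_token)
```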

onmt.translate.greedy_search.sample_with_temperature(logits, sampling_temp, keep_topk, keep_topp)[source]

Select next tokens randomly from the top k possible next tokens.

Samples from a categorical distribution over the keep_topk words using the category probabilities logits / sampling_temp.

Parameters:
  • logits (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits ((-inf, inf)) or log-probs ((-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equals the logits if they are log-probabilities summing to 1.)

  • sampling_temp (float) – Used to scale down logits. The higher the value, the more likely it is that a non-max word will be sampled.

  • keep_topk (int) – This many words could potentially be chosen. The other logits are set to have probability 0.

  • keep_topp (float) – Keep the most likely words until the cumulative probability is greater than p. If used together with keep_topk, both conditions are applied.

Returns:

  • topk_ids: Shaped (batch_size, 1). These are the sampled word indices in the output vocab.

  • topk_scores: Shaped (batch_size, 1). These are essentially (logits / sampling_temp)[topk_ids].

Return type:

(LongTensor, FloatTensor)
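
A standalone sketch of the same idea, temperature scaling followed by a top-k restriction and categorical sampling; this is not the library function, the sample_top_k name is made up, and the keep_topp (cumulative-probability) branch is omitted for brevity.

```python
import torch

def sample_top_k(logits, sampling_temp=1.0, keep_topk=5):
    """Sample one token per row from the temperature-scaled top-k distribution.

    logits: FloatTensor (batch_size, vocab_size), logits or log-probs.
    Returns (token ids, their scaled scores), each shaped (batch_size, 1).
    """
    scores = logits / sampling_temp
    if keep_topk > 0:
        kth_best = scores.topk(keep_topk, dim=-1).values[:, -1:]       # k-th best score
        scores = scores.masked_fill(scores < kth_best, float("-inf"))  # drop the rest
    probs = torch.softmax(scores, dim=-1)
    topk_ids = torch.multinomial(probs, num_samples=1)
    topk_scores = scores.gather(dim=1, index=topk_ids)
    return topk_ids, topk_scores

# Toy usage on random logits for a batch of 2 sentences.
torch.manual_seed(0)
ids, sc = sample_top_k(torch.randn(2, 10), sampling_temp=0.8, keep_topk=3)
print(ids, sc)
```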

class onmt.translate.GreedySearch(pad, bos, eos, unk, start, n_best, batch_size, global_scorer, min_length, block_ngram_repeat, exclusion_tokens, return_attention, max_length, sampling_temp, keep_topk, keep_topp, beam_size, ban_unk_token)[source]

Bases: DecodeStrategy

Select next tokens randomly from the top k possible next tokens.

The lists in the scores attribute contain the score, after applying temperature, of the final prediction (either EOS, or the final token if max_length is reached).

Parameters:
  • pad (int) – See base.

  • bos (int) – See base.

  • eos (int) – See base.

  • unk (int) – See base.

  • start (int) – See base.

  • n_best (int) – Don’t stop until at least this many beams have reached EOS.

  • batch_size (int) – See base.

  • global_scorer (onmt.translate.GNMTGlobalScorer) – Scorer instance.

  • min_length (int) – See base.

  • max_length (int) – See base.

  • ban_unk_token (Boolean) – See base.

  • block_ngram_repeat (int) – See base.

  • exclusion_tokens (set[int]) – See base.

  • return_attention (bool) – See base.

  • sampling_temp (float) – See sample_with_temperature().

  • keep_topk (int) – See sample_with_temperature().

  • keep_topp (float) – See sample_with_temperature().

  • beam_size (int) – Number of beams to use.

advance(log_probs, attn)[source]

Select next tokens randomly from the top k possible next tokens.

Parameters:
  • log_probs (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits ((-inf, inf)) or log-probs ((-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equals the logits if they are log-probabilities summing to 1.)

  • attn (FloatTensor) – Shaped (1, B, inp_seq_len).

initialize(enc_out, src_len, src_map=None, device=None, target_prefix=None)[source]

Initialize for decoding.

update_finished()[source]

Finalize scores and predictions.

Scoring

class onmt.translate.penalties.PenaltyBuilder(cov_pen, length_pen)[source]

Bases: object

Returns the Length and Coverage Penalty function for Beam Search.

Parameters:
  • length_pen (str) – option name of length pen

  • cov_pen (str) – option name of cov pen

Variables:
  • has_cov_pen (bool) – Whether a coverage penalty is set; if it is None, applying it is a no-op. Note that the converse isn’t true: setting beta to 0 should also force the coverage penalty to be a no-op.

  • has_len_pen (bool) – Whether a length penalty is set; if it is None, applying it is a no-op. Note that the converse isn’t true: setting alpha to 1 should also force the length penalty to be a no-op.

  • coverage_penalty (callable[[FloatTensor, float], FloatTensor]) – Calculates the coverage penalty.

  • length_penalty (callable[[int, float], float]) – Calculates the length penalty.

coverage_none(cov, beta=0.0)[source]

Returns zero as penalty

coverage_summary(cov, beta=0.0)[source]

Our summary penalty.

coverage_wu(cov, beta=0.0)[source]

GNMT coverage re-ranking score.

See “Google’s Neural Machine Translation System” [WSC+16]. cov is expected to be sized (*, seq_len), where * is probably batch_size x beam_size but could be several dimensions like (batch_size, beam_size). If cov is attention, then the seq_len axis probably sums to (almost) 1.
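
Written out, the penalty in [WSC+16] is beta * sum_i log(min(a_i, 1)), where a_i is the total attention mass received by source position i. The sketch below computes that quantity for a toy attention sum; the function name and the beta value are assumptions, and the sign convention used when applying it to beam scores may differ from the library's.

```python
import torch

def gnmt_coverage_penalty(attn_sum, beta=0.2):
    """Coverage term from Wu et al. (2016): beta * sum_i log(min(a_i, 1))."""
    return beta * attn_sum.clamp(max=1.0).log().sum(dim=-1)

# Toy usage: the second source token was barely attended to, so this hypothesis
# receives a strongly negative coverage term.
print(gnmt_coverage_penalty(torch.tensor([[1.1, 0.2, 0.9]])))
```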

length_average(cur_len, alpha=1.0)[source]

Returns the current sequence length.

length_none(cur_len, alpha=0.0)[source]

Returns unmodified scores.

length_wu(cur_len, alpha=0.0)[source]

GNMT length re-ranking score.

See “Google’s Neural Machine Translation System” [WSC+16].
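
The formula from [WSC+16] is lp(Y) = ((5 + |Y|) / 6) ** alpha, and hypotheses are re-ranked by dividing their cumulative log-probability by lp(Y). A small sketch (the alpha value is chosen only for the example):

```python
def gnmt_length_penalty(cur_len, alpha=0.6):
    """Length normalization from Wu et al. (2016): ((5 + |Y|) / 6) ** alpha."""
    return ((5.0 + cur_len) / 6.0) ** alpha

# Re-ranking sketch: cumulative log-probs are divided by the penalty so that
# longer hypotheses are not punished merely for accumulating more (negative) terms.
for length, logp in [(5, -4.0), (12, -7.5)]:
    print(length, round(logp / gnmt_length_penalty(length), 3))
```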

class onmt.translate.GNMTGlobalScorer(alpha, beta, length_penalty, coverage_penalty)[source]

Bases: object

NMT re-ranking.

Parameters:
  • alpha (float) – Length parameter.

  • beta (float) – Coverage parameter.

  • length_penalty (str) – Length penalty strategy.

  • coverage_penalty (str) – Coverage penalty strategy.

Variables: