Doc: Translation

Translations

class onmt.translate.Translation(src, src_raw, pred_sents, attn, pred_scores, tgt_sent, gold_score)[source]

Container for a translated sentence.

src

LongTensor – src word ids

src_raw

[str] – raw src words

pred_sents

[[str]] – words from the n-best translations

pred_scores

[[float]] – log-probs of n-best translations

attns

[FloatTensor] – attention dist for each translation

gold_sent

[str] – words from gold translation

gold_score

[float] – log-prob of gold translation

log(sent_number)[source]

Log translation.
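
To make the attribute layout concrete, here is a minimal sketch that uses a stand-in dataclass rather than the library class. `TranslationSketch` and `best_hypothesis` are hypothetical names, plain Python lists replace the tensors, and treating `pred_scores` as one score per hypothesis is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TranslationSketch:
    """Hypothetical stand-in mirroring the documented attributes."""
    src_raw: List[str]                      # raw source words
    pred_sents: List[List[str]]             # words of each n-best hypothesis
    pred_scores: List[float]                # assumed: one log-prob per hypothesis
    gold_sent: Optional[List[str]] = None   # words of the gold translation
    gold_score: Optional[float] = None      # log-prob of the gold translation


def best_hypothesis(t: TranslationSketch) -> Tuple[str, float]:
    """Return the highest-scoring hypothesis as text, with its log-prob."""
    idx = max(range(len(t.pred_scores)), key=lambda i: t.pred_scores[i])
    return " ".join(t.pred_sents[idx]), t.pred_scores[idx]


# Invented example data, for illustration only.
example = TranslationSketch(
    src_raw=["guten", "Morgen"],
    pred_sents=[["good", "morning"], ["good", "day"]],
    pred_scores=[-0.4, -1.7],
)
best_text, best_score = best_hypothesis(example)
```

Selecting by maximum score rather than by position avoids relying on the hypotheses being sorted best-first.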

Translator Class

class onmt.translate.Translator(model, fields, beam_size, n_best=1, max_length=100, global_scorer=None, copy_attn=False, logger=None, gpu=False, dump_beam='', min_length=0, stepwise_penalty=False, block_ngram_repeat=0, ignore_when_blocking=[], sample_rate='16000', window_size=0.02, window_stride=0.01, window='hamming', use_filter_pred=False, data_type='text', replace_unk=False, report_score=True, report_bleu=False, report_rouge=False, verbose=False, out_file=None, fast=False)[source]

Uses a model to translate a batch of sentences.

Parameters:
  • model (onmt.modules.NMTModel) – NMT model to use for translation
  • fields (dict of Fields) – data fields
  • beam_size (int) – size of beam to use
  • n_best (int) – number of translations produced
  • max_length (int) – maximum length output to produce
  • global_scorer (GlobalScorer) – object to rescore final translations
  • copy_attn (bool) – use copy attention during translation
  • gpu (bool) – use CUDA
  • dump_beam (str) – file to which the beam search trace is dumped for debugging
  • logger (logging.Logger) – logger.

translate(src_path=None, src_data_iter=None, tgt_path=None, tgt_data_iter=None, src_dir=None, batch_size=None, attn_debug=False)[source]

Translate the content of src_data_iter (if not None) or, otherwise, of src_path. Gold scores are also computed if either tgt_data_iter or tgt_path is set.

Note: batch_size must not be None.

Note: at least one of src_path and src_data_iter must not be None.

Parameters:
  • src_path (str) – filepath of source data
  • src_data_iter (iterator) – an iterator generating source data, e.g. a list or an opened file
  • tgt_path (str) – filepath of target data
  • tgt_data_iter (iterator) – an iterator generating target data
  • src_dir (str) – source directory path (used for Audio and Image datasets)
  • batch_size (int) – number of examples per mini-batch
  • attn_debug (bool) – enables the attention logging

Returns:

(list, list)

  • all_scores is a list of batch_size lists of n_best scores
  • all_predictions is a list of batch_size lists of n_best predictions
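
The nesting of the two returned lists can be illustrated with made-up data in place of a real `translate` call. The values below are invented, `flatten_nbest` is a hypothetical helper, and both the use of plain strings for predictions and the best-first ordering of each inner list are assumptions.

```python
from typing import List, Tuple

# Invented stand-ins for the two lists returned by translate():
# one inner list per input sentence, each holding n_best entries.
all_scores: List[List[float]] = [[-0.4, -1.7], [-0.9, -2.3]]
all_predictions: List[List[str]] = [
    ["good morning", "good day"],
    ["thank you", "thanks"],
]


def flatten_nbest(
    scores: List[List[float]], predictions: List[List[str]]
) -> List[Tuple[int, int, float, str]]:
    """Pair every prediction with its score as (sentence, rank, score, text)."""
    rows = []
    for sent_idx, (sent_scores, sent_preds) in enumerate(zip(scores, predictions)):
        for rank, (score, text) in enumerate(zip(sent_scores, sent_preds)):
            rows.append((sent_idx, rank, score, text))
    return rows


rows = flatten_nbest(all_scores, all_predictions)

# Assumes each inner list is ordered best-first.
top_per_sentence = [preds[0] for preds in all_predictions]
```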

translate_batch(batch, data, fast=False)[source]

Translate a batch of sentences.

Mostly a wrapper around Beam.

Parameters:
  • batch (Batch) – a batch from a dataset object
  • data (Dataset) – the dataset object
  • fast (bool) – enables fast beam search (may not support all features)

class onmt.translate.TranslationBuilder(data, fields, n_best=1, replace_unk=False, has_tgt=False)[source]

Build a word-based translation from the batch output of the translator and the underlying dictionaries.

Unknown-word replacement is based on “Addressing the Rare Word Problem in Neural Machine Translation” [LSL+15]

Parameters:
  • data (Dataset) – the dataset object
  • fields (dict of Fields) – data fields
  • n_best (int) – number of translations produced
  • replace_unk (bool) – replace unknown words using attention
  • has_tgt (bool) – will the batch have gold targets
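
The `replace_unk` option can be illustrated with a small, library-free sketch of attention-based replacement: each unknown token in the prediction is swapped for the source word that received the highest attention weight at that decoding step. The function name `replace_unknowns`, the `"<unk>"` token string, and the list-of-lists attention layout are assumptions made for illustration; the library operates on tensors and its exact behaviour may differ.

```python
from typing import List, Sequence

UNK = "<unk>"  # assumed unknown-token string


def replace_unknowns(
    pred: List[str],
    src_raw: List[str],
    attn: Sequence[Sequence[float]],
    unk: str = UNK,
) -> List[str]:
    """Replace each unknown token with the most-attended source word.

    attn[t][j] is the attention weight on source position j at target step t.
    """
    out = []
    for step, token in enumerate(pred):
        if token == unk:
            weights = attn[step]
            # Index of the source word with the largest attention weight.
            src_idx = max(range(len(src_raw)), key=lambda j: weights[j])
            out.append(src_raw[src_idx])
        else:
            out.append(token)
    return out


# Invented example data, for illustration only.
src = ["Herr", "Schmidt", "kommt"]
pred = ["Mr", "<unk>", "comes"]
attn = [
    [0.8, 0.1, 0.1],
    [0.1, 0.7, 0.2],
    [0.1, 0.2, 0.7],
]
fixed = replace_unknowns(pred, src, attn)
```

Copying the source word verbatim works well for names and numbers, which often pass through translation unchanged.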