opennmt.models.transformer module

Defines Google's Transformer model.

class opennmt.models.transformer.Transformer(source_inputter, target_inputter, num_layers, num_units, num_heads, ffn_inner_dim, dropout=0.1, attention_dropout=0.1, relu_dropout=0.1, position_encoder=<opennmt.layers.position.SinusoidalPositionEncoder object>, decoder_self_attention_type='scaled_dot', share_embeddings=0, share_encoders=False, alignment_file_key='train_alignments', name='transformer')[source]

Bases: opennmt.models.sequence_to_sequence.SequenceToSequence

Attention-based sequence-to-sequence model as described in https://arxiv.org/abs/1706.03762.

__init__(source_inputter, target_inputter, num_layers, num_units, num_heads, ffn_inner_dim, dropout=0.1, attention_dropout=0.1, relu_dropout=0.1, position_encoder=<opennmt.layers.position.SinusoidalPositionEncoder object>, decoder_self_attention_type='scaled_dot', share_embeddings=0, share_encoders=False, alignment_file_key='train_alignments', name='transformer')[source]

Initializes a Transformer model.

Parameters:
  • source_inputter – An opennmt.inputters.inputter.Inputter to process the source data. If this inputter returns parallel inputs, a multi-source Transformer architecture will be constructed.
  • target_inputter – An opennmt.inputters.inputter.Inputter to process the target data. Currently, only opennmt.inputters.text_inputter.WordEmbedder is supported.
  • num_layers – The number of layers, shared between the encoder and decoder.
  • num_units – The number of hidden units.
  • num_heads – The number of heads in each self-attention layer.
  • ffn_inner_dim – The inner dimension of the feed-forward layers.
  • dropout – The probability to drop units in each layer output.
  • attention_dropout – The probability to drop units from the attention.
  • relu_dropout – The probability to drop units from the ReLU activation in the feed-forward layer.
  • position_encoder – An opennmt.layers.position.PositionEncoder to apply to the inputs.
  • decoder_self_attention_type – The type of self-attention in the decoder, either “scaled_dot” or “average” (case-insensitive).
  • share_embeddings – Level of embeddings sharing, see opennmt.models.sequence_to_sequence.EmbeddingsSharingLevel for possible values.
  • share_encoders – In the case of a multi-source architecture, whether to share the parameters of the separate encoders.
  • alignment_file_key – The data configuration key of the training alignment file to support guided alignment.
  • name – The name of this model.
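
As an illustration, here is a minimal sketch of constructing a base Transformer with the dimensions from the original paper. The vocabulary_file_key values are placeholder data configuration keys assumed for this example, not prescribed names:

    import opennmt

    # A minimal sketch of a base Transformer (dimensions from the original
    # paper). The vocabulary_file_key values are placeholder data
    # configuration keys, assumed for illustration.
    model = opennmt.models.transformer.Transformer(
        source_inputter=opennmt.inputters.text_inputter.WordEmbedder(
            vocabulary_file_key="source_words_vocabulary",
            embedding_size=512),
        target_inputter=opennmt.inputters.text_inputter.WordEmbedder(
            vocabulary_file_key="target_words_vocabulary",
            embedding_size=512),
        num_layers=6,
        num_units=512,
        num_heads=8,
        ffn_inner_dim=2048,
        dropout=0.1,
        attention_dropout=0.1,
        relu_dropout=0.1)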
auto_config(num_devices=1)[source]

Returns automatic configuration values specific to this model.

Parameters: num_devices – The number of devices used for training.
Returns: A partial training configuration.
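
A hedged usage sketch, reusing the model instance built above: the returned value is a dictionary of model-specific defaults (for example optimizer and batch settings) meant to be merged with the user-provided run configuration. The exact keys depend on the OpenNMT-tf version, so inspect the result rather than assuming a particular layout:

    # Obtain model-specific configuration defaults before training.
    partial_config = model.auto_config(num_devices=1)
    # The exact structure depends on the OpenNMT-tf version; inspect it
    # rather than assuming particular keys.
    print(partial_config)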