opennmt.decoders.self_attention_decoder module

Define self-attention decoder.

class opennmt.decoders.self_attention_decoder.SelfAttentionDecoder(num_layers, num_units=512, num_heads=8, ffn_inner_dim=2048, dropout=0.1, attention_dropout=0.1, relu_dropout=0.1, position_encoder=<opennmt.layers.position.SinusoidalPositionEncoder object>, self_attention_type='scaled_dot')[source]

Bases: opennmt.decoders.decoder.Decoder

Decoder using self-attention as described in https://arxiv.org/abs/1706.03762.
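As a reference point for the mechanism described in the paper, here is a minimal pure-Python sketch of single-query scaled dot-product attention, softmax(QKᵀ/√d)·V. This is an illustration only, not the library's implementation:

```python
import math

def scaled_dot_attention(query, keys, values):
    """Single-query scaled dot-product attention: dot-product scores are
    scaled by sqrt(depth), normalized with a softmax, and used to mix the
    value vectors."""
    depth = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(depth)
              for key in keys]
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

# The query matches the first key, so most weight goes to the first value.
out = scaled_dot_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                           [[1.0], [0.0]])
```

In the decoder, this computation runs per head over projected queries, keys, and values, with masking applied to the scores before the softmax.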

__init__(num_layers, num_units=512, num_heads=8, ffn_inner_dim=2048, dropout=0.1, attention_dropout=0.1, relu_dropout=0.1, position_encoder=<opennmt.layers.position.SinusoidalPositionEncoder object>, self_attention_type='scaled_dot')[source]

Initializes the parameters of the decoder.

Parameters:
  • num_layers – The number of layers.
  • num_units – The number of hidden units.
  • num_heads – The number of heads in the multi-head attention.
  • ffn_inner_dim – The number of units of the inner linear transformation in the feed forward layer.
  • dropout – The probability to drop units from the outputs.
  • attention_dropout – The probability to drop units from the attention.
  • relu_dropout – The probability to drop units from the ReLU activation in the feed forward layer.
  • position_encoder – A opennmt.layers.position.PositionEncoder to apply on inputs or None.
  • self_attention_type – Type of self attention, “scaled_dot” or “average” (case insensitive).
Raises:

ValueError – If self_attention_type is invalid.
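The default position_encoder adds sinusoidal position encodings to the inputs. A pure-Python sketch of the interleaved sine/cosine formulation from the referenced paper (the library's SinusoidalPositionEncoder may arrange the channels differently, e.g. concatenating the sine and cosine halves):

```python
import math

def sinusoidal_position_encoding(max_length, depth):
    """Builds position encodings where even channels use sine and odd
    channels use cosine, with wavelengths forming a geometric progression
    from 2*pi to 10000*2*pi."""
    encodings = []
    for pos in range(max_length):
        row = []
        for i in range(depth):
            angle = pos / (10000 ** (2 * (i // 2) / depth))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        encodings.append(row)
    return encodings

pe = sinusoidal_position_encoding(max_length=4, depth=8)
# Position 0 encodes as sin(0) = 0 on even channels, cos(0) = 1 on odd ones.
```

Because these encodings are deterministic functions of the position, the decoder needs no learned position embeddings and can extrapolate to sequence lengths unseen in training.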

output_size

Returns the decoder output size.

support_alignment_history

Returns True if this decoder can return the attention as alignment history.

support_multi_source

Returns True if this decoder supports multiple source contexts.

decode_from_inputs(inputs, sequence_length, initial_state=None, mode='train', memory=None, memory_sequence_length=None)[source]

Decodes from full inputs.

Parameters:
  • inputs – The input to decode of shape \([B, T, ...]\).
  • sequence_length – The length of each input with shape \([B]\).
  • initial_state – The initial state as a (possibly nested tuple of…) tensors.
  • mode – A tf.estimator.ModeKeys mode.
  • memory – (optional) Memory values to query.
  • memory_sequence_length – (optional) Memory values length.
Returns:

A tuple (outputs, state) or (outputs, state, attention) if self.support_alignment_history.
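Decoding from full inputs applies self-attention over the whole target sequence at once, so each position must be prevented from attending to future positions. A sketch of the causal (lower-triangular) mask this requires; names here are illustrative, not the library's:

```python
def future_mask(length):
    """Builds a lower-triangular attention mask: position i may attend
    only to positions j <= i, so full-sequence (teacher-forced) decoding
    stays equivalent to step-by-step decoding."""
    return [[1 if j <= i else 0 for j in range(length)]
            for i in range(length)]

mask = future_mask(3)
# [[1, 0, 0],
#  [1, 1, 0],
#  [1, 1, 1]]
```

Masked score entries are typically set to a large negative value before the softmax so their attention weights vanish.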

step_fn(mode, batch_size, initial_state=None, memory=None, memory_sequence_length=None, dtype=tf.float32)[source]

Callable to run decoding steps.

Parameters:
  • mode – A tf.estimator.ModeKeys mode.
  • batch_size – The batch size.
  • initial_state – The initial state to start from as a (possibly nested tuple of…) tensors.
  • memory – (optional) Memory values to query.
  • memory_sequence_length – (optional) Memory values length.
  • dtype – The data type.
Returns:

A callable with the signature (step, inputs, state, mode) -> (outputs, state) or (outputs, state, attention) if self.support_alignment_history.
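To illustrate the returned callable's contract, here is a toy step function matching the (step, inputs, state, mode) -> (outputs, state) signature. The state below is just a list of inputs seen so far, a stand-in for the cached attention keys/values the real decoder carries between steps; everything here is illustrative:

```python
def make_step_fn():
    """Returns a callable with the (step, inputs, state, mode) ->
    (outputs, state) signature; the state accumulates past inputs and
    the "output" is their running mean."""
    def step_fn(step, inputs, state, mode):
        new_state = state + [inputs]
        outputs = sum(new_state) / len(new_state)  # toy decoder output
        return outputs, new_state
    return step_fn

fn = make_step_fn()
outputs, state = fn(0, 2.0, [], "infer")
outputs, state = fn(1, 4.0, state, "infer")
```

A beam search or greedy decoding loop calls such a function once per timestep, threading the returned state into the next call.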

class opennmt.decoders.self_attention_decoder.SelfAttentionDecoderV2(num_layers, num_units=512, num_heads=8, ffn_inner_dim=2048, dropout=0.1, attention_dropout=0.1, ffn_dropout=0.1, ffn_activation=<function relu>, position_encoder=<opennmt.layers.position.SinusoidalPositionEncoder object>, num_sources=1, **kwargs)[source]

Bases: opennmt.decoders.decoder.DecoderV2

Decoder using self-attention as described in https://arxiv.org/abs/1706.03762.

Note

TensorFlow 2.0 version.

__init__(num_layers, num_units=512, num_heads=8, ffn_inner_dim=2048, dropout=0.1, attention_dropout=0.1, ffn_dropout=0.1, ffn_activation=<function relu>, position_encoder=<opennmt.layers.position.SinusoidalPositionEncoder object>, num_sources=1, **kwargs)[source]

Initializes the parameters of the decoder.

Parameters:
  • num_layers – The number of layers.
  • num_units – The number of hidden units.
  • num_heads – The number of heads in the multi-head attention.
  • ffn_inner_dim – The number of units of the inner linear transformation in the feed forward layer.
  • dropout – The probability to drop units from the outputs.
  • attention_dropout – The probability to drop units from the attention.
  • ffn_dropout – The probability to drop units from the activation output in the feed forward layer.
  • ffn_activation – The activation function to apply between the two linear transformations of the feed forward layer.
  • position_encoder – The opennmt.layers.position.PositionEncoder to apply on inputs.
  • num_sources – The number of source contexts expected by this decoder.
  • **kwargs – Additional layer arguments.
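The ffn_inner_dim and ffn_activation parameters describe the position-wise feed-forward sublayer: an inner linear transformation to ffn_inner_dim units, the activation, then a projection back to num_units. A pure-Python sketch for a single position (plain lists stand in for tensors; weights are illustrative [in][out] matrices, not the library's):

```python
def feed_forward(x, w1, w2, activation):
    """Position-wise feed-forward sublayer: project x to the inner
    dimension, apply the activation, then project back to the model
    dimension."""
    inner = [activation(sum(x[i] * w1[i][j] for i in range(len(x))))
             for j in range(len(w1[0]))]
    return [sum(inner[i] * w2[i][j] for i in range(len(inner)))
            for j in range(len(w2[0]))]

relu = lambda v: max(0.0, v)
# Toy shapes: num_units=2, ffn_inner_dim=3.
w1 = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]]
w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = feed_forward([1.0, 2.0], w1, w2, relu)
```

The same transformation is applied independently at every position of the sequence; ffn_dropout is applied to the activation output between the two projections.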
minimum_sources

The minimum number of source contexts supported by this decoder.

maximum_sources

The maximum number of source contexts supported by this decoder.

forward(inputs, sequence_length=None, initial_state=None, memory=None, memory_sequence_length=None, training=None)[source]

Runs the decoder on full sequences.

Parameters:
  • inputs – The 3D decoder input.
  • sequence_length – The length of each input sequence.
  • initial_state – The initial decoder state.
  • memory – Memory values to query.
  • memory_sequence_length – Memory values length.
  • training – Run in training mode.
Returns:

A tuple with the decoder outputs, the decoder state, and the attention vector.

step(inputs, timestep, state=None, memory=None, memory_sequence_length=None, training=None)[source]

Runs one decoding step.

Parameters:
  • inputs – The 2D decoder input.
  • timestep – The current decoding step.
  • state – The decoder state.
  • memory – Memory values to query.
  • memory_sequence_length – Memory values length.
  • training – Run in training mode.
Returns:

A tuple with the decoder outputs, the decoder state, and the attention vector.