SelfAttentionDecoder

class opennmt.decoders.SelfAttentionDecoder(*args, **kwargs)[source]

Decoder using self-attention as described in https://arxiv.org/abs/1706.03762.

Inherits from: opennmt.decoders.Decoder

__init__(num_layers, num_units=512, num_heads=8, ffn_inner_dim=2048, dropout=0.1, attention_dropout=0.1, ffn_dropout=0.1, ffn_activation=<function relu>, mha_bias=True, position_encoder_class=<class 'opennmt.layers.position.SinusoidalPositionEncoder'>, num_sources=1, maximum_relative_position=None, attention_reduction=MultiHeadAttentionReduction.FIRST_HEAD_LAST_LAYER, pre_norm=True, **kwargs)[source]

Initializes the parameters of the decoder.

Parameters
  • num_layers – The number of layers.

  • num_units – The number of hidden units.

  • num_heads – The number of heads in the multi-head attention.

  • ffn_inner_dim – The number of units of the inner linear transformation in the feed forward layer.

  • dropout – The probability to drop units from the outputs.

  • attention_dropout – The probability to drop units from the attention.

  • ffn_dropout – The probability to drop units from the activation output in the feed forward layer.

  • ffn_activation – The activation function to apply between the two linear transformations of the feed forward layer.

  • mha_bias – Add bias after linear layers in the multi-head attention.

  • position_encoder_class – The opennmt.layers.PositionEncoder class to use for position encoding (or a callable that returns an instance).

  • num_sources – The number of source contexts expected by this decoder.

  • maximum_relative_position – Maximum relative position representation (from https://arxiv.org/abs/1803.02155).

  • attention_reduction – A opennmt.layers.MultiHeadAttentionReduction value to specify how to reduce multi-head attention matrices.

  • pre_norm – If True, layer normalization is applied before each sub-layer. Otherwise it is applied after.

  • **kwargs – Additional layer arguments.
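
The default position_encoder_class is opennmt.layers.SinusoidalPositionEncoder. As a rough illustration of what that default computes, here is a minimal numpy sketch of the sinusoidal encoding from the paper above; it follows the paper's interleaved sin/cos layout, and the library's actual implementation may arrange the channels differently.

```python
import numpy as np

def sinusoidal_position_encoding(max_length, depth):
    """Sketch of the sinusoidal position encoding from the Transformer paper.

    Illustrative only: opennmt.layers.SinusoidalPositionEncoder may lay out
    the sin/cos channels differently along the depth dimension.
    """
    positions = np.arange(max_length)[:, np.newaxis]   # shape (max_length, 1)
    dims = np.arange(0, depth, 2)[np.newaxis, :]       # shape (1, depth // 2)
    angles = positions / np.power(10000.0, dims / depth)
    encoding = np.zeros((max_length, depth))
    encoding[:, 0::2] = np.sin(angles)                 # even channels: sin
    encoding[:, 1::2] = np.cos(angles)                 # odd channels: cos
    return encoding

pe = sinusoidal_position_encoding(max_length=50, depth=512)
print(pe.shape)  # (50, 512)
```

Since the encoding depends only on position and depth, it adds no learned parameters, which is why it can be swapped out via position_encoder_class (or disabled with maximum_relative_position for relative position representations).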

property minimum_sources

The minimum number of source contexts supported by this decoder.

property maximum_sources

The maximum number of source contexts supported by this decoder.

property support_alignment_history

Returns True if this decoder can return the attention as alignment history.

map_v1_weights(weights)[source]

Maps the weights of an OpenNMT-tf V1 checkpoint to the variables of this decoder.

forward(inputs, sequence_length=None, initial_state=None, memory=None, memory_sequence_length=None, input_fn=None, sampling_probability=None, training=None)[source]

Runs the decoder on full sequences.

Parameters
  • inputs – The 3D decoder input.

  • sequence_length – The length of each input sequence.

  • initial_state – The initial decoder state.

  • memory – Memory values to query.

  • memory_sequence_length – The length of each memory sequence.

  • input_fn – A callable taking sampled ids and returning the decoding inputs.

  • sampling_probability – The probability to read from the last sample instead of the true target.

  • training – Run in training mode.

Returns

A tuple with the logits, the decoder state, and the attention vector.
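
The core operation forward applies in each of the num_layers layers is masked (causal) self-attention over the full target sequence. The following numpy sketch shows a single-head version of that masking; the real decoder additionally uses learned Q/K/V projections, num_heads parallel heads, dropout, layer normalization, and cross-attention over memory, so the names here are illustrative, not the library's API.

```python
import numpy as np

def causal_self_attention(x):
    """Single-head masked self-attention over a full (length, depth) sequence.

    Illustrative sketch: the real decoder layers apply learned projections,
    multiple heads, dropout, and residual connections around this operation.
    """
    length, depth = x.shape
    q, k, v = x, x, x  # real layers compute learned Q/K/V projections of x
    scores = q @ k.T / np.sqrt(depth)
    # Causal mask: position i may only attend to positions <= i.
    mask = np.triu(np.ones((length, length), dtype=bool), k=1)
    scores[mask] = -np.inf
    # Numerically stable softmax over the last axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

x = np.random.default_rng(0).standard_normal((5, 8))
out = causal_self_attention(x)
print(out.shape)  # (5, 8)
```

Because of the causal mask, the first position can only attend to itself, so its output equals its (unprojected) input in this sketch; this is the property that lets forward process all timesteps in parallel during training while remaining autoregressive.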

step(inputs, timestep, state=None, memory=None, memory_sequence_length=None, training=None)[source]

Runs one decoding step.

Parameters
  • inputs – The 2D decoder input.

  • timestep – The current decoding step.

  • state – The decoder state.

  • memory – Memory values to query.

  • memory_sequence_length – The length of each memory sequence.

  • training – Run in training mode.

Returns

A tuple with the decoder outputs, the decoder state, and the attention vector.
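
At inference time, step is called once per timestep with a 2D input, and the decoder state carries what was computed at earlier timesteps so they are not re-attended from scratch. A minimal numpy sketch of that idea, using a plain list as a stand-in cache (the actual state structure is internal to the library):

```python
import numpy as np

def attend(q, keys, values):
    """Attention of a single (depth,) query over all cached positions."""
    scores = keys @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values

def step(x_t, cache):
    """One decoding step (sketch): append the new input to the cache, then
    attend over everything cached so far, the new position included. The
    real decoder state similarly caches per-layer keys/values per timestep."""
    cache.append(x_t)
    kv = np.stack(cache)  # (timesteps_so_far, depth)
    return attend(x_t, kv, kv)

rng = np.random.default_rng(0)
inputs = rng.standard_normal((4, 8))   # 4 timesteps, depth 8
cache = []
outputs = np.stack([step(x_t, cache) for x_t in inputs])
print(outputs.shape)  # (4, 8)
```

Running step over each timestep of a sequence reproduces what the causally masked full-sequence pass computes, which is why forward is used for training and step for incremental decoding.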