SelfAttentionDecoderLayer

class opennmt.layers.SelfAttentionDecoderLayer(*args, **kwargs)[source]

Implements one self-attention decoding layer.

Inherits from: tf.keras.layers.Layer

__init__(num_units, num_heads, ffn_inner_dim, num_sources=1, dropout=0.1, attention_dropout=0.1, ffn_dropout=0.1, ffn_activation=tf.nn.relu, mha_bias=True, maximum_relative_position=None, pre_norm=True, **kwargs)[source]

Initializes the layer.

Parameters
  • num_units – The number of hidden units.

  • num_heads – The number of heads in the multi-head attention.

  • ffn_inner_dim – The number of units in the inner linear transformation of the feed-forward layer.

  • num_sources – The number of source contexts.

  • dropout – The probability of dropping units from the outputs.

  • attention_dropout – The probability of dropping units from the attention.

  • ffn_dropout – The probability of dropping units from the activation output in the feed-forward layer.

  • ffn_activation – The activation function to apply between the two linear transformations of the feed-forward layer.

  • mha_bias – Whether to add a bias term after the linear layers in the multi-head attention.

  • maximum_relative_position – The maximum relative position used for relative position representations (see https://arxiv.org/abs/1803.02155).

  • pre_norm – If True, layer normalization is applied before each sub-layer. Otherwise it is applied after.

  • **kwargs – Additional layer arguments.
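
A minimal construction sketch: the class path follows the documentation above, while the hyperparameter values below are illustrative choices rather than recommendations from this page.

    import opennmt

    # Decoder layer with a 512-unit model size, 8 attention heads, and a
    # 2048-unit feed-forward inner dimension (illustrative values).
    layer = opennmt.layers.SelfAttentionDecoderLayer(
        num_units=512,
        num_heads=8,
        ffn_inner_dim=2048,
        num_sources=1,
        dropout=0.1,
        attention_dropout=0.1,
        ffn_dropout=0.1,
        pre_norm=True,
    )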

map_v1_weights(weights)[source]

call(inputs, mask=None, memory=None, memory_mask=None, cache=None, training=None)[source]

Runs the decoder layer.
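
A hedged sketch of calling the layer: the [batch, time, depth] tensor shapes, the causal self-attention mask layout, and passing memory and memory_mask as one-element lists (to match num_sources=1) are assumptions based on common Transformer usage rather than statements from this page.

    import opennmt
    import tensorflow as tf

    batch_size, target_len, source_len, num_units = 4, 7, 9, 512
    layer = opennmt.layers.SelfAttentionDecoderLayer(
        num_units=num_units, num_heads=8, ffn_inner_dim=2048
    )

    # Target-side inputs (assumed shape [batch, target_time, num_units]).
    inputs = tf.random.uniform([batch_size, target_len, num_units])

    # Causal mask so each target position attends only to itself and the past
    # (assumed shape [batch, target_time, target_time]).
    causal = tf.linalg.band_part(tf.ones([target_len, target_len]), -1, 0)
    mask = tf.cast(tf.tile(causal[tf.newaxis], [batch_size, 1, 1]), tf.bool)

    # One source context and one padding mask per source (num_sources=1 here);
    # the list form is an assumption.
    memory = [tf.random.uniform([batch_size, source_len, num_units])]
    memory_mask = [tf.ones([batch_size, source_len], dtype=tf.bool)]

    result = layer(
        inputs,
        mask=mask,
        memory=memory,
        memory_mask=memory_mask,
        training=True,
    )

The structure of the returned value and of the cache argument is not described on this page, so the sketch leaves the result unpacked and omits the cache.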