In addition to standard dimension settings such as the number of layers and the hidden dimension size, OpenNMT also provides various model architectures.
The default encoder is a simple recurrent neural network (LSTM or GRU).
The bidirectional encoder (-encoder_type brnn) consists of two independent encoders: one encoding the normal sequence and the other the reversed sequence. The output and final states are concatenated or summed depending on the
Pyramidal deep bidirectional encoder
The pyramidal deep bidirectional encoder (-encoder_type pdbrnn) is an alternative bidirectional encoder that reduces the time dimension after each layer by the -pdbrnn_reduction factor, using -pdbrnn_merge as the reduction action (sum or concatenation).
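The time reduction described above can be pictured with a small sketch in plain Python. It is not OpenNMT's implementation: the function name reduce_time is hypothetical, and zero-padding a sequence whose length is not a multiple of the factor is an assumption made only to keep the example self-contained.

```python
def reduce_time(states, factor, merge="sum"):
    """Merge every `factor` consecutive time steps into a single step.

    `states` is a list of equally sized vectors (lists of floats),
    one per time step.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    if not states:
        return []
    dim = len(states[0])
    # Assumption: pad with zero vectors so the length divides evenly.
    padded = list(states)
    while len(padded) % factor != 0:
        padded.append([0.0] * dim)
    reduced = []
    for start in range(0, len(padded), factor):
        group = padded[start:start + factor]
        if merge == "sum":
            # Element-wise sum keeps the vector size unchanged.
            reduced.append([sum(values) for values in zip(*group)])
        elif merge == "concat":
            # Concatenation multiplies the vector size by `factor`.
            reduced.append([value for vector in group for value in vector])
        else:
            raise ValueError("merge must be 'sum' or 'concat'")
    return reduced
```

With a factor of 2, a sequence of four states becomes two states. Summing keeps the vector size, while concatenation doubles it.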
Deep bidirectional encoder
The deep bidirectional encoder (-encoder_type dbrnn) is an alternative bidirectional encoder where the outputs of every layer are summed (or concatenated) before being fed to the next layer. It is a special case of the pyramidal deep bidirectional encoder without time reduction (i.e. -pdbrnn_reduction = 1).
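As a rough illustration of how the two directions of each layer are merged before the next layer, the sketch below stacks bidirectional layers in plain Python. Everything in it is hypothetical: toy_rnn is a stand-in recurrence rather than an LSTM or GRU, and both directions reuse it, whereas a real encoder would have separate parameters per direction and per layer.

```python
def toy_rnn(inputs):
    """Stand-in recurrence: h_t = 0.5 * h_{t-1} + x_t (element-wise)."""
    state = [0.0] * len(inputs[0])
    outputs = []
    for x in inputs:
        state = [0.5 * s + xi for s, xi in zip(state, x)]
        outputs.append(state)
    return outputs


def bidirectional_layer(inputs, merge="sum"):
    """Run the toy recurrence in both directions and merge per time step."""
    forward = toy_rnn(inputs)
    # Encode the reversed sequence, then restore the original time order.
    backward = toy_rnn(inputs[::-1])[::-1]
    if merge == "sum":
        return [[f + b for f, b in zip(fw, bw)]
                for fw, bw in zip(forward, backward)]
    if merge == "concat":
        return [fw + bw for fw, bw in zip(forward, backward)]
    raise ValueError("merge must be 'sum' or 'concat'")


def deep_bidirectional_encoder(inputs, num_layers, merge="sum"):
    """Feed the merged output of each layer into the next one."""
    outputs = inputs
    for _ in range(num_layers):
        outputs = bidirectional_layer(outputs, merge)
    return outputs
```

Note that the sequence length never changes here, which matches the case without time reduction. With concatenation the vector size doubles at every layer in this toy version.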
Google's NMT encoder
The Google encoder (-encoder_type gnmt) is an encoder with a single bidirectional layer as described in Wu et al. (2016). The bidirectional states are concatenated and residual connections are enabled by default.
The convolutional encoder (-encoder_type cnn) is an encoder based on several convolutional layers as described in Gehring et al. (2017).
In sequence-to-sequence models, it should be used either without a bridge (-bridge none) or with a dense bridge (-bridge dense_nonlinear). The default copy bridge is not compatible with this encoder.
It is also recommended to set a small learning rate when using SGD (e.g. -learning_rate 0.1) or to use Adam instead (e.g. -optim adam -learning_rate 0.0002).
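The bridge constraint above can be captured in a small pre-flight check. This is only a sketch: check_cnn_encoder_options is a hypothetical helper, not part of OpenNMT, and it assumes the options are collected in a dictionary keyed by option name without the leading dash, with copy as the bridge default as stated in the text.

```python
def check_cnn_encoder_options(options):
    """Return a list of problems found in a dict of option name -> value."""
    problems = []
    if options.get("encoder_type") != "cnn":
        # The rule below only concerns the convolutional encoder.
        return problems
    # The text states that the default bridge is `copy`.
    bridge = options.get("bridge", "copy")
    if bridge == "copy":
        problems.append(
            "the copy bridge is not compatible with the cnn encoder; "
            "use 'none' or 'dense_nonlinear'"
        )
    return problems


# The two optimizer settings suggested in the text, written as option
# dictionaries. Pairing each one with a particular bridge is arbitrary.
sgd_setup = {"encoder_type": "cnn", "bridge": "none", "learning_rate": 0.1}
adam_setup = {"encoder_type": "cnn", "bridge": "dense_nonlinear",
              "optim": "adam", "learning_rate": 0.0002}
```

Calling check_cnn_encoder_options on either dictionary returns an empty list, while a configuration that selects the cnn encoder and leaves the bridge unset is reported.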
The default decoder applies attention over the source sequence and implements input feeding by default.
Input feeding is an approach to feed attentional vectors "as inputs to the next time steps to inform the model about past alignment decisions" (Luong et al. (2015)). This can be disabled by setting
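
To make the idea of input feeding concrete, here is a minimal sketch in plain Python of how the decoder input at each step could be assembled. The names are hypothetical and the details are assumptions: attend stands in for the whole decoder step plus attention that produces the attentional vector, and the first step is fed a zero vector.

```python
def decoder_inputs_with_input_feeding(embeddings, attend, hidden_size):
    """Build per-step decoder inputs where each step also receives the
    attentional vector produced at the previous step.

    `embeddings` is a list of target word vectors (lists of floats).
    `attend` maps a step input to an attentional vector of `hidden_size`.
    """
    # Assumption: no past alignment exists at the first step, so feed zeros.
    previous_attentional = [0.0] * hidden_size
    step_inputs = []
    for embedding in embeddings:
        # Concatenate the word embedding with the previous attentional vector.
        step_input = embedding + previous_attentional
        step_inputs.append(step_input)
        previous_attentional = attend(step_input)
        if len(previous_attentional) != hidden_size:
            raise ValueError("attend must return hidden_size values")
    return step_inputs
```

Without input feeding, each step input would be the embedding alone, so the decoder would only learn about earlier alignment decisions through its recurrent state.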
With residual connections, the input of a layer is added element-wise to its output before being fed to the next layer. This approach has proved useful for gradient flow in deep RNN stacks (more than 4 layers).
The following components support residual connections with the
- default encoder
- bidirectional encoder
- default decoder
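
The element-wise addition performed by residual connections can be sketched in a few lines of plain Python. The function name residual_stack is hypothetical, and each layer is represented by an arbitrary callable rather than a recurrent layer.

```python
def residual_stack(vector, layers):
    """Apply `layers` in order, adding each layer's input to its output."""
    current = vector
    for layer in layers:
        output = layer(current)
        if len(output) != len(current):
            # The addition is only defined when input and output sizes match.
            raise ValueError("residual connections need equal sizes")
        # The sum, not the raw layer output, is what the next layer receives.
        current = [x + y for x, y in zip(current, output)]
    return current
```

Because the input is carried forward unchanged alongside each layer's output, the gradient has a direct path through the stack, which is the property the text refers to.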
A bridge is an additional layer between the encoder and the decoder that defines how to pass the encoder states to the decoder. It can be one of the following:
-bridge copy (default): the encoder states are copied
-bridge dense: the encoder states are forwarded through a dense layer
-bridge dense_nonlinear: the encoder states are forwarded through a dense layer followed by a non-linearity, here
-bridge none: the encoder states are not passed and the decoder initial states are set to zero
With the copy bridge, the encoder and decoder should have the same structure (number of layers, final hidden size, etc.).
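
The four bridge types can be compared side by side with a small sketch in plain Python. It is illustrative only: apply_bridge and its arguments are hypothetical, the dense layer omits a bias term, and tanh is used as the default non-linearity purely as an assumption for the example.

```python
import math


def apply_bridge(kind, encoder_state, weights=None, decoder_size=None,
                 activation=math.tanh):
    """Turn a final encoder state into an initial decoder state.

    `weights` is a matrix (list of rows) used by the dense variants;
    its number of rows sets the decoder state size.
    """
    if kind == "copy":
        # Sizes must already match, hence the same-structure requirement.
        return list(encoder_state)
    if kind == "none":
        size = len(encoder_state) if decoder_size is None else decoder_size
        return [0.0] * size
    if kind in ("dense", "dense_nonlinear"):
        if weights is None:
            raise ValueError("dense bridges need a weight matrix")
        projected = [sum(w * x for w, x in zip(row, encoder_state))
                     for row in weights]
        if kind == "dense_nonlinear":
            projected = [activation(value) for value in projected]
        return projected
    raise ValueError("unknown bridge type: %r" % (kind,))
```

The dense variants are the only ones that can change the state size, which is why they remove the constraint that the encoder and decoder share the same structure.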
Different attention models are available, following the "Global Attention Model" of Luong et al. (2015).
and the score function is one of these:
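
As a hedged sketch of what such score functions look like, the three variants given in Luong et al. (2015), named dot, general and concat in that paper, are written out below in plain Python together with the softmax that turns scores into attention weights. The function names, the weight arguments W_a and v_a, and the list-based vectors are illustrative assumptions. This is not OpenNMT's code, and the option values accepted by the toolkit should not be inferred from it.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _matvec(matrix, vector):
    return [_dot(row, vector) for row in matrix]


def score_dot(h_t, h_s):
    """dot: h_t . h_s"""
    return _dot(h_t, h_s)


def score_general(h_t, h_s, W_a):
    """general: h_t . (W_a h_s)"""
    return _dot(h_t, _matvec(W_a, h_s))


def score_concat(h_t, h_s, W_a, v_a):
    """concat: v_a . tanh(W_a [h_t; h_s])"""
    hidden = [math.tanh(x) for x in _matvec(W_a, h_t + h_s)]
    return _dot(v_a, hidden)


def global_attention(h_t, source_states, score):
    """Return (weights, context) for one decoder state.

    `score` is a two-argument callable such as score_dot.
    """
    scores = [score(h_t, h_s) for h_s in source_states]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    size = len(source_states[0])
    # The context vector is the attention-weighted sum of source states.
    context = [sum(w * h_s[i] for w, h_s in zip(weights, source_states))
               for i in range(size)]
    return weights, context
```

The general and concat variants take extra parameters, so they can be passed to global_attention through a small wrapper, for example lambda h_t, h_s: score_general(h_t, h_s, W_a).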
The model is selected using the -global_attention option or can be disabled with the -attention none option. The default attention model is