opennmt.utils.decay module

Define learning rate decay functions.

opennmt.utils.decay.noam_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)

Defines the decay function described in https://arxiv.org/abs/1706.03762.

The arguments keep their conventional names, but their meaning changes as described below.

Parameters:
  • learning_rate – The scale constant.
  • global_step – The current learning step.
  • decay_steps – The warmup steps.
  • decay_rate – The model dimension.
  • staircase – Ignored.
  • name – Ignored.
Returns:

The learning rate for the step global_step.

opennmt.utils.decay.noam_decay_v2(scale, step, model_dim, warmup_steps)

Defines the decay function described in https://arxiv.org/abs/1706.03762.

Parameters:
  • scale – The scale constant.
  • step – The current step.
  • model_dim – The model dimension.
  • warmup_steps – The number of warmup steps.
Returns:

The learning rate for the step step.
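
The schedule in the referenced paper can be sketched in plain Python as follows. This is an illustration of the published formula, not the library's code: the helper name noam_lr is hypothetical, and clamping step to at least 1 is an assumption made here to avoid dividing by zero (the library may handle step 0 differently).

```python
def noam_lr(scale, step, model_dim, warmup_steps):
    """Sketch of the schedule from arXiv:1706.03762, multiplied by `scale`."""
    # Assumption: clamp the step so that step ** -0.5 is defined at step 0.
    step = max(float(step), 1.0)
    # Linear increase during warmup, then decay proportional to 1 / sqrt(step).
    return scale * model_dim ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


for step in (1, 1000, 4000, 16000):
    print(step, noam_lr(scale=1.0, step=step, model_dim=512, warmup_steps=4000))
```

In this sketch the two branches of min() meet at step == warmup_steps, which is where the learning rate peaks.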

opennmt.utils.decay.rsqrt_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)

Decay based on the reciprocal of the step square root.

The arguments keep their conventional names, but their meaning changes as described below.

Parameters:
  • learning_rate – The scale constant.
  • global_step – The current learning step.
  • decay_steps – The warmup steps.
  • decay_rate – Ignored.
  • staircase – Ignored.
  • name – Ignored.
Returns:

The learning rate for the step global_step.

opennmt.utils.decay.rsqrt_decay_v2(scale, step, warmup_steps)

Decay based on the reciprocal of the step square root.

Parameters:
  • scale – The scale constant.
  • step – The current step.
  • warmup_steps – The number of warmup steps.
Returns:

The learning rate for the step step.
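
A reciprocal square root schedule can be sketched as below. The helper name rsqrt_lr is hypothetical, and holding the rate constant until warmup_steps is an assumption about how the warmup is applied, so treat this as one plausible reading rather than the library's exact behavior.

```python
import math


def rsqrt_lr(scale, step, warmup_steps):
    """Sketch of a reciprocal square root decay with a flat warmup phase."""
    # Assumption: the rate stays at scale / sqrt(warmup_steps) during warmup,
    # then follows scale / sqrt(step).
    return scale / math.sqrt(max(step, warmup_steps))


for step in (0, 100, 400, 1600):
    print(step, rsqrt_lr(scale=1.0, step=step, warmup_steps=100))
```

Under this reading, quadrupling the step after warmup halves the learning rate.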

opennmt.utils.decay.cosine_annealing(scale, step, max_step=1000000, warmup_steps=None)

Decay using a cosine annealing schedule.

Parameters:
  • scale – The initial learning rate.
  • step – The current step.
  • max_step – The last step of the schedule.
  • warmup_steps – The number of steps over which the learning rate increases linearly from 0 to scale before annealing.
Returns:

The learning rate for the step step.
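
A cosine annealing schedule with optional linear warmup can be sketched as follows. The helper name cosine_annealing_lr is hypothetical, and two details are assumptions made for this illustration: the rate anneals down to 0, and the cosine phase is measured from step 0 rather than from the end of warmup.

```python
import math


def cosine_annealing_lr(scale, step, max_step=1000000, warmup_steps=None):
    """Sketch of cosine annealing from `scale` to 0 with optional warmup."""
    if warmup_steps is not None and step < warmup_steps:
        # Linear increase from 0 to scale over the warmup steps.
        return scale * step / warmup_steps
    # Half a cosine period over max_step steps (assumed minimum rate: 0).
    return 0.5 * scale * (1.0 + math.cos(math.pi * step / max_step))


for step in (0, 25, 50, 75, 100):
    print(step, cosine_annealing_lr(scale=1.0, step=step, max_step=100))
```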

opennmt.utils.decay.rnmtplus_decay(scale, step, num_replicas, warmup_steps=500, start_step=600000, end_step=1200000)

Defines the decay function described in https://arxiv.org/abs/1804.09849.

Parameters:
  • scale – The scale constant.
  • step – The current step.
  • num_replicas – The number of concurrent model replicas.
  • warmup_steps – The number of warmup steps.
  • start_step – The start step of the exponential decay.
  • end_step – The end step of the exponential decay.
Returns:

The learning rate for the step step.
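
The three phases of this schedule (linear warmup, plateau, exponential decay) can be sketched as below. The helper name rnmtplus_lr is hypothetical, and the sketch assumes the schedule has the form scale * min(1 + t(n-1)/(np), n, n(2n)^((s-nt)/(e-s))), with t the step, n the number of replicas, p the warmup steps, and s and e the start and end steps. Check it against the paper before relying on it.

```python
def rnmtplus_lr(scale, step, num_replicas,
                warmup_steps=500, start_step=600000, end_step=1200000):
    """Sketch of the RNMT+ schedule under the formula assumed above."""
    t = float(step)
    n = float(num_replicas)
    p = float(warmup_steps)
    s = float(start_step)
    e = float(end_step)
    warmup = 1.0 + t * (n - 1.0) / (n * p)       # linear increase from 1 to n
    plateau = n                                  # constant phase
    decay = n * (2.0 * n) ** ((s - n * t) / (e - s))  # drops below n once n * t > s
    return scale * min(warmup, plateau, decay)


for step in (0, 8000, 37500, 75000):
    print(step, rnmtplus_lr(scale=1e-4, step=step, num_replicas=16))
```

Note that, in this form, start_step and end_step are compared against n * t rather than t, so with 16 replicas the decay begins at step 37500 for the default start_step.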