InvSqrtDecay

class opennmt.schedules.InvSqrtDecay(learning_rate, warmup_steps, initial_learning_rate=0)[source]

Decay based on the reciprocal of the square root of the step. This schedule corresponds to inverse_sqrt in Fairseq and --lr-decay-inv-sqrt in Marian.

During warmup (linear increase of the learning rate):

\[\text{schedule}(\text{step}) = \text{init_lr} + (\text{lr} - \text{init_lr}) \times \frac{\text{step}}{\text{warmup_steps}}\]

After warmup:

\[\text{schedule}(\text{step}) = \text{lr} \times \sqrt{\frac{\text{warmup_steps}}{\text{step}}}\]
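
For illustration, the following is a minimal sketch of these two formulas in plain Python. It is not the library implementation, and the learning rate and warmup step count used below are arbitrary example values.

    # Minimal sketch of the warmup + inverse-square-root schedule above.
    # Illustrative only; not the library implementation.

    def inv_sqrt_schedule(step, lr, warmup_steps, init_lr=0.0):
        if step < warmup_steps:
            # Linear warmup from init_lr to lr.
            return init_lr + (lr - init_lr) * step / warmup_steps
        # Inverse square root decay after warmup.
        return lr * (warmup_steps / step) ** 0.5

    # Example values: peak learning rate 2e-3, 4000 warmup steps.
    print(inv_sqrt_schedule(2000, lr=2e-3, warmup_steps=4000))   # 0.001 (mid-warmup)
    print(inv_sqrt_schedule(4000, lr=2e-3, warmup_steps=4000))   # 0.002 (peak)
    print(inv_sqrt_schedule(16000, lr=2e-3, warmup_steps=4000))  # 0.001 (halved, since 16000 = 4 * warmup_steps)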

Inherits from: keras.src.optimizers.schedules.learning_rate_schedule.LearningRateSchedule

__init__(learning_rate, warmup_steps, initial_learning_rate=0)[source]

Initializes the decay function.

Parameters
  • learning_rate – The base learning rate.

  • warmup_steps – The number of warmup steps.

  • initial_learning_rate – The initial learning rate at the start of warmup.
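
A minimal usage sketch: because the class inherits from the Keras LearningRateSchedule, an instance can be passed directly as the learning rate of a Keras optimizer. The hyperparameter values below are arbitrary examples.

    import tensorflow as tf
    import opennmt

    # Inverse square root decay with a 4000-step linear warmup
    # (values are arbitrary examples).
    schedule = opennmt.schedules.InvSqrtDecay(
        learning_rate=2e-3,
        warmup_steps=4000,
        initial_learning_rate=0,
    )

    # The schedule is a Keras LearningRateSchedule, so it can be passed
    # directly to a Keras optimizer.
    optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)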