opennmt.optimizers.multistep_adam module

Optimizer variants that make it possible to train with very large effective batch sizes on limited GPU memory. Optimizers in this module accumulate gradients over n batches and apply the optimizer’s update rule once every n batches, using the accumulated gradients. See Saunders et al. (2018) for details.

class opennmt.optimizers.multistep_adam.MultistepAdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False, name='Adam', n=1)


Adam optimizer that accumulates gradients and applies its update only once every n steps, using the gradients accumulated over those steps.
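
The accumulate-then-update behaviour can be illustrated with a small, dependency-free sketch. This is not the OpenNMT-tf implementation: the class name `AccumulatingAdam` and its `apply_gradients` method are hypothetical, parameters are a plain list of floats, and the sketch assumes the accumulated gradients are averaged over the n batches before the standard Adam rule is applied. Only the argument names (`learning_rate`, `beta1`, `beta2`, `epsilon`, `n`) mirror the signature above.

```python
import math


class AccumulatingAdam:
    """Toy Adam on a list of floats that updates once every n calls.

    Hypothetical illustration only; not the OpenNMT-tf implementation.
    """

    def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999,
                 epsilon=1e-8, n=1):
        self.learning_rate = learning_rate
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.n = n
        self.calls = 0        # batches seen since the last update
        self.updates = 0      # number of Adam updates applied so far
        self.grad_acc = None  # running sum of gradients
        self.m = None         # first-moment estimates
        self.v = None         # second-moment estimates

    def apply_gradients(self, params, grads):
        """Accumulate grads; update params in place on every n-th call.

        Returns True if an update was applied, False otherwise.
        """
        if self.grad_acc is None:
            self.grad_acc = [0.0] * len(params)
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)

        for i, g in enumerate(grads):
            self.grad_acc[i] += g
        self.calls += 1
        if self.calls < self.n:
            return False  # keep accumulating, parameters untouched

        self.updates += 1
        t = self.updates
        for i in range(len(params)):
            # Assumption: average the accumulated gradients over n batches.
            g = self.grad_acc[i] / self.n
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** t)
            params[i] -= (self.learning_rate * m_hat
                          / (math.sqrt(v_hat) + self.epsilon))
            self.grad_acc[i] = 0.0  # reset for the next n batches
        self.calls = 0
        return True
```

With `n=2`, the first call only stores the gradient and leaves the parameters unchanged; the second call applies a single Adam step using the mean of the two gradients. Under the averaging assumption, this behaves like one step on a batch twice as large while only one batch needs to be held in memory at a time.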