2.0 Transition Guide

This document describes the changes introduced in OpenNMT-tf 2.0 and the actions required by the user.

New requirements

Python 3

Python 3.5 (or above) is now required by OpenNMT-tf. See python3statement.org for more context about this decision.

TensorFlow 2.0

OpenNMT-tf has been completely redesigned for TensorFlow 2.0, which is now the minimum required TensorFlow version.

The correct TensorFlow version is declared as a dependency of OpenNMT-tf and will be automatically installed as part of the pip installation:

pip install --upgrade pip
pip install OpenNMT-tf

Changed checkpoint layout

TensorFlow 2.0 introduced a new way to save checkpoints: variables are no longer matched by their name but by where they are stored relative to the root model object. Consequently, OpenNMT-tf V1 checkpoints are no longer compatible without conversion.

To smooth this transition, V1 checkpoints of the following models are automatically upgraded on load:

  • NMTBigV1

  • NMTMediumV1

  • NMTSmallV1

  • Transformer

  • TransformerBig
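
For example, training can be resumed directly from a V1 Transformer checkpoint; the weights are upgraded the first time the checkpoint is loaded. A minimal sketch, assuming model_dir in data.yml points to the existing V1 model directory:

# data.yml contains e.g.: model_dir: /path/to/v1/model
onmt-main --model_type Transformer --config data.yml --auto_config train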

Improved main script command line

The command line parser has been improved to better manage task-specific options, which are now located after the run type:

onmt-main <general options> train <train options>

Also, some options have changed:

  • the run type train_and_eval has been replaced by train --with_eval

  • the main script now includes the average_checkpoints and update_vocab tasks

  • distributed training options are currently missing in 2.0

  • --session_config has been removed as no longer applicable in TensorFlow 2.0

See onmt-main -h for more details.
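
For example, a training run with in-training evaluation places the general options before the run type and the task options after it:

# --config and --auto_config are general options, --with_eval is a train option.
onmt-main --model_type Transformer --config data.yml --auto_config train --with_eval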

Changed vocabulary configuration

In OpenNMT-tf V1, models were responsible for declaring the name of the vocabulary to look for. For example, this inputter definition:

# V1 model inputter.
source_inputter=onmt.inputters.WordEmbedder(
    vocabulary_file_key="source_words_vocabulary",
    embedding_size=512)

meant that the user had to configure the vocabulary like this:

# V1 vocabulary configuration.
data:
  source_words_vocabulary: src_vocab.txt

This is no longer the case in V2, where vocabulary configuration follows a general pattern, the same one used for the embedding and tokenization configurations.

  • Single vocabulary (e.g. language model):

data:
  vocabulary: vocab.txt

  • Source and target vocabularies (e.g. sequence to sequence, tagging, etc.):

data:
  source_vocabulary: src_vocab.txt
  target_vocabulary: tgt_vocab.txt

  • Multi-source and target vocabularies:

data:
  source_1_vocabulary: src_1_vocab.txt
  source_2_vocabulary: src_2_vocab.txt
  target_vocabulary: tgt_vocab.txt

  • Nested inputs:

data:
  source_1_1_vocabulary: src_1_1_vocab.txt
  source_1_2_vocabulary: src_1_2_vocab.txt
  source_2_vocabulary: src_2_vocab.txt
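
For comparison, the V2 version of the inputter above simply drops the vocabulary key and resolves its vocabulary from the data configuration. A minimal sketch:

import opennmt as onmt

# V2 model inputter: no vocabulary_file_key argument; the vocabulary is
# resolved from the "data" block of the YAML configuration.
source_inputter = onmt.inputters.WordEmbedder(embedding_size=512)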

Changed predefined models

Predefined models do not require a model definition file and can be directly set with the --model_type command line argument. Some of them have been renamed or removed for clarity:

V1                  V2                  Comment
NMTBig              NMTBigV1
NMTMedium           NMTMediumV1
NMTSmall            NMTSmallV1
SeqTagger           LstmCnnCrfTagger
TransformerAAN                          Not considered useful compared to the standard Transformer
TransformerBigFP16                      Use TransformerBig with --mixed_precision flag on the command line
TransformerFP16                         Use Transformer with --mixed_precision flag on the command line
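
For example, to train the renamed tagger (file names are placeholders):

onmt-main --model_type LstmCnnCrfTagger --config data.yml train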

Changed parameters

Some parameters in the YAML configuration have been renamed or removed:

V1                            V2                                    Comment
*/bucket_width                */length_bucket_width
*/num_threads                                                       Automatic value
*/prefetch_buffer_size                                              Automatic value
eval/eval_delay               eval/steps                            Use steps instead of seconds to set the evaluation frequency
eval/exporters                                                      Not implemented
params/clip_gradients                                               Set clipnorm or clipvalue in params/optimizer_params/
params/freeze_variables       params/freeze_layers                  Use layer names instead of variable regexps
params/gradients_accum                                              Use train/effective_batch_size instead
params/horovod                                                      Not implemented
params/loss_scale                                                   Dynamic loss scaling by default
params/maximum_iterations     params/maximum_decoding_length
params/maximum_learning_rate                                        Not implemented
params/param_init                                                   Not implemented
params/weight_decay           params/optimizer_params/weight_decay
train/save_checkpoints_secs                                         Not implemented
train/train_steps             train/max_step
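
As an illustration, a V1 configuration that set clip_gradients, gradients_accum, and train_steps could be expressed in V2 as follows (the values are made up for the example):

params:
  optimizer_params:
    clipnorm: 5  # replaces params/clip_gradients
train:
  effective_batch_size: 25000  # replaces params/gradients_accum
  max_step: 500000  # was train/train_steps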

Parameters that reference Python classes should also be revised when upgrading to V2, as the classes have likely changed in the process. This concerns:

  • params/optimizer and params/optimizer_params

  • params/decay_type and params/decay_params
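
For example, in V2 the optimizer is resolved from tf.keras.optimizers and the decay type from opennmt.schedules, so a Transformer configuration might look like this (a sketch; the values are illustrative):

params:
  optimizer: Adam  # a tf.keras.optimizers class name
  learning_rate: 2.0
  decay_type: NoamDecay  # a learning rate schedule from opennmt.schedules
  decay_params:
    model_dim: 512
    warmup_steps: 8000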

New mixed precision workflow

OpenNMT-tf 2.0 uses the newly introduced graph rewriter to automatically convert parts of the graph from float32 to float16.

Variables are cast on the fly, and checkpoints no longer need to be converted for inference or to continue training in float32. This means mixed precision is no longer a property of the model; it should instead be enabled on the command line, e.g.:

onmt-main --model_type Transformer --config data.yml --auto_config --mixed_precision train