Serving

Exporting SavedModel

OpenNMT-tf periodically exports models for inference in other environments, for example with TensorFlow Serving. A model export contains all information required for inference: the graph definition, the weights, and external assets such as vocabulary files. It typically looks like this on disk:

toy-ende/export/latest/1507109306/
├── assets
│   ├── src-vocab.txt
│   └── tgt-vocab.txt
├── saved_model.pb
└── variables
    ├── variables.data-00000-of-00001
    └── variables.index
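
Each export is written to a subdirectory named after a numeric timestamp (1507109306 above). As a minimal sketch, assuming the numerically largest directory name is the most recent export, a helper like the hypothetical latest_export below could select the directory to load. The function name and the check for saved_model.pb are illustrative and not part of OpenNMT-tf.

```python
import os


def latest_export(export_base):
    """Return the path of the most recent export under export_base, or None.

    Assumes each export lives in a subdirectory whose name is a numeric
    timestamp and which contains a saved_model.pb file.
    """
    versions = []
    for name in os.listdir(export_base):
        path = os.path.join(export_base, name)
        # Skip unrelated entries: non-numeric names and incomplete exports.
        if name.isdigit() and os.path.isfile(os.path.join(path, "saved_model.pb")):
            versions.append((int(name), path))
    if not versions:
        return None
    # The largest timestamp is taken to be the latest export (an assumption).
    return max(versions)[1]
```

For the layout above, latest_export("toy-ende/export/latest") would return the 1507109306 directory if it is the only export present.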

Automatic export

In the train_and_eval run type, models can be automatically exported following one or several export schedules:

  • last: a model is exported to export/latest after each evaluation (default);
  • final: a model is exported to export/final at the end of the training;
  • best: a model is exported to export/best only if it achieves the best evaluation loss so far.

Export schedules are set by the exporters field in the eval section of the configuration file, e.g.:

eval:
  exporters: best
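
The decision behind the best schedule can be illustrated with a few lines of Python. This is only a sketch of the idea under stated assumptions, not the OpenNMT-tf implementation: the class BestLossTracker is hypothetical, and it assumes that "best" means a strictly lower evaluation loss than any seen before.

```python
class BestLossTracker:
    """Decides whether an evaluation result warrants a new export."""

    def __init__(self):
        self.best_loss = None  # No evaluation seen yet.

    def should_export(self, eval_loss):
        # Export only when the loss improves on the best value seen so far
        # (strict improvement is an assumption of this sketch).
        if self.best_loss is None or eval_loss < self.best_loss:
            self.best_loss = eval_loss
            return True
        return False


tracker = BestLossTracker()
decisions = [tracker.should_export(loss) for loss in (2.5, 2.1, 2.3, 1.9)]
print(decisions)  # [True, True, False, True]
```

With these made-up losses, the third evaluation (2.3) does not beat the earlier 2.1, so no export would be written for it.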

Manual export

Additionally, models can be manually exported using the export run type. By default, manually exported models are written to export/manual/ within the model directory; a custom destination can be set with the command line option --export_dir_base, e.g.:

onmt-main export --export_dir_base ~/my-models/ende --auto-config --config my_config.yml

Running SavedModel

When using an exported model, you need to know the input and output nodes of your model. You can use the saved_model_cli script provided by TensorFlow for inspection, e.g.:

saved_model_cli show --dir toy-ende/export/latest/1507109306/ \
    --tag_set serve --signature_def serving_default

Some examples using exported models are available in the examples/ directory:

  • examples/serving to serve a model with TensorFlow Serving
  • examples/cpp to run inference with the TensorFlow C++ API

Note: the Python function wrapped by tf.py_func is not serialized in the graph, so exported models do not support in-graph tokenization. Text inputs must be tokenized before they are passed to the model.
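
Since inputs have to be tokenized outside the graph, the client is responsible for preparing them. The sketch below assumes plain whitespace tokenization and a model signature that takes a batch of padded token sequences together with their true lengths. The function make_batch, the input names "tokens" and "length", and the empty-string padding are assumptions for illustration; check the actual signature with saved_model_cli and use the same tokenization as during training.

```python
def make_batch(sentences, pad_token=""):
    """Tokenize sentences on whitespace and pad them to a common length."""
    tokenized = [sentence.split() for sentence in sentences]
    lengths = [len(tokens) for tokens in tokenized]
    max_length = max(lengths) if lengths else 0
    # Pad shorter sequences so that every row has max_length entries.
    padded = [
        tokens + [pad_token] * (max_length - len(tokens)) for tokens in tokenized
    ]
    # "tokens" and "length" are assumed input names, not confirmed by this page.
    return {"tokens": padded, "length": lengths}


batch = make_batch(["Hello world !", "Hi"])
```

Here batch["length"] records the unpadded lengths (3 and 1), which lets the model ignore the padding positions.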