Model

OpenNMT-tf can be used to train several types of models thanks to a modular and extensible design. Here is a non-exhaustive overview of supported models:

Machine translation

Speech recognition

Language modeling

Sequence tagging

Most ideas and modules from the papers behind these models can be reused for other models or tasks.

Catalog

OpenNMT-tf comes with a set of standard models that are defined in the catalog. These models can be directly selected with the --model_type command line option, e.g.:

onmt-main --model_type Transformer [...]

You can also get the list of predefined models by running onmt-main -h.
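The name-based selection described above can be pictured as a lookup from a string to a predefined class. The sketch below is only an illustration of that pattern, not OpenNMT-tf's implementation: ToyTransformer, ToyTagger, CATALOG, and build_model are hypothetical names, and only the --model_type flag comes from the text above.

```python
import argparse


# Hypothetical stand-ins for predefined model classes (not OpenNMT-tf code).
class ToyTransformer:
    pass


class ToyTagger:
    pass


# Toy catalog: maps the name given to --model_type to a model class.
CATALOG = {cls.__name__: cls for cls in (ToyTransformer, ToyTagger)}


def build_model(argv):
    """Parse --model_type and instantiate the matching catalog entry."""
    parser = argparse.ArgumentParser(prog="toy-main")
    parser.add_argument("--model_type", choices=sorted(CATALOG), required=True)
    args = parser.parse_args(argv)
    return CATALOG[args.model_type]()


model = build_model(["--model_type", "ToyTransformer"])
print(type(model).__name__)
```

Restricting the flag to the catalog keys also makes the help output list the available names, which is similar in spirit to what onmt-main -h shows.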

Custom models

If you don’t find the model you are looking for in the catalog, OpenNMT-tf can load custom model definitions from external Python files. Each file should define a callable named model that returns an opennmt.models.Model instance. For example, the model definition below extends the Transformer model to enable embeddings sharing:

import opennmt

class MyCustomTransformer(opennmt.models.Transformer):
    def __init__(self):
        super().__init__(
            source_inputter=opennmt.inputters.WordEmbedder(embedding_size=512),
            target_inputter=opennmt.inputters.WordEmbedder(embedding_size=512),
            num_layers=6,
            num_units=512,
            num_heads=8,
            ffn_inner_dim=2048,
            dropout=0.1,
            attention_dropout=0.1,
            ffn_dropout=0.1,
            share_embeddings=opennmt.models.EmbeddingsSharingLevel.ALL,
        )

    # Here you can override any method from the Model class for a customized behavior.

model = MyCustomTransformer

The custom model file should then be selected with the --model command line option, e.g.:

onmt-main --model config/models/custom_model.py [...]
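To make the contract of a model file more concrete, here is a minimal sketch of how a loader could import an external Python file and call its module-level model callable. It uses only the standard library and is an assumption about the general mechanism, not OpenNMT-tf's actual loading code; load_model_from_file and ToyModel are hypothetical names.

```python
import importlib.util
import os
import tempfile


def load_model_from_file(path):
    """Import a Python file and call its module-level `model` callable."""
    spec = importlib.util.spec_from_file_location("custom_model", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    model_fn = getattr(module, "model", None)
    if not callable(model_fn):
        raise ValueError(f"{path} must define a callable named 'model'")
    return model_fn()


# Demo with a throwaway definition file (a toy class, not an opennmt model).
source = "class ToyModel:\n    name = 'toy'\n\nmodel = ToyModel\n"
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "custom_model.py")
    with open(path, "w") as f:
        f.write(source)
    instance = load_model_from_file(path)

print(type(instance).__name__)
```

Because model only needs to be callable, assigning the class itself (as in model = MyCustomTransformer) works just as well as assigning a function that builds and returns the instance.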

This approach offers a high level of modeling freedom without changing the core implementation. Additionally, some public modules are defined to contain other modules and can be used to design complex architectures.

For example, these container modules can be used to implement multi-source inputs, multi-modal training, mixed word/character embeddings, and arbitrarily complex encoder architectures (e.g. mixing convolution, RNN, self-attention, etc.).
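The container idea can be illustrated without any framework: a module that holds other modules and routes one input to each of them. The classes below are hypothetical toys written for this sketch (they are not OpenNMT-tf classes and do no real embedding); they only show how a multi-source input could be composed from simpler parts.

```python
class ToyEmbedder:
    """Hypothetical leaf module: maps each token to a fixed-size vector."""

    def __init__(self, size):
        self.size = size

    def __call__(self, tokens):
        # Placeholder "embedding": the token length repeated `size` times.
        return [[float(len(token))] * self.size for token in tokens]


class ToyParallelContainer:
    """Hypothetical container: applies one sub-module per input source."""

    def __init__(self, modules):
        self.modules = list(modules)

    def __call__(self, sources):
        if len(sources) != len(self.modules):
            raise ValueError("expected one input per sub-module")
        return [module(source) for module, source in zip(self.modules, sources)]


# Two sources, each with its own embedder and embedding size.
inputter = ToyParallelContainer([ToyEmbedder(2), ToyEmbedder(3)])
outputs = inputter([["hello", "world"], ["bonjour"]])
print([len(vectors[0]) for vectors in outputs])
```

Since the container exposes the same call interface as its parts, containers can be nested inside other containers, which is what makes arbitrarily complex architectures possible with this design.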

Some examples are available in the directory config/models of the Git repository.