MixedInputter

class opennmt.inputters.MixedInputter(*args, **kwargs)[source]

A multi inputter that applies several transformations to the same data (e.g. to combine word-level and character-level embeddings).

Inherits from: opennmt.inputters.MultiInputter

__init__(inputters, reducer=<opennmt.layers.reducer.ConcatReducer object>, dropout=0.0)[source]

Initializes a mixed inputter.

Parameters

inputters – A list of opennmt.inputters.Inputter.
reducer – An opennmt.layers.Reducer to merge all inputs.
dropout – The probability to drop units in the merged inputs.
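
As a rough illustration of what "mixing" means here, the sketch below applies two transformations to the same tokens and merges the results by concatenating them along the depth dimension. It is plain Python, not the OpenNMT-tf implementation: word_level, char_level, concat_reduce and mix are hypothetical stand-ins for the inputters, the reducer and the mixed inputter.

```python
def word_level(tokens):
    # Hypothetical word-level transformation: 2 values per token.
    return [[float(len(token)), 1.0] for token in tokens]


def char_level(tokens):
    # Hypothetical character-level transformation: 3 values per token.
    return [
        [float(ord(token[0])), float(ord(token[-1])), float(len(set(token)))]
        for token in tokens
    ]


def concat_reduce(outputs):
    # Stand-in for a concatenating reducer: for each token, join the vectors
    # produced by every transformation into a single, deeper vector.
    return [sum(vectors, []) for vectors in zip(*outputs)]


def mix(tokens, transforms, reducer=concat_reduce):
    # Apply every transformation to the same data, then merge the results.
    return reducer([transform(tokens) for transform in transforms])


mixed = mix(["hello", "world"], [word_level, char_level])
print(len(mixed), len(mixed[0]))  # 2 tokens, each with 2 + 3 = 5 values
```

With a concatenating reducer the merged depth is the sum of the individual depths, so each transformation is free to choose its own output size.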

make_dataset(data_file, training=None)[source]

Creates the base dataset required by this inputter.

Parameters

data_file – The data file.
training – Run in training mode.

Returns

A tf.data.Dataset instance or a list of tf.data.Dataset instances.

get_dataset_size(data_file)[source]

Returns the dataset size.

If the inputter can efficiently compute the dataset size from a training file on disk, it can optionally override this method. Otherwise, we may compute the size later with a generic and slower approach (iterating over the dataset instance).

Parameters

data_file – The data file.

Returns

The dataset size or None.
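
The trade-off described above can be sketched in plain Python. The example assumes a text file with one example per line; count_lines, count_by_iteration and dataset_size are hypothetical helpers, not part of the library.

```python
import os
import tempfile


def count_lines(path):
    # Fast path (assumption: one example per line), no dataset needed.
    with open(path, "rb") as data:
        return sum(1 for _ in data)


def count_by_iteration(dataset):
    # Generic, slower fallback: walk through every element of the dataset.
    return sum(1 for _ in dataset)


def dataset_size(path, get_dataset_size=None):
    # Use the inputter-specific shortcut when it exists and returns a size,
    # otherwise fall back to iterating over the data.
    size = get_dataset_size(path) if get_dataset_size is not None else None
    if size is None:
        with open(path, encoding="utf-8") as data:
            size = count_by_iteration(data)
    return size


with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("hello world\nhow are you\nfine\n")
    example_path = tmp.name

fast = dataset_size(example_path, get_dataset_size=count_lines)
slow = dataset_size(example_path, get_dataset_size=lambda path: None)
os.remove(example_path)
```

Both paths give the same count on this file. The shortcut matters when the generic approach would have to run a full data pipeline just to count elements.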

input_signature()[source]

Returns the input signature of this inputter.

get_length(features, ignore_special_tokens=False)[source]

Returns the length of the input features, if defined.

Parameters

features – The dictionary of input features.
ignore_special_tokens – Ignore special tokens that were added by the inputter (e.g. <s> and/or </s>).

Returns

The length.
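
A minimal sketch of the ignore_special_tokens behaviour, in plain Python. It assumes the length is stored under a "length" key and that both <s> and </s> were added. The key name and the num_special_tokens argument are illustrative assumptions, not the library's signature.

```python
def get_length(features, ignore_special_tokens=False, num_special_tokens=2):
    # "length" is an assumed feature key; return None when it is not defined.
    length = features.get("length")
    if length is None:
        return None
    if ignore_special_tokens:
        # Assumption: both <s> and </s> were added, so 2 positions are special.
        length -= num_special_tokens
    return length


features = {"tokens": ["<s>", "hello", "world", "</s>"], "length": 4}
full_length = get_length(features)
content_length = get_length(features, ignore_special_tokens=True)
```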

make_features(element=None, features=None, training=None)[source]

Creates features from data.

This is typically called in a data pipeline (such as Dataset.map). Common transformations include tokenization, parsing, vocabulary lookup, etc.

This method accepts either a single element from the dataset or a partially built dictionary of features.

Parameters

element – An element from the dataset returned by opennmt.inputters.Inputter.make_dataset().
features – An optional and possibly partial dictionary of features to augment.
training – Run in training mode.

Returns

A dictionary of tf.Tensor.
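
The two calling conventions can be illustrated with a plain-Python sketch. The whitespace tokenization and the "tokens" and "length" keys are assumptions made for the example; the real method returns tf.Tensor values.

```python
def make_features(element=None, features=None, training=None):
    # Start from a copy of the partial features, or from an empty dictionary.
    features = dict(features) if features is not None else {}
    if "tokens" not in features:
        if element is None:
            raise ValueError("either element or features with tokens is required")
        # Hypothetical tokenization: split the raw line on whitespace.
        features["tokens"] = element.split()
    features["length"] = len(features["tokens"])
    return features


from_element = make_features(element="hello world")
from_partial = make_features(features={"tokens": ["hello", "world"]})
```

Accepting a partial dictionary lets a caller that has already tokenized the data skip that step and only fill in the missing features.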

build(input_shape)[source]

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters

input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
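
The deferred state-creation pattern can be sketched without TensorFlow. ScaleLayer below is a hypothetical class that only mimics the order of operations (instantiation, then build() once, then call()); it is not a Keras layer.

```python
class ScaleLayer:
    """Hypothetical layer whose weights depend on the input depth."""

    def __init__(self):
        self.built = False
        self.weights = None

    def build(self, input_shape):
        # State creation step: one weight per input unit (last dimension).
        self.weights = [1.0] * input_shape[-1]
        self.built = True

    def call(self, inputs):
        return [w * x for w, x in zip(self.weights, inputs)]

    def __call__(self, inputs):
        # build() runs once, right before the first execution of call().
        if not self.built:
            self.build((len(inputs),))
        return self.call(inputs)


layer = ScaleLayer()
built_before = layer.built
outputs = layer([1.0, 2.0, 3.0])
```

Deferring weight creation to build() means the layer can be instantiated before the input depth is known.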

call(features, training=None)[source]

Creates the model input from the features (e.g. word embeddings).

Parameters

features – A dictionary of tf.Tensor, the output of opennmt.inputters.Inputter.make_features().
training – Run in training mode.

Returns

The model input.
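
The role of the training flag can be illustrated with the dropout applied to the merged inputs. The sketch below uses inverted dropout, a common formulation, and assumes dropout is only active in training mode. The dropout function and its rng argument are hypothetical, not the library's code.

```python
import random


def dropout(values, rate, training=None, rng=random):
    # Outside of training (or with a zero rate) the inputs pass through unchanged.
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    # Inverted dropout: zero a unit with probability `rate`, rescale the others.
    return [value / keep if rng.random() >= rate else 0.0 for value in values]


merged = [0.5, 1.0, 1.5, 2.0]
inference_input = dropout(merged, rate=0.5, training=False)
training_input = dropout(merged, rate=0.5, training=True, rng=random.Random(0))
```

Rescaling the surviving units by 1 / (1 - rate) keeps the expected value of each unit the same in training and inference.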