MultiInputter

class opennmt.inputters.MultiInputter(*args, **kwargs)[source]

An inputter that gathers multiple inputters, possibly nested.

Inherits from: opennmt.inputters.Inputter

Extended by:

property asset_prefix

The asset prefix is used to differentiate resources of parallel inputters. The most basic examples are the “source_” and “target_” prefixes.

  • When reading the data configuration, the inputter will read fields that start with this prefix (e.g. “source_vocabulary”).

  • Assets exported by this inputter start with this prefix.
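The prefix-based lookup described above can be illustrated with a small standalone sketch. It does not use the library: `fields_for_prefix` is a hypothetical helper and the file names are placeholders, chosen only to show how keys such as "source_vocabulary" could be selected by prefix.

```python
def fields_for_prefix(data_config, asset_prefix):
    """Return the entries whose key starts with asset_prefix, prefix stripped.

    Hypothetical helper for illustration; not part of opennmt.
    """
    return {
        key[len(asset_prefix):]: value
        for key, value in data_config.items()
        if key.startswith(asset_prefix)
    }


# Placeholder data configuration; the file names are made up.
data_config = {
    "source_vocabulary": "src-vocab.txt",
    "target_vocabulary": "tgt-vocab.txt",
    "train_features_file": "train.src",
}

source_fields = fields_for_prefix(data_config, "source_")
target_fields = fields_for_prefix(data_config, "target_")
print(source_fields)
print(target_fields)
```

Each parallel inputter sees only the fields carrying its own prefix, which is what keeps the resources of the source and target sides apart.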

property num_outputs

The number of parallel outputs produced by this inputter.

get_leaf_inputters()[source]

Returns a list of all leaf Inputter instances.
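As a rough illustration of what collecting leaves from nested inputters means, the following toy sketch flattens a nested structure recursively. `ToyLeaf` and `ToyMulti` are hypothetical stand-ins, not the library's classes, and the traversal order shown is a property of this sketch only.

```python
class ToyLeaf:
    """Hypothetical stand-in for a non-composite inputter."""

    def __init__(self, name):
        self.name = name

    def get_leaf_inputters(self):
        # A leaf is its own only leaf.
        return [self]


class ToyMulti:
    """Hypothetical stand-in for an inputter that gathers other inputters."""

    def __init__(self, inputters):
        self.inputters = inputters

    def get_leaf_inputters(self):
        # Recurse into each child so that nested groups are flattened.
        leaves = []
        for inputter in self.inputters:
            leaves.extend(inputter.get_leaf_inputters())
        return leaves


nested = ToyMulti([ToyMulti([ToyLeaf("words"), ToyLeaf("chars")]), ToyLeaf("target")])
leaf_names = [leaf.name for leaf in nested.get_leaf_inputters()]
print(leaf_names)
```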

initialize(data_config)[source]

Initializes the inputter.

Parameters

data_config – A dictionary containing the data configuration set by the user.

export_assets(asset_dir)[source]

Exports assets used by this inputter.

Parameters

asset_dir – The directory where assets can be written.

Returns

A dictionary containing additional assets used by the inputter.

has_prepare_step()[source]

Returns True if this inputter implements a data preparation step in the method opennmt.inputters.Inputter.prepare_elements().

prepare_elements(elements, training=None)[source]

Prepares dataset elements.

This method is called on a batch of dataset elements. For example, it can be overridden to apply an external pre-tokenization.

Note that the results of the method are unbatched and then passed to method opennmt.inputters.Inputter.make_features().

Parameters
  • elements – A batch of dataset elements.

  • training – Run in training mode.

Returns

A (possibly nested) structure of tf.Tensor.
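The order of operations described above (prepare a batch, unbatch the result, then build features per element) can be sketched in plain Python. This is a simplified model that uses lists of strings instead of tf.Tensor, and both function bodies are assumptions made for illustration rather than the library's behavior.

```python
def prepare_elements(elements, training=None):
    # Assumed batch-level step: normalize whitespace, standing in for an
    # external pre-tokenization applied to the whole batch at once.
    return [" ".join(text.split()) for text in elements]


def make_features(element):
    # Assumed per-element step, applied after the batch has been unbatched.
    tokens = element.split(" ")
    return {"tokens": tokens, "length": len(tokens)}


batch = ["Hello   world", "a  b c"]
prepared = prepare_elements(batch, training=True)

# "Unbatching" here is simply iterating over the prepared batch.
features = [make_features(element) for element in prepared]
print(features)
```

Running the preparation on a batch lets an expensive external step be called once per batch, while feature construction still operates on individual elements.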

visualize(model_root, log_dir)[source]

Visualizes the transformation, usually embeddings.

Parameters
  • model_root – The root model object.

  • log_dir – The active log directory.