TextInputter

class opennmt.inputters.TextInputter(*args, **kwargs)

An abstract inputter that processes text.

Inherits from: opennmt.inputters.Inputter

Extended by:

initialize(data_config)

Initializes the inputter.

Parameters

data_config – A dictionary containing the data configuration set by the user.

set_noise(noiser, in_place=True, probability=None)

Enables noise to be applied to the input features.

Parameters
  • noiser – An opennmt.data.WordNoiser instance.

  • in_place – If False, the noisy version of the input will be stored as a separate feature prefixed with noisy_.

  • probability – When in_place is enabled, the probability of applying the noise.

Raises

ValueError – if in_place is enabled but a probability is not set.
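
The contract described above can be illustrated with a minimal pure-Python sketch. It does not reproduce the OpenNMT-tf implementation: the helper apply_noise, the "tokens" feature key, and the use of a plain callable as the noiser are all hypothetical stand-ins chosen for illustration.

```python
import random


def apply_noise(features, noiser, in_place=True, probability=None, rng=random):
    """Hypothetical helper mirroring the documented set_noise() contract.

    `features` is assumed to be a dict with a "tokens" entry, and `noiser`
    any callable that maps a token list to a noisy token list.
    """
    if in_place and probability is None:
        raise ValueError("in_place is enabled but a probability is not set")

    result = dict(features)  # shallow copy so the caller's dict is untouched
    if in_place:
        # Replace the original tokens, but only with the given probability.
        if rng.random() < probability:
            result["tokens"] = noiser(features["tokens"])
    else:
        # Keep the original tokens and store the noisy version separately,
        # under a key prefixed with "noisy_".
        result["noisy_tokens"] = noiser(features["tokens"])
    return result
```

With in_place=False the clean and noisy inputs are both available downstream, which is why no probability is needed in that mode.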

export_assets(asset_dir)

Exports assets used by this inputter.

Parameters

asset_dir – The directory where assets can be written.

Returns

A dictionary containing additional assets used by the inputter.

make_dataset(data_file, training=None)

Creates the base dataset required by this inputter.

Parameters
  • data_file – The data file.

  • training – Run in training mode.

Returns

A tf.data.Dataset instance or a list of tf.data.Dataset instances.

get_dataset_size(data_file)

Returns the dataset size.

If the inputter can efficiently compute the dataset size from a training file on disk, it can optionally override this method. Otherwise, we may compute the size later with a generic and slower approach (iterating over the dataset instance).

Parameters

data_file – The data file.

Returns

The dataset size or None.
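
For a text file that stores one example per line, counting lines is one way to get the size without iterating over a dataset instance. The sketch below is an assumption about what such an override could look like, not the library's code; count_lines and its chunk_size parameter are hypothetical names.

```python
def count_lines(path, chunk_size=1024 * 1024):
    """Counts the lines of a file by scanning it in binary chunks."""
    count = 0
    last_byte = b"\n"
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            count += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # A final line without a trailing newline still counts as one example.
    if last_byte != b"\n":
        count += 1
    return count
```

Reading fixed-size binary chunks avoids decoding the text and keeps memory use bounded regardless of line length.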

has_prepare_step()

Returns True if this inputter implements a data preparation step in method opennmt.inputters.Inputter.prepare_elements().

prepare_elements(elements, training=None)

Prepares dataset elements.

This method is called on a batch of dataset elements. For example, it can be overridden to apply an external pre-tokenization.

Note that the results of the method are unbatched and then passed to method opennmt.inputters.Inputter.make_features().

Parameters
  • elements – A batch of dataset elements.

  • training – Run in training mode.

Returns

A (possibly nested) structure of tf.Tensor.
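
The order of operations described above (batch, prepare, unbatch, then build features per example) can be sketched in plain Python. This is only an illustration of the flow using lists instead of tf.Tensor values; the function bodies and the run_pipeline helper are hypothetical and do not come from OpenNMT-tf.

```python
def prepare_elements(elements):
    """Hypothetical batch-level step: lowercase and whitespace-split each line."""
    return [line.lower().split() for line in elements]


def make_features(element):
    """Hypothetical per-example step building features from one prepared element."""
    return {"tokens": element, "length": len(element)}


def run_pipeline(lines, batch_size=2):
    """Batches the lines, prepares each batch, then unbatches the results."""
    features = []
    for start in range(0, len(lines), batch_size):
        batch = lines[start:start + batch_size]
        prepared = prepare_elements(batch)  # called once per batch
        for element in prepared:  # unbatch the prepared results
            features.append(make_features(element))  # called once per example
    return features
```

Running the preparation step on whole batches amortizes the cost of calling an external tool, while feature construction still sees one example at a time.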

make_features(element=None, features=None, training=None)

Tokenizes raw text.

input_signature()

Returns the input signature of this inputter.