ExampleInputter

class opennmt.inputters.ExampleInputter(*args, **kwargs)[source]

An inputter that returns training examples (parallel features and labels).

Inherits from: opennmt.inputters.ParallelInputter

__init__(features_inputter, labels_inputter, share_parameters=False, accepted_annotations=None)[source]

Initializes this inputter.

Parameters
  • features_inputter – An inputter producing the features (source).

  • labels_inputter – An inputter producing the labels (target).

  • share_parameters – Whether to share the parameters of the two inputters.

  • accepted_annotations – An optional dictionary mapping annotation names in the data configuration (e.g. “train_alignments”) to a callable with signature (features, labels, annotations) -> (features, labels).

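As a usage sketch (the embedding sizes and the add_alignments helper below are illustrative assumptions, not values documented on this page; the "train_alignments" annotation name comes from the parameter description above):

    import opennmt

    # Two word-level inputters for the source and target sides.
    source_inputter = opennmt.inputters.WordEmbedder(embedding_size=512)
    target_inputter = opennmt.inputters.WordEmbedder(embedding_size=512)

    # Hypothetical annotation hook matching the documented signature
    # (features, labels, annotations) -> (features, labels).
    def add_alignments(features, labels, annotations):
        features["alignments"] = annotations  # Illustrative only.
        return features, labels

    inputter = opennmt.inputters.ExampleInputter(
        source_inputter,
        target_inputter,
        accepted_annotations={"train_alignments": add_alignments})
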
initialize(data_config)[source]

Initializes the inputter.

Parameters

data_config – A dictionary containing the data configuration set by the user.

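For example, with a word-level setup the data configuration typically provides the vocabulary files; the keys and paths below are placeholders following the usual configuration layout, not values from this page:

    # Hypothetical vocabulary files taken from the data configuration.
    inputter.initialize({
        "source_vocabulary": "src-vocab.txt",
        "target_vocabulary": "tgt-vocab.txt",
    })
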
make_dataset(data_file, training=None)[source]

Creates the base dataset required by this inputter.

Parameters
  • data_file – The data file.

  • training – Run in training mode.

Returns

A tf.data.Dataset instance or a list of tf.data.Dataset instances.

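Since this inputter reads parallel features and labels, data_file is expected to list one file per side; a minimal sketch with placeholder file names:

    # One data file per sub-inputter: [features_file, labels_file].
    dataset = inputter.make_dataset(["train.src", "train.tgt"], training=True)
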
get_dataset_size(data_file)[source]

Returns the dataset size.

If the inputter can efficiently compute the dataset size from a training file on disk, it can override this method. Otherwise, the size may be computed later with a generic but slower approach (iterating over the dataset instance).

Parameters

data_file – The data file.

Returns

The dataset size or None.

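A short sketch, continuing the placeholder files and dataset from above:

    size = inputter.get_dataset_size(["train.src", "train.tgt"])
    if size is None:
        # Fall back to the generic approach: iterate over the dataset.
        size = sum(1 for _ in dataset)
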
make_features(element=None, features=None, training=None)[source]

Creates features from data.

This is typically called in a data pipeline (such as Dataset.map). Common transformations include tokenization, parsing, and vocabulary lookup.

This method accepts either a single element from the dataset or a partially built dictionary of features.

Parameters
  • element – An element from the dataset returned by opennmt.inputters.Inputter.make_dataset().

  • features – An optional and possibly partial dictionary of features to augment.

  • training – Run in training mode.

Returns

A dictionary of tf.Tensor.

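A sketch of the typical pipeline use, continuing the placeholder dataset from above:

    # Apply make_features to every dataset element, e.g. through Dataset.map.
    dataset = dataset.map(
        lambda *element: inputter.make_features(element, training=True))
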
make_inference_dataset(features_file, batch_size, batch_type='examples', length_bucket_width=None, num_threads=1, prefetch_buffer_size=None)[source]

Builds a dataset to be used for inference.

For evaluation and training datasets, see opennmt.inputters.ExampleInputter.make_evaluation_dataset() and opennmt.inputters.ExampleInputter.make_training_dataset().

Parameters
  • features_file – The test file.

  • batch_size – The batch size to use.

  • batch_type – The batching strategy to use: can be “examples” or “tokens”.

  • length_bucket_width – The width of the length buckets to select batch candidates from (for efficiency). Set to None to not constrain batch formation.

  • num_threads – The number of elements processed in parallel.

  • prefetch_buffer_size – The number of batches to prefetch asynchronously. If None, use an automatically tuned value.

Returns

A tf.data.Dataset.
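
A usage sketch (the test file and batching values are placeholders):

    dataset = inputter.make_inference_dataset(
        "test.src",
        batch_size=32,
        length_bucket_width=1)
    for features in dataset:
        ...  # Each iteration yields a batch of feature tensors.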