ParallelInputter
- class opennmt.inputters.ParallelInputter(*args, **kwargs)[source]
A multi inputter that processes parallel data.
Inherits from:
opennmt.inputters.MultiInputter
Extended by:
opennmt.inputters.ExampleInputter
- __init__(inputters, reducer=None, share_parameters=False, combine_features=True)[source]
Initializes a parallel inputter.
- Parameters
inputters – A list of opennmt.inputters.Inputter.
reducer – A opennmt.layers.Reducer to merge all inputs. If set, parallel inputs are assumed to have the same length.
share_parameters – Share the inputters' parameters.
combine_features – Combine the features of each inputter in a single dict or return them separately. This is typically True for multi source inputs but False for features/labels parallel data.
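For illustration, a minimal construction sketch (the two-source setup, embedding sizes, and reducer choice are hypothetical, not a prescribed configuration):

```python
import opennmt

# Hypothetical two-source setup: one WordEmbedder per source, merged by a
# ConcatReducer (with a reducer set, parallel inputs must have the same length).
inputter = opennmt.inputters.ParallelInputter(
    [
        opennmt.inputters.WordEmbedder(embedding_size=256),
        opennmt.inputters.WordEmbedder(embedding_size=256),
    ],
    reducer=opennmt.layers.ConcatReducer(),
)
```

Note that the child inputters still need their vocabularies before building a dataset, typically set via initialize() with a data configuration.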
- make_dataset(data_file, training=None)[source]
Creates the base dataset required by this inputter.
- Parameters
data_file – The data file.
training – Run in training mode.
- Returns
A tf.data.Dataset instance or a list of tf.data.Dataset instances.
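A hedged usage sketch, assuming one data file per child inputter (file names are hypothetical):

```python
# Hypothetical parallel files, aligned line by line, one per child inputter.
dataset = inputter.make_dataset(["source_1.txt", "source_2.txt"], training=True)
```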
- get_dataset_size(data_file)[source]
Returns the dataset size.
If the inputter can efficiently compute the dataset size from a training file on disk, it can optionally override this method. Otherwise, we may compute the size later with a generic and slower approach (iterating over the dataset instance).
- Parameters
data_file – The data file.
- Returns
The dataset size or None.
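A sketch of the fallback described above (file names are hypothetical):

```python
data_files = ["source_1.txt", "source_2.txt"]  # hypothetical parallel files
size = inputter.get_dataset_size(data_files)
if size is None:
    # Generic, slower fallback: iterate over the dataset and count examples.
    size = sum(1 for _ in inputter.make_dataset(data_files))
```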
- get_length(features, ignore_special_tokens=False)[source]
Returns the length of the input features, if defined.
- Parameters
features – The dictionary of input features.
ignore_special_tokens – Ignore special tokens that were added by the inputter (e.g. <s> and/or </s>).
- Returns
The length.
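A brief sketch, assuming features was built by make_features() below (for a parallel inputter the result may hold one length per input):

```python
length = inputter.get_length(features)
# Exclude special tokens added by the inputter, e.g. <s> and </s>:
content_length = inputter.get_length(features, ignore_special_tokens=True)
```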
- get_padded_shapes(element_spec, maximum_length=None)[source]
Returns the padded shapes for dataset elements.
For example, this is used during batch size autotuning to pad all batches to the maximum sequence length.
- Parameters
element_spec – A nested structure of tf.TensorSpec.
maximum_length – Pad batches to this maximum length.
- Returns
A nested structure of tf.TensorShape.
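A hedged sketch, assuming dataset already yields feature dictionaries (batch size and maximum length are hypothetical):

```python
padded_shapes = inputter.get_padded_shapes(dataset.element_spec, maximum_length=128)
dataset = dataset.padded_batch(32, padded_shapes=padded_shapes)
```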
- make_features(element=None, features=None, training=None)[source]
Creates features from data.
This is typically called in a data pipeline (such as Dataset.map). Common transformations include tokenization, parsing, vocabulary lookup, etc.
This method accepts either a single element from the dataset or a partially built dictionary of features.
- Parameters
element – An element from the dataset returned by opennmt.inputters.Inputter.make_dataset().
features – An optional and possibly partial dictionary of features to augment.
training – Run in training mode.
- Returns
A dictionary of tf.Tensor.
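A typical pipeline sketch, under the same hypothetical file names as above:

```python
dataset = inputter.make_dataset(["source_1.txt", "source_2.txt"], training=True)
dataset = dataset.map(lambda element: inputter.make_features(element, training=True))
```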
- keep_for_training(features, maximum_length=None)[source]
Returns True if this example should be kept for training.
- Parameters
features – A dictionary of tf.Tensor.
maximum_length – The maximum length used for training.
- Returns
A boolean.
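This is commonly used as a dataset filter; a sketch with a hypothetical length limit:

```python
# Drop training examples longer than 100 tokens (the limit is illustrative).
dataset = dataset.filter(
    lambda features: inputter.keep_for_training(features, maximum_length=100)
)
```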
- build(input_shape)[source]
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().
This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).
- Parameters
input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
- call(features, training=None)[source]
Creates the model input from the features (e.g. word embeddings).
- Parameters
features – A dictionary of tf.Tensor, the output of opennmt.inputters.Inputter.make_features().
training – Run in training mode.
- Returns
The model input.
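A closing sketch, assuming features was produced by make_features():

```python
# The inputter is a Keras layer: calling it embeds the features and, if a
# reducer was given, merges the parallel inputs into a single model input.
inputs = inputter(features, training=True)
```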