TextInputter
- class opennmt.inputters.TextInputter(*args, **kwargs)[source]
An abstract inputter that processes text.
Inherits from:
opennmt.inputters.Inputter
Extended by:
- initialize(data_config)[source]
Initializes the inputter.
- Parameters
data_config – A dictionary containing the data configuration set by the user.
- set_noise(noiser, in_place=True, probability=None)[source]
Enables noise to be applied to the input features.
- Parameters
noiser – A opennmt.data.WordNoiser instance.
in_place – If False, the noisy version of the input will be stored as a separate feature prefixed with noisy_.
probability – When in_place is enabled, the probability to apply the noise.
- Raises
ValueError – if in_place is enabled but a probability is not set.
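The interaction between in_place and probability can be illustrated with a minimal plain-Python sketch. It is not the OpenNMT-tf implementation: the class NoisyFeaturesSketch, its apply method, the drop_first function (standing in for an opennmt.data.WordNoiser instance), and the "tokens" feature name are all hypothetical, and features are plain dictionaries of lists rather than tensors.

```python
import random


class NoisyFeaturesSketch:
    """Hypothetical stand-in that mimics the documented set_noise contract."""

    def __init__(self):
        self.noiser = None
        self.in_place = True
        self.probability = None

    def set_noise(self, noiser, in_place=True, probability=None):
        # Documented contract: in-place noise requires an explicit probability.
        if in_place and probability is None:
            raise ValueError("in_place is enabled but probability is not set")
        self.noiser = noiser
        self.in_place = in_place
        self.probability = probability

    def apply(self, features, rng=random):
        """Returns a copy of features with noise applied as configured."""
        if self.noiser is None:
            return features
        out = dict(features)
        if self.in_place:
            # Overwrite the original feature with the given probability.
            if rng.random() < self.probability:
                out["tokens"] = self.noiser(features["tokens"])
        else:
            # Keep the original and store the noisy copy under a noisy_ prefix.
            out["noisy_tokens"] = self.noiser(features["tokens"])
        return out


def drop_first(tokens):
    """Hypothetical noiser: drops the first token."""
    return tokens[1:]


inputter = NoisyFeaturesSketch()

try:
    inputter.set_noise(drop_first)  # in_place defaults to True, no probability
    error = None
except ValueError as exc:
    error = str(exc)

inputter.set_noise(drop_first, in_place=False)
separate = inputter.apply({"tokens": ["a", "b", "c"]})
```

With in_place=False, the clean feature is left untouched and the noisy version sits next to it, which is useful when a model needs both.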
- export_assets(asset_dir)[source]
Exports assets used by this inputter.
- Parameters
asset_dir – The directory where assets can be written.
- Returns
A dictionary containing additional assets used by the inputter.
- make_dataset(data_file, training=None)[source]
Creates the base dataset required by this inputter.
- Parameters
data_file – The data file.
training – Run in training mode.
- Returns
A tf.data.Dataset instance or a list of tf.data.Dataset instances.
- get_dataset_size(data_file)[source]
Returns the dataset size.
If the inputter can efficiently compute the dataset size from a training file on disk, it can optionally override this method. Otherwise, we may compute the size later with a generic and slower approach (iterating over the dataset instance).
- Parameters
data_file – The data file.
- Returns
The dataset size or None.
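As an illustration of an "efficient" size computation, the sketch below counts the lines of a text file and returns None when the size cannot be determined. It assumes one example per line, which is a common layout for text data but is an assumption here. The function name count_lines_or_none is hypothetical and is not part of the library.

```python
import os
import tempfile


def count_lines_or_none(data_file):
    """Returns the number of lines in data_file, or None if it is not a file."""
    if not os.path.isfile(data_file):
        return None
    count = 0
    # Binary mode avoids decoding the text just to count line breaks.
    with open(data_file, "rb") as f:
        for _ in f:
            count += 1
    return count


# Small demonstration on a temporary three-line file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "train.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello world\nsecond example\nthird example\n")
    size = count_lines_or_none(path)
    missing = count_lines_or_none(os.path.join(tmp_dir, "absent.txt"))
```

Returning None signals the caller to fall back to the slower generic approach of iterating over the dataset.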
- has_prepare_step()[source]
Returns True if this inputter implements a data preparation step in method opennmt.inputters.Inputter.prepare_elements().
- prepare_elements(elements, training=None)[source]
Prepares dataset elements.
This method is called on a batch of dataset elements. For example, it can be overridden to apply an external pre-tokenization.
Note that the results of the method are unbatched and then passed to method opennmt.inputters.Inputter.make_features().
- Parameters
elements – A batch of dataset elements.
training – Run in training mode.
- Returns
A (possibly nested) structure of tf.Tensor.
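The batch-then-unbatch flow described for prepare_elements can be sketched in plain Python. This is a simplified stand-in that uses lists of strings instead of tf.Tensor and whitespace splitting as the "external pre-tokenization". The helper run_pipeline, the batch size, and the feature names are hypothetical and do not reflect how the library schedules these calls internally.

```python
def prepare_elements(elements):
    """Batch-level step: pre-tokenizes every line of the batch at once."""
    return [line.split() for line in elements]


def make_features(tokens):
    """Element-level step: builds the features of a single example."""
    return {"tokens": tokens, "length": len(tokens)}


def run_pipeline(lines, batch_size=2):
    """Hypothetical driver: batch, prepare, unbatch, then make features."""
    features = []
    for start in range(0, len(lines), batch_size):
        batch = lines[start:start + batch_size]
        prepared = prepare_elements(batch)  # called once per batch
        for element in prepared:  # results are unbatched here
            features.append(make_features(element))
    return features


features = run_pipeline(["hello world", "a b c", "x"])
```

Running the preparation step on batches is what makes an external tool worthwhile: its per-call overhead is paid once per batch rather than once per example.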