WordNoiser
- class opennmt.data.WordNoiser(noises=None, subword_token='■', is_spacer=None)[source]
 Applies noise to word sequences.
Inherits from: builtins.object
- __init__(noises=None, subword_token='■', is_spacer=None)[source]
 Initializes the noising class.
- Parameters
 noises – A list of opennmt.data.Noise instances to apply sequentially.
 subword_token – The special token used by the subword tokenizer. This is required when the noise should be applied at the word level and not the subword level.
 is_spacer – Whether subword_token is used as a spacer (as in SentencePiece) or a joiner (as in BPE). If None, will infer directly from subword_token.
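To make the spacer/joiner distinction concrete, the following is a minimal pure-Python sketch of how subword tokens could be grouped into words before word-level noise is applied. It is an illustration only, not the library's implementation: `group_into_words` is a hypothetical helper, and the rule used to infer `is_spacer` when it is `None` (comparing the token against the SentencePiece marker `▁`) is an assumption.

```python
def group_into_words(tokens, subword_token, is_spacer=None):
    """Group subword tokens into words (hypothetical helper, for illustration).

    Spacer convention: the marker prefixes the first piece of each word.
    Joiner convention: the marker sits on the side where a piece attaches
    to its neighbor (suffix: joins the next piece, prefix: joins the previous).
    """
    if is_spacer is None:
        # Assumption: only the SentencePiece marker is treated as a spacer.
        is_spacer = subword_token == "▁"

    words = []
    join_next = False  # Joiner mode: did the previous piece end with the marker?
    for token in tokens:
        if not words:
            starts_word = True
        elif is_spacer:
            starts_word = token.startswith(subword_token)
        else:
            starts_word = not (join_next or token.startswith(subword_token))
        if not is_spacer:
            join_next = token.endswith(subword_token)

        if starts_word:
            words.append([token])
        else:
            words[-1].append(token)
    return words


# Joiner convention, using the marker shown in the signature above.
joiner_words = group_into_words(["He■", "llo", "world", "■!"], subword_token="■")

# Spacer convention, as produced by SentencePiece.
spacer_words = group_into_words(["▁He", "llo", "▁world", "!"], subword_token="▁")
```

In both cases the four subword tokens form two words, so a word-level noise such as dropping one word would remove two tokens at once rather than a single subword piece.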
- __call__(tokens, sequence_length=None, keep_shape=False, probability=None)[source]
 Applies noise on tokens.
- Parameters
 tokens – A string tf.Tensor, a batch of string tf.Tensor, or a string tf.RaggedTensor.
 sequence_length – When tokens is a dense tensor, the length of each sequence in the batch.
 keep_shape – Ensure that the original dense shape is kept. Otherwise, fit the shape to the new lengths.
 probability – Probability to apply noise on each example.
- Returns
 If tokens is a tf.RaggedTensor, the method returns the noisy tokens as a tf.RaggedTensor, otherwise it returns a tuple with the noisy tokens as a tf.Tensor and the new lengths.
- Raises
 ValueError – if tokens is a batch of string but sequence_length is not passed.
 ValueError – if keep_shape is True but tokens is a tf.RaggedTensor.
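The effect of keep_shape on a dense result can be sketched without TensorFlow. The helper below is hypothetical (`to_dense` is not part of opennmt) and assumes that padding positions hold empty strings. It only illustrates the two shapes described above: fitted to the new lengths, or kept at the original width.

```python
def to_dense(noisy_sequences, original_width, keep_shape=False, pad=""):
    """Pad variable-length token lists into a dense batch (illustrative only).

    Returns a (dense_batch, lengths) tuple, mirroring the tuple described
    for dense inputs. The empty-string padding value is an assumption.
    """
    lengths = [len(seq) for seq in noisy_sequences]
    # keep_shape=True keeps the original width, otherwise fit the longest row.
    width = original_width if keep_shape else max(lengths, default=0)
    dense = [list(seq) + [pad] * (width - len(seq)) for seq in noisy_sequences]
    return dense, lengths


# Suppose a batch with dense width 3 lost some tokens to a dropping noise.
noisy = [["a", "c"], ["d"]]

fitted, fitted_lengths = to_dense(noisy, original_width=3)
kept, kept_lengths = to_dense(noisy, original_width=3, keep_shape=True)
```

Here `fitted` has width 2 while `kept` retains width 3, and both calls report the same new lengths. The sketch has no ragged case because, per the Raises section, keep_shape cannot be combined with a tf.RaggedTensor input.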