WordNoiser
- class opennmt.data.WordNoiser(noises=None, subword_token='■', is_spacer=None)
Applies noise to word sequences.
Inherits from: builtins.object
- __init__(noises=None, subword_token='■', is_spacer=None)
Initializes the noising class.
- Parameters
noises – A list of opennmt.data.Noise instances to apply sequentially.
subword_token – The special token used by the subword tokenizer. This is required when the noise should be applied at the word level and not the subword level.
is_spacer – Whether subword_token is used as a spacer (as in SentencePiece) or a joiner (as in BPE). If None, will infer directly from subword_token.
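Example: the word-level versus subword-level distinction above can be illustrated with a small pure-Python sketch. It is not the library's implementation. The helper names `infer_is_spacer` and `group_subwords` are hypothetical, the inference rule (treat the SentencePiece marker "▁" as a spacer, anything else as a joiner) is an assumption, and so are the grouping conventions (a spacer prefixes the first piece of a word; a joiner is attached to the side where two pieces meet).

```python
from typing import List, Optional


def infer_is_spacer(subword_token: str, is_spacer: Optional[bool] = None) -> bool:
    """Hypothetical helper: guess the marker's role when is_spacer is None.

    Assumption: only the SentencePiece marker "▁" is treated as a spacer.
    """
    if is_spacer is not None:
        return is_spacer
    return subword_token == "▁"


def group_subwords(
    tokens: List[str], subword_token: str, is_spacer: Optional[bool] = None
) -> List[List[str]]:
    """Hypothetical helper: group subword tokens into words."""
    spacer = infer_is_spacer(subword_token, is_spacer)
    words: List[List[str]] = []
    join_next = False  # Joiner mode: previous token ended with the joiner.
    for index, token in enumerate(tokens):
        if spacer:
            # A spacer marks the start of a word; the first token always starts one.
            starts_word = index == 0 or token.startswith(subword_token)
        else:
            # A joiner glues a token to its left or right neighbor.
            starts_word = not (join_next or token.startswith(subword_token))
            join_next = token.endswith(subword_token)
        if starts_word or not words:
            words.append([token])
        else:
            words[-1].append(token)
    return words


joiner_words = group_subwords(["Hel■", "lo", "world", "■!"], "■")
spacer_words = group_subwords(["▁Hel", "lo", "▁world", "!"], "▁")
```

With this grouping, a word-level noise such as dropping a word would remove every piece of that word together, instead of removing isolated subword pieces.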
- __call__(tokens, sequence_length=None, keep_shape=False, probability=None)
Applies noise on tokens.
- Parameters
tokens – A string tf.Tensor, a batch of string tf.Tensor, or a string tf.RaggedTensor.
sequence_length – When tokens is a dense tensor, the length of each sequence in the batch.
keep_shape – Ensure that the original dense shape is kept. Otherwise, fit the shape to the new lengths.
probability – Probability to apply noise on each example.
- Returns
If tokens is a tf.RaggedTensor, the method returns the noisy tokens as a tf.RaggedTensor. Otherwise, it returns a tuple with the noisy tokens as a tf.Tensor and the new lengths.
- Raises
ValueError – if tokens is a batch of strings but sequence_length is not passed.
ValueError – if keep_shape is True but tokens is a tf.RaggedTensor.
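Example: the dense-input behavior described above (per-example probability, new lengths, and keep_shape) can be mimicked with plain Python lists. This is only a sketch of the documented semantics, not the library's code. The function `apply_noise_dense` and its `noise_fn` and `rng` arguments are hypothetical, padding with empty strings is an assumption, and the noise function is assumed never to lengthen a sequence.

```python
import random
from typing import Callable, List, Optional, Tuple


def apply_noise_dense(
    batch: List[List[str]],
    lengths: List[int],
    noise_fn: Callable[[List[str]], List[str]],
    keep_shape: bool = False,
    probability: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[List[List[str]], List[int]]:
    """Hypothetical helper mimicking the dense (padded) call convention."""
    rng = rng or random.Random()
    original_width = max((len(row) for row in batch), default=0)
    noisy: List[List[str]] = []
    for row, length in zip(batch, lengths):
        tokens = list(row[:length])  # Ignore the padding positions.
        # Noise is applied per example; None means "always apply".
        if probability is None or rng.random() < probability:
            tokens = list(noise_fn(tokens))
        noisy.append(tokens)
    new_lengths = [len(tokens) for tokens in noisy]
    # keep_shape pads back to the original width; otherwise fit the new lengths.
    # Assumes noise_fn never makes a sequence longer than the original width.
    width = original_width if keep_shape else max(new_lengths, default=0)
    padded = [tokens + [""] * (width - len(tokens)) for tokens in noisy]
    return padded, new_lengths


def drop_first(tokens: List[str]) -> List[str]:
    """Toy noise for illustration: remove the first token."""
    return tokens[1:]


batch = [["a", "b", "c"], ["d", "e", ""]]
lengths = [3, 2]
fitted = apply_noise_dense(batch, lengths, drop_first)
kept = apply_noise_dense(batch, lengths, drop_first, keep_shape=True)
untouched = apply_noise_dense(batch, lengths, drop_first, probability=0.0)
```

Here `fitted` shrinks to the longest noisy sequence, `kept` preserves the original width of 3 while still reporting the shorter lengths, and `untouched` leaves every example as it was because the probability is zero.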