# opennmt.layers.noise module¶

Noise layers.

class opennmt.layers.noise.WordNoiser(noises=None, subword_token='￭', is_spacer=False)[source]

Bases: object

Applies noise to word sequences.

__init__(noises=None, subword_token='￭', is_spacer=False)[source]

Initializes the noising class.

Parameters:

- noises – A list of opennmt.layers.noise.Noise instances to apply sequentially.
- subword_token – The special token used by the subword tokenizer. This is required when the noise should be applied at the word level and not the subword level.
- is_spacer – Whether subword_token is used as a spacer (as in SentencePiece) or a joiner (as in BPE).
add(noise)[source]

Adds a noise to apply.

__call__(tokens, sequence_length=None, keep_shape=True)[source]

Applies noise on tokens.

Parameters:

- tokens – A string tf.Tensor or batch of string tf.Tensor.
- sequence_length – When tokens is ND, the length of each sequence in the batch.
- keep_shape – Ensure that the shape is kept. Otherwise, fit the shape to the new lengths.

Returns: A tuple with the noisy version of tokens and the new lengths.
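To illustrate how the noiser chains its modules, here is a minimal pure-Python sketch. The SketchWordNoiser class and the lambda noises are hypothetical stand-ins; the real WordNoiser operates on tf.Tensor values and handles subword grouping:

```python
import random

class SketchWordNoiser:
    """Pure-Python analogue of WordNoiser: applies a list of noise
    callables to a word sequence, in order."""

    def __init__(self, noises=None):
        self.noises = list(noises) if noises else []

    def add(self, noise):
        self.noises.append(noise)

    def __call__(self, words):
        for noise in self.noises:
            words = noise(words)
        return words


rng = random.Random(0)
noiser = SketchWordNoiser()
# Illustrative noises: drop words with probability 0.1, then lowercase.
noiser.add(lambda ws: [w for w in ws if rng.random() >= 0.1])
noiser.add(lambda ws: [w.lower() for w in ws])
print(noiser(["Hello", "World", "!"]))
```

Each noise sees the output of the previous one, which matches the "apply sequentially" contract described above.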
class opennmt.layers.noise.Noise[source]

Bases: object

Base class for noise modules.

__call__(words)[source]

Applies noise on a sequence of words.

Parameters: words – The sequence of words as a string tf.Tensor. If it has 2 dimensions, each row represents a word that possibly contains multiple tokens.

Returns: A noisy version of words.

Raises: ValueError – if words has a rank greater than 2.
class opennmt.layers.noise.WordDropout(dropout)[source]

Randomly drops words in a sequence.

__init__(dropout)[source]

Initializes the noise module.

Parameters: dropout – The probability to drop a word.
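A minimal pure-Python sketch of the dropout idea (the function name and the keep-at-least-one-word guard are assumptions of this sketch, not the library's implementation):

```python
import random

def word_dropout(words, dropout, rng=random):
    """Drop each word independently with probability `dropout`.
    As an assumption of this sketch, keep one random word if
    everything was dropped, so the sequence is never emptied."""
    kept = [w for w in words if rng.random() >= dropout]
    if not kept and words:
        kept = [rng.choice(words)]
    return kept
```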
class opennmt.layers.noise.WordReplacement(probability, filler='<unk>')[source]

Randomly replaces words.

__init__(probability, filler='<unk>')[source]

Initializes the noise module.

Parameters:

- probability – The probability to replace a word.
- filler – The replacement token.
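The replacement scheme can be sketched in pure Python (the function name is illustrative; the real module works on tf.Tensor values):

```python
import random

def word_replacement(words, probability, filler="<unk>", rng=random):
    """Replace each word with `filler` independently with the
    given probability."""
    return [filler if rng.random() < probability else w for w in words]
```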
class opennmt.layers.noise.WordPermutation(max_distance)[source]

Randomly permutes words in a sequence with a maximum distance.

__init__(max_distance)[source]

Initializes the noise module.

Parameters: max_distance – The maximum permutation distance.
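A distance-bounded shuffle is commonly implemented with the scheme from Lample et al. (2018): perturb each index with uniform noise in [0, max_distance + 1) and sort. A pure-Python sketch (the function name is illustrative, and whether the library uses exactly this scheme is an assumption):

```python
import random

def word_permutation(words, max_distance, rng=random):
    """Shuffle `words` so that no word moves more than `max_distance`
    positions: sort indices i by the key i + U(0, max_distance + 1).
    Any two indices more than max_distance apart keep their relative
    order, which bounds every word's displacement by max_distance."""
    keys = [i + rng.uniform(0, max_distance + 1) for i in range(len(words))]
    order = sorted(range(len(words)), key=keys.__getitem__)
    return [words[i] for i in order]
```

With max_distance=0 the noise lies in [0, 1), so the keys stay sorted and the sequence is unchanged.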
opennmt.layers.noise.tokens_to_words(tokens, subword_token='￭', is_spacer=False)[source]

Converts a sequence of tokens to a sequence of words.

For example, if a BPE tokenization produces this sequence:

["He@@", "llo", "W@@", "orld", "@@!"]

this function will return the tensor:

[["He@@", "llo", ""], ["W@@", "orld", "@@!"]]
Parameters:

- tokens – A 1D string tf.Tensor.
- subword_token – The special token used by the subword tokenizer.
- is_spacer – Whether subword_token is used as a spacer (as in SentencePiece) or a joiner (as in BPE).

Returns: A 2D string tf.Tensor.
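The grouping logic can be sketched in pure Python. The real function returns a 2D tf.Tensor with rows padded by empty strings (as in the example above); this sketch returns ragged Python lists instead, and uses "@@" as the joiner to match the BPE example:

```python
def group_tokens_to_words(tokens, subword_token="@@", is_spacer=False):
    """Group subword tokens into words (pure-Python sketch).

    Joiner mode (BPE-style): a token attaches to the previous word when
    the previous token ends with the joiner or this token starts with it.
    Spacer mode (SentencePiece-style): a token starting with the spacer
    begins a new word."""
    words = []
    for tok in tokens:
        if is_spacer:
            joined = words and not tok.startswith(subword_token)
        else:
            joined = words and (words[-1][-1].endswith(subword_token)
                                or tok.startswith(subword_token))
        if joined:
            words[-1].append(tok)
        else:
            words.append([tok])
    return words
```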
opennmt.layers.noise.random_mask(shape, probability)[source]

Samples a random boolean mask.

Parameters:

- shape – The mask shape.
- probability – The probability to select an element.

Returns: A boolean mask with shape shape.
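The behavior can be sketched in pure Python. The real op presumably thresholds uniform draws (e.g. something like tf.random.uniform(shape) < probability); the recursive helper below is only an illustration of the semantics:

```python
import random

def random_mask_sketch(shape, probability, rng=random):
    """Build a nested list of booleans with the given shape, where each
    element is independently True with probability `probability`."""
    if not shape:  # scalar case: a single boolean draw
        return rng.random() < probability
    return [random_mask_sketch(shape[1:], probability, rng)
            for _ in range(shape[0])]
```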