opennmt.layers.noise module

Noise layers.

class opennmt.layers.noise.WordNoiser(noises=None, subword_token='■', is_spacer=False)

Bases: object

Applies noise to sequences of words.

__init__(noises=None, subword_token='■', is_spacer=False)

Initializes the noising class.

Parameters:
  • noises – A list of opennmt.layers.noise.Noise instances to apply sequentially.
  • subword_token – The special token used by the subword tokenizer. This is required when the noise should be applied at the word level and not the subword level.
  • is_spacer – Whether subword_token is used as a spacer (as in SentencePiece) or a joiner (as in BPE).

add(noise)

Adds a noise module to the list of noises to apply.

__call__(tokens, sequence_length=None, keep_shape=True)

Applies noise on tokens.

Parameters:
  • tokens – A string tf.Tensor or a batch of string tf.Tensor.
  • sequence_length – When tokens is ND, the length of each sequence in the batch.
  • keep_shape – If true, the output keeps the shape of tokens. Otherwise, the output shape is fitted to the new lengths.
Returns:

A tuple with the noisy version of tokens and the new lengths.
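
The contract described above (noises applied in order, a pair of noisy tokens and new length returned, and keep_shape controlling the output shape) can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: apply_noises, drop_first, and upper are hypothetical names, the input is a single unbatched list rather than a tf.Tensor, and padding with empty strings is an assumption.

```python
def apply_noises(tokens, noises, keep_shape=True, pad=""):
    """Hypothetical plain-Python analogue of the call for one sequence."""
    noisy = list(tokens)
    for noise in noises:
        # Each noise receives the output of the previous one.
        noisy = noise(noisy)
    new_length = len(noisy)
    if keep_shape and new_length < len(tokens):
        # Assumption: pad with empty strings to restore the original length.
        noisy = noisy + [pad] * (len(tokens) - new_length)
    return noisy, new_length


def drop_first(words):
    """Toy noise: removes the first word."""
    return words[1:]


def upper(words):
    """Toy noise: uppercases every word."""
    return [w.upper() for w in words]


kept = apply_noises(["a", "b", "c"], [drop_first, upper])
fitted = apply_noises(["a", "b", "c"], [drop_first, upper], keep_shape=False)
print(kept)    # (['B', 'C', ''], 2)
print(fitted)  # (['B', 'C'], 2)
```

In both cases the returned length is 2; only the size of the returned sequence differs.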

class opennmt.layers.noise.Noise

Bases: object

Base class for noise modules.

__call__(words)

Applies noise on a sequence of words.

Parameters:
  • words – The sequence of words as a string tf.Tensor. If it has 2 dimensions, each row represents a word that possibly contains multiple tokens.
Returns:

A noisy version of words.

Raises:

ValueError – if words has a rank greater than 2.

class opennmt.layers.noise.WordDropout(dropout)

Bases: opennmt.layers.noise.Noise

Randomly drops words in a sequence.

__init__(dropout)

Initializes the noise module.

Parameters:
  • dropout – The probability of dropping each word.
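
As a rough illustration of word dropout, the following plain-Python sketch drops each word independently with the given probability. It is not the library's TensorFlow implementation: word_dropout_sketch and its rng parameter are hypothetical, and keeping one random word when everything would be dropped is an assumption made here so the output is never empty.

```python
import random


def word_dropout_sketch(words, dropout, rng=random):
    """Drops each word independently with probability `dropout`."""
    kept = [w for w in words if rng.random() >= dropout]
    if not kept and words:
        # Assumption: never return an empty sequence for a non-empty input.
        kept = [rng.choice(words)]
    return kept


sentence = ["the", "cat", "sat", "on", "the", "mat"]
noisy = word_dropout_sketch(sentence, 0.3, random.Random(0))
print(noisy)  # a subsequence of `sentence`, in the original order
```

The surviving words keep their relative order, so the result is always a subsequence of the input.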

class opennmt.layers.noise.WordReplacement(probability, filler='<unk>')

Bases: opennmt.layers.noise.Noise

Randomly replaces words.

__init__(probability, filler='<unk>')

Initializes the noise module.

Parameters:
  • probability – The probability of replacing each word.
  • filler – The replacement token.
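
Word replacement can be pictured with a short plain-Python sketch in which each word is independently swapped for the filler token. The function name word_replacement_sketch and the rng parameter are hypothetical, and the real module operates on tf.Tensor inputs rather than Python lists.

```python
import random


def word_replacement_sketch(words, probability, filler="<unk>", rng=random):
    """Replaces each word by `filler` with the given probability."""
    return [filler if rng.random() < probability else w for w in words]


sentence = ["the", "cat", "sat"]
noisy = word_replacement_sketch(sentence, 0.5, rng=random.Random(0))
print(noisy)  # same length as `sentence`, with some words replaced by "<unk>"
```

Unlike dropout, replacement never changes the sequence length.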

class opennmt.layers.noise.WordPermutation(max_distance)

Bases: opennmt.layers.noise.Noise

Randomly permutes words in a sequence with a maximum distance.

__init__(max_distance)

Initializes the noise module.

Parameters:
  • max_distance – The maximum permutation distance, that is, how far a word may move from its original position.
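
One common way to obtain a permutation with a bounded distance is to add a uniform random offset in [0, max_distance + 1) to each position and sort the words by the shifted positions. The plain-Python sketch below illustrates that technique; it is an assumption that this matches the module's internals, and word_permutation_sketch and rng are hypothetical names.

```python
import random


def word_permutation_sketch(words, max_distance, rng=random):
    """Shuffles words so that none moves more than `max_distance` positions."""
    # Shift each position by a random offset in [0, max_distance + 1).
    keys = [i + rng.random() * (max_distance + 1) for i in range(len(words))]
    # Sorting the indices by shifted position yields the new order
    # (Python's sort is stable, so ties keep the original order).
    order = sorted(range(len(words)), key=keys.__getitem__)
    return [words[i] for i in order]


sentence = ["a", "b", "c", "d", "e", "f"]
noisy = word_permutation_sketch(sentence, 2, random.Random(0))
print(noisy)  # a permutation of `sentence` with displacements of at most 2
```

With max_distance set to 0 the offsets are all below 1, so the order is unchanged.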

opennmt.layers.noise.tokens_to_words(tokens, subword_token='■', is_spacer=False)

Converts a sequence of tokens to a sequence of words.

For example, if a BPE tokenization that uses "@@" as the joiner produces this sequence:

["He@@", "llo", "W@@", "orld", "@@!"]

this function will return the tensor:

[["He@@", "llo", ""], ["W@@", "orld", "@@!"]]

Each row is one word, and shorter rows are padded with empty strings.
Parameters:
  • tokens – A 1D string tf.Tensor.
  • subword_token – The special token used by the subword tokenizer.
  • is_spacer – Whether subword_token is used as a spacer (as in SentencePiece) or a joiner (as in BPE).
Returns:

A 2D string tf.Tensor.
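
The grouping rule can be illustrated in plain Python. This sketch is not the library's TensorFlow implementation: group_tokens is a hypothetical helper, and the word-boundary rules are assumptions inferred from the description above (with a joiner, a token starts a new word unless it begins with the joiner or the previous token ends with it; with a spacer, a token starts a new word when it begins with the spacer).

```python
def group_tokens(tokens, subword_token, is_spacer=False):
    """Groups subword tokens into words, padding rows with empty strings."""
    words = []
    for i, token in enumerate(tokens):
        if is_spacer:
            starts_word = token.startswith(subword_token)
        else:
            joined_left = token.startswith(subword_token)
            joined_prev = i > 0 and tokens[i - 1].endswith(subword_token)
            starts_word = not (joined_left or joined_prev)
        if starts_word or not words:
            # The first token always opens a word.
            words.append([token])
        else:
            words[-1].append(token)
    width = max((len(w) for w in words), default=0)
    return [w + [""] * (width - len(w)) for w in words]


print(group_tokens(["He@@", "llo", "W@@", "orld", "@@!"], "@@"))
# [['He@@', 'llo', ''], ['W@@', 'orld', '@@!']]
print(group_tokens(["▁He", "llo", "▁W", "orld", "!"], "▁", is_spacer=True))
# [['▁He', 'llo', ''], ['▁W', 'orld', '!']]
```

The first call reproduces the BPE example above; the second shows the same sentence with a SentencePiece-style spacer.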

opennmt.layers.noise.random_mask(shape, probability)

Generates a random boolean mask.

Parameters:
  • shape – The mask shape.
  • probability – The probability of selecting each element.
Returns:

A boolean mask with the shape given by shape.
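
As a minimal sketch of what such a mask looks like, the plain-Python function below builds a nested list of booleans in which each element is True with the given probability. The name random_mask_sketch and the rng parameter are hypothetical, and the real function returns a tf.Tensor rather than nested lists.

```python
import random


def random_mask_sketch(shape, probability, rng=random):
    """Returns nested lists of booleans with the requested shape."""
    if not shape:
        # Scalar case: select the element with the given probability.
        return rng.random() < probability
    return [
        random_mask_sketch(shape[1:], probability, rng)
        for _ in range(shape[0])
    ]


mask = random_mask_sketch([2, 3], 0.5, random.Random(0))
print(len(mask), len(mask[0]))  # 2 3
```

A probability of 0 selects nothing and a probability of 1 selects every element.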