tokens_to_words

opennmt.data.tokens_to_words(tokens, subword_token='■', is_spacer=None)

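Converts a sequence of subword tokens into a sequence of words by grouping the tokens that belong to the same word. The tokens are not merged and keep their subword markers.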

Example

>>> opennmt.data.tokens_to_words(["He@@", "llo", "W@@", "orld", "@@!"], subword_token="@@")
<tf.RaggedTensor [[b'He@@', b'llo'], [b'W@@', b'orld', b'@@!']]>

Parameters

  • tokens – A 1D string tf.Tensor.

  • subword_token – The special token used by the subword tokenizer.

  • is_spacer – Whether subword_token is used as a spacer (as in SentencePiece) or as a joiner (as in BPE). If None, the mode is inferred from subword_token.

Returns

The words as a 2D string tf.RaggedTensor.
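
To illustrate the difference between the joiner and spacer modes described under is_spacer, here is a minimal pure-Python sketch of the grouping rule. It is not the OpenNMT-tf implementation: group_subwords is a hypothetical helper, it returns nested lists rather than a tf.RaggedTensor, and the spacer rule (a spacer prefixes the first piece of each word) is an assumption based on typical SentencePiece output.

```python
def group_subwords(tokens, subword_token="■", is_spacer=False):
    """Group subword tokens into words (illustrative sketch, not the library code).

    Joiner mode: a token continues the current word if the previous token
    ends with the marker or the token itself starts with the marker.
    Spacer mode (assumed convention): a token starts a new word if it
    starts with the marker; the first token always starts a word.
    """
    words = []
    for i, token in enumerate(tokens):
        if i == 0:
            starts_word = True
        elif is_spacer:
            starts_word = token.startswith(subword_token)
        else:
            joined_to_previous = tokens[i - 1].endswith(subword_token)
            joined_to_current = token.startswith(subword_token)
            starts_word = not (joined_to_previous or joined_to_current)
        if starts_word:
            words.append([token])
        else:
            words[-1].append(token)
    return words


# Joiner mode, same input as the example above.
bpe_words = group_subwords(["He@@", "llo", "W@@", "orld", "@@!"], subword_token="@@")

# Spacer mode, with a hypothetical SentencePiece-style tokenization.
sp_words = group_subwords(["▁He", "llo", "▁W", "orld", "!"], subword_token="▁", is_spacer=True)
```

In joiner mode the marker sits on the side where a token attaches to its neighbor, so the check has to look at both the previous and the current token. In spacer mode the marker only signals a word start, so looking at the current token is enough.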