tokens_to_words
- opennmt.data.tokens_to_words(tokens, subword_token='■', is_spacer=None)
Converts a sequence of tokens to a sequence of words.
Example
>>> opennmt.data.tokens_to_words(["He@@", "llo", "W@@", "orld", "@@!"], subword_token="@@")
<tf.RaggedTensor [[b'He@@', b'llo'], [b'W@@', b'orld', b'@@!']]>
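The grouping in this example can be reproduced with a minimal pure-Python sketch. It assumes the joiner rule implied by the output: a token continues the current word when the previous token ends with the joiner or when the token itself starts with it. The function group_joiner_tokens is hypothetical, not part of OpenNMT-tf, and returns plain lists rather than a tf.RaggedTensor.

```python
def group_joiner_tokens(tokens, joiner="@@"):
    """Hypothetical sketch of joiner-mode (BPE-style) word grouping.

    A token continues the current word when the previous token ends with
    the joiner or the token itself starts with it; otherwise it opens a
    new word. The first token always opens a word.
    """
    words = []
    for i, token in enumerate(tokens):
        continues = i > 0 and (
            tokens[i - 1].endswith(joiner) or token.startswith(joiner)
        )
        if continues:
            words[-1].append(token)
        else:
            words.append([token])
    return words


print(group_joiner_tokens(["He@@", "llo", "W@@", "orld", "@@!"]))
# [['He@@', 'llo'], ['W@@', 'orld', '@@!']]
```

Note that "@@!" is attached to the preceding word because the joiner sits on its left side.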
- Parameters
tokens – A 1D string tf.Tensor.
subword_token – The special token used by the subword tokenizer.
is_spacer – Whether subword_token is used as a spacer (as in SentencePiece) or a joiner (as in BPE). If None, it is inferred directly from subword_token.
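To illustrate how is_spacer changes the grouping, here is a hypothetical pure-Python sketch (tokens_to_word_lists is not an OpenNMT-tf function). In spacer mode, a new word starts at every token prefixed with subword_token. In joiner mode, the marker attaches a token to its neighbor on that side. The rule used when is_spacer is None (spacer only if subword_token is the SentencePiece marker "▁") is an assumption of this sketch, not a statement about the library's code.

```python
SPACER = "\u2581"  # "▁", the SentencePiece word-boundary marker


def tokens_to_word_lists(tokens, subword_token="■", is_spacer=None):
    """Hypothetical sketch of both grouping modes (not the OpenNMT-tf code)."""
    if is_spacer is None:
        # Assumption: treat the marker as a spacer only if it is "▁".
        is_spacer = subword_token == SPACER
    words = []
    for i, token in enumerate(tokens):
        if i == 0:
            starts_word = True  # the first token always opens a word here
        elif is_spacer:
            # Spacer mode: the marker prefixes the first piece of each word.
            starts_word = token.startswith(subword_token)
        else:
            # Joiner mode: the marker glues a piece to its neighbor.
            starts_word = not (
                tokens[i - 1].endswith(subword_token)
                or token.startswith(subword_token)
            )
        if starts_word:
            words.append([token])
        else:
            words[-1].append(token)
    return words


print(tokens_to_word_lists(["▁He", "llo", "▁W", "orld", "!"], subword_token="▁"))
# [['▁He', 'llo'], ['▁W', 'orld', '!']]
```

Passing is_spacer explicitly overrides the inference, which matters when a custom marker is used as a spacer.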
- Returns
The words as a 2D string tf.RaggedTensor.
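A 2D tf.RaggedTensor stores its rows as one flat values vector plus row_splits offsets, where row i spans values[row_splits[i]:row_splits[i + 1]]. The hypothetical helper to_values_and_row_splits below (pure Python, no TensorFlow needed) computes that layout from per-word token lists, which may help when reading the returned tensor.

```python
def to_values_and_row_splits(words):
    """Hypothetical helper: flatten word token lists into (values, row_splits).

    Row i of the result spans values[row_splits[i]:row_splits[i + 1]].
    """
    values = []
    row_splits = [0]
    for word in words:
        values.extend(word)
        row_splits.append(len(values))
    return values, row_splits


values, row_splits = to_values_and_row_splits([["He@@", "llo"], ["W@@", "orld", "@@!"]])
print(values)      # ['He@@', 'llo', 'W@@', 'orld', '@@!']
print(row_splits)  # [0, 2, 5]
```

With TensorFlow available, tf.RaggedTensor.from_row_splits(values, row_splits) should build a ragged tensor with the same rows as the documented output.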