SentencePieceTokenizer

class opennmt.tokenizers.SentencePieceTokenizer(model, nbest_size=0, alpha=1.0)

In-graph SentencePiece tokenizer using tensorflow_text.SentencepieceTokenizer.

Inherits from: opennmt.tokenizers.tokenizer.TensorFlowTokenizer

__init__(model, nbest_size=0, alpha=1.0)

Initializes the tokenizer.

Parameters
  • model – Path to the SentencePiece model.

  • nbest_size – Number of segmentation candidates to sample from. Sampling is disabled during inference.

  • alpha – Smoothing parameter for the subword sampling.
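
The constructor arguments above can be combined as in the following minimal sketch. The helper names (tokenizer_kwargs, build_tokenizer), the model path "sp.model", and the sampling values nbest_size=64 and alpha=0.1 are illustrative assumptions, not part of the OpenNMT-tf API or recommended settings. The sketch also assumes the tokenize() method inherited from the base tokenizer class accepts a Python string.

```python
def tokenizer_kwargs(sampling=False):
    """Return constructor keyword arguments (hypothetical helper).

    With sampling=False, the documented defaults are returned.
    With sampling=True, illustrative (not recommended) values are returned.
    """
    if sampling:
        return {"nbest_size": 64, "alpha": 0.1}
    return {"nbest_size": 0, "alpha": 1.0}


def build_tokenizer(model_path, sampling=False):
    """Build a SentencePieceTokenizer from a trained SentencePiece model file."""
    # Imported lazily so the helpers above can be used without OpenNMT-tf installed.
    import opennmt

    return opennmt.tokenizers.SentencePieceTokenizer(
        model_path, **tokenizer_kwargs(sampling)
    )


if __name__ == "__main__":
    # "sp.model" is a placeholder path to an existing SentencePiece model.
    tokenizer = build_tokenizer("sp.model", sampling=True)
    # Assumes tokenize() accepts a plain string, as the base tokenizer does.
    print(tokenizer.tokenize("Hello world!"))
```

Keeping the keyword arguments in a separate helper makes it easy to switch between deterministic segmentation and sampled segmentation without touching the code that constructs the tokenizer.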