OpenNMTTokenizer

class opennmt.tokenizers.OpenNMTTokenizer(**kwargs)

Tokenizer based on the OpenNMT Tokenizer: https://github.com/OpenNMT/Tokenizer.

Inherits from: opennmt.tokenizers.Tokenizer

__init__(**kwargs)

Initializes the tokenizer.

Parameters

**kwargs – Tokenization options. See https://github.com/OpenNMT/Tokenizer/blob/master/docs/options.md for the available options.
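
A minimal usage sketch, assuming the OpenNMT-tf package is installed. The option names `mode` and `joiner_annotate` come from the Tokenizer options page linked above. The helper `build_tokenizer` and the chosen values are illustrative only.

```python
# Illustrative option values; see the options page for the full list.
TOKENIZER_OPTIONS = {
    "mode": "conservative",
    "joiner_annotate": True,
}


def build_tokenizer(options=None):
    """Hypothetical helper: forward a dict of options as keyword arguments."""
    # Imported lazily so this module can be loaded without OpenNMT-tf installed.
    from opennmt.tokenizers import OpenNMTTokenizer

    if options is None:
        options = TOKENIZER_OPTIONS
    return OpenNMTTokenizer(**options)
```

Calling `build_tokenizer()` in an environment with OpenNMT-tf installed should return a tokenizer configured with the options above. Passing a different dict replaces them entirely.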

property config

The tokenization configuration.

property opennmt_tokenizer

The pyonmttok.Tokenizer instance.
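
The two properties can be read together to inspect an existing tokenizer. The sketch below is duck-typed and relies only on the property names documented here, plus one assumption: that `config` behaves like a mapping. `summarize_tokenizer` is a hypothetical helper name.

```python
def summarize_tokenizer(tokenizer):
    """Collect the configuration and the type of the wrapped tokenizer."""
    # Assumption: `config` is a mapping, so it can be copied into a dict.
    options = dict(tokenizer.config)
    # The wrapped instance is documented as a pyonmttok.Tokenizer.
    backend = tokenizer.opennmt_tokenizer
    return {"options": options, "backend_type": type(backend).__name__}
```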

export_assets(asset_dir, asset_prefix='')

Exports assets for this tokenizer.

Parameters

  • asset_dir – The directory where assets can be written.

  • asset_prefix – The prefix to attach to asset filenames.

Returns

A dictionary containing additional assets used by the tokenizer.
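
A short sketch of calling `export_assets`, assuming only the signature documented above. The wrapper name `export_tokenizer_assets` is hypothetical, and creating the directory beforehand is a precaution taken here, not a documented requirement. The exact keys and values of the returned dictionary are not specified in this section, so the code only prints them.

```python
import os


def export_tokenizer_assets(tokenizer, asset_dir, asset_prefix=""):
    """Export the tokenizer assets into asset_dir and return the result."""
    # Make sure the target directory exists before anything is written to it.
    os.makedirs(asset_dir, exist_ok=True)
    assets = tokenizer.export_assets(asset_dir, asset_prefix=asset_prefix)
    # Print each entry of the returned dictionary for inspection.
    for key, value in sorted(assets.items()):
        print(f"{key}: {value}")
    return assets
```

A distinct prefix per tokenizer, for example `"source_"` and `"target_"`, is one way to keep the exported files from colliding when two tokenizers share the same directory.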