create_lookup_tables

opennmt.data.create_lookup_tables(vocabulary_path, num_oov_buckets=1, as_asset=True, unk_token=None)

Creates TensorFlow lookup tables from a vocabulary file.

Parameters
  • vocabulary_path – Path to the vocabulary file.

  • num_oov_buckets – Number of out-of-vocabulary buckets, that is, extra ids reserved for tokens that are not in the vocabulary file.

  • as_asset – If True, the vocabulary file will be added as a graph asset. Otherwise, the content of the vocabulary will be embedded in the graph.

  • unk_token – The out-of-vocabulary token. If None, defaults to <unk>.

Returns

A tuple containing:

  • The final vocabulary size.

  • The tf.lookup table mapping tokens to ids.

  • The tf.lookup table mapping ids to tokens.
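
To make the three return values concrete, here is a minimal pure-Python sketch of the behavior described above. It is not the library's implementation: ToyLookupTables and its method names are hypothetical, zlib.crc32 stands in for whatever hash TensorFlow uses to assign out-of-vocabulary buckets, and the sketch assumes that the final vocabulary size is the number of entries in the vocabulary file plus num_oov_buckets.

```python
import zlib


class ToyLookupTables:
    """Hypothetical pure-Python stand-in for the two lookup tables."""

    def __init__(self, tokens, num_oov_buckets=1, unk_token="<unk>"):
        if num_oov_buckets < 1:
            raise ValueError("this sketch assumes at least one OOV bucket")
        self.tokens = list(tokens)
        self.num_oov_buckets = num_oov_buckets
        self.unk_token = unk_token
        # Each token's id is its position in the vocabulary.
        self.token_to_id = {token: i for i, token in enumerate(self.tokens)}
        # Assumption: OOV buckets are counted after the in-vocabulary entries.
        self.vocabulary_size = len(self.tokens) + num_oov_buckets

    def lookup_id(self, token):
        """Maps a token to its id, or to an OOV bucket id if it is unknown."""
        if token in self.token_to_id:
            return self.token_to_id[token]
        # crc32 is only a deterministic placeholder for the real hash function.
        bucket = zlib.crc32(token.encode("utf-8")) % self.num_oov_buckets
        return len(self.tokens) + bucket

    def lookup_token(self, index):
        """Maps an id back to its token, or to unk_token if out of range."""
        if 0 <= index < len(self.tokens):
            return self.tokens[index]
        return self.unk_token


# Toy vocabulary; a real one would be read from vocabulary_path, one token per line.
tables = ToyLookupTables(["<blank>", "<s>", "</s>", "hello", "world"])
print(tables.vocabulary_size)       # 5 entries + 1 OOV bucket = 6
print(tables.lookup_id("hello"))    # 3
print(tables.lookup_id("unseen"))   # 5, the single OOV bucket
print(tables.lookup_token(5))       # <unk>
```

With the real function, the same three values would be unpacked from the returned tuple, for example vocab_size, tokens_to_ids, ids_to_tokens = opennmt.data.create_lookup_tables("vocab.txt"), where "vocab.txt" is a placeholder path.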