tools/build_vocab.lua

build_vocab.lua options:

  • -h [<boolean>] (default: false)
    This help.
  • -md [<boolean>] (default: false)
    Dump help in Markdown format.
  • -config <string> (default: '')
    Load options from this file.
  • -save_config <string> (default: '')
    Save options to this file.

Vocabulary options

  • -data <string> (required)
    Data file.
  • -save_vocab <string> (required)
    Vocabulary dictionary prefix.
  • -vocab_size <table> (default: 50000)
    List of source vocabulary sizes: word[ feat1[ feat2[ ...] ] ]. If 0, vocabularies are not pruned.
  • -words_min_frequency <table> (default: 0)
    List of minimum source word frequencies: word[ feat1[ feat2[ ...] ] ]. If 0, vocabularies are pruned by size.
  • -keep_frequency [<boolean>] (default: false)
    Keep word frequencies in the dictionary.
  • -idx_files [<boolean>] (default: false)
    If set, the first field of each line in the data file is the index of the sentence.
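
The interaction between -vocab_size and -words_min_frequency described above can be sketched in Python. This is only an illustration of the documented semantics, not the tool's actual implementation: the function name prune_vocab and its parameters are hypothetical, tokens are assumed to be whitespace-separated, and special tokens and word features are ignored.

```python
from collections import Counter


def prune_vocab(sentences, vocab_size=50000, words_min_frequency=0,
                keep_frequency=False):
    """Hypothetical sketch of the pruning rules in the option descriptions."""
    # Count every whitespace-separated token (an assumption of this sketch).
    counts = Counter(tok for line in sentences for tok in line.split())

    if words_min_frequency > 0:
        # A non-zero minimum frequency takes over: keep frequent-enough words.
        kept = [(w, c) for w, c in counts.most_common()
                if c >= words_min_frequency]
    elif vocab_size > 0:
        # Minimum frequency is 0, so prune by size: keep the top entries.
        kept = counts.most_common(vocab_size)
    else:
        # Both are 0: the vocabulary is not pruned.
        kept = counts.most_common()

    # Mirror -keep_frequency: optionally return the counts with the words.
    return kept if keep_frequency else [w for w, _ in kept]


if __name__ == "__main__":
    data = ["a b a c", "a b d"]
    print(prune_vocab(data, vocab_size=2))
    print(prune_vocab(data, words_min_frequency=2, keep_frequency=True))
```

The sketch treats the two options as alternatives: a non-zero minimum frequency is checked first, and the size limit applies only when the minimum frequency is 0, which matches the wording of the two descriptions.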

Logger options

  • -log_file <string> (default: '')
    Write logs to the file at this path instead of stdout. If the file name ends with json, output structured JSON.
  • -disable_logs [<boolean>] (default: false)
    If set, output nothing.
  • -log_level <string> (accepted: DEBUG, INFO, WARNING, ERROR, NONE; default: INFO)
    Output logs at this level and above.
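
The "this level and above" rule for -log_level can be illustrated with a small Python sketch. It assumes the severity order is the order in which the accepted values are listed (DEBUG lowest) and that NONE suppresses every message; the helper should_log is hypothetical and is not part of the tool.

```python
# Severity order assumed from the order of the accepted values.
MESSAGE_LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR"]


def should_log(message_level, log_level="INFO"):
    """Return True if a message at message_level passes the log_level filter."""
    if log_level == "NONE":
        # Assumption: NONE means no message is ever written.
        return False
    # A message is written when its severity is at or above the threshold.
    return MESSAGE_LEVELS.index(message_level) >= MESSAGE_LEVELS.index(log_level)


if __name__ == "__main__":
    for level in MESSAGE_LEVELS:
        print(level, should_log(level, "WARNING"))
```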