tools/embeddings.lua

embeddings.lua options:

  • -h [<boolean>] (default: false)
    This help.
  • -md [<boolean>] (default: false)
    Dump help in Markdown format.
  • -config <string> (default: '')
    Load options from this file.
  • -save_config <string> (default: '')
    Save options to this file.

Data options

  • -dict_file <string> (required)
    Path to the dictionary file output by preprocess.lua.
  • -embed_file <string> (default: '')
    Path to the embedding file. Ignored if -lang is used.
  • -save_data <string> (required)
    Output file path/label.
  • -save_unknown_dict <string> (default: '')
    Path to a file for saving vocabulary entries not found in the embeddings.
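The unknown-vocabulary idea behind -save_unknown_dict can be illustrated with a short sketch. It assumes the dictionary file has one "token index" entry per line and treats the embeddings as a plain word-to-vector mapping. Both assumptions, and the helper names read_dict and find_unknown, are hypothetical and not taken from the tool itself.

```python
def read_dict(lines):
    """Return dictionary tokens, assuming one 'token index' entry per line (hypothetical format)."""
    tokens = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Everything before the last space is taken as the token; the index is discarded.
        tokens.append(line.rsplit(" ", 1)[0])
    return tokens


def find_unknown(tokens, embeddings):
    """Return tokens with no vector in `embeddings` (a dict mapping word -> vector)."""
    return [t for t in tokens if t not in embeddings]


if __name__ == "__main__":
    dict_lines = ["<blank> 1", "<unk> 2", "the 3", "cat 4"]
    embeddings = {"the": [0.1, 0.2], "cat": [0.3, 0.4]}
    print(find_unknown(read_dict(dict_lines), embeddings))  # ['<blank>', '<unk>']
```

The list returned by find_unknown is what a file passed to -save_unknown_dict would plausibly contain. Writing it one token per line is likewise an assumption.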

Embedding options

  • -lang <string> (default: '')
    Wikipedia language code used to autoload embeddings.
  • -embed_type <string> (accepted: word2vec-bin, word2vec-txt, glove; default: word2vec-bin)
    Format (origin) of the embedding file. Ignored if -lang is used.
  • -normalize [<boolean>] (default: true)
    Whether to normalize the word vectors.
  • -approximate [<boolean>] (default: false)
    If set, also try variants (case, joiner annotation) when matching dictionary entries to words in the embedding file.
  • -report_every <number> (default: 100000)
    Print statistics every this many lines read from the embedding file.
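The embedding options can be illustrated with a minimal sketch under stated assumptions. It covers three points. First, the text formats: a word2vec-txt file starts with a "count dimension" header line, while a GloVe file has none. Second, "normalize" is assumed to mean scaling each vector to unit L2 norm. Third, approximate matching is assumed to try the dictionary token, the token with the joiner marker "￭" (U+FFED) stripped, and lowercase, capitalized, and uppercase forms. The exact variant rules and matching direction used by embeddings.lua may differ. The names parse_text_embeddings, l2_normalize, variants, and lookup are hypothetical, and word2vec-bin parsing is not covered.

```python
import math

JOINER = "\uffed"  # assumed joiner marker ('￭') used in joiner annotation


def parse_text_embeddings(lines, embed_type):
    """Parse text embeddings: word2vec-txt has a 'count dim' header, glove has none."""
    it = iter(lines)
    if embed_type == "word2vec-txt":
        next(it)  # skip the "vocab_size dimension" header line
    elif embed_type != "glove":
        raise ValueError("this sketch only handles word2vec-txt and glove")
    vectors = {}
    for line in it:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        vectors[fields[0]] = [float(x) for x in fields[1:]]
    return vectors


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def variants(token):
    """Candidate spellings tried in approximate mode (assumed rules, not the tool's)."""
    base = token.strip(JOINER)
    out = []
    for cand in (token, base, base.lower(), base.capitalize(), base.upper()):
        if cand and cand not in out:
            out.append(cand)
    return out


def lookup(token, vectors, approximate=False):
    """Return the first vector found for the token (or its variants), else None."""
    for cand in (variants(token) if approximate else [token]):
        if cand in vectors:
            return vectors[cand]
    return None


if __name__ == "__main__":
    lines = ["2 3", "the 1 2 2", "Cat 0 3 4"]
    vecs = {w: l2_normalize(v) for w, v in parse_text_embeddings(lines, "word2vec-txt").items()}
    print(lookup("cat", vecs))  # None
    print(lookup("cat" + JOINER, vecs, approximate=True))  # [0.0, 0.6, 0.8]
```

Without -approximate, only exact matches count, so a dictionary entry such as "cat￭" stays unknown when the embedding file only has "Cat".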

Logger options

  • -log_file <string> (default: '')
    Output logs to a file under this path instead of stdout. If the file name ends with json, output structured JSON.
  • -disable_logs [<boolean>] (default: false)
    If set, output nothing.
  • -log_level <string> (accepted: DEBUG, INFO, WARNING, ERROR, NONE; default: INFO)
    Output logs at this level and above.