preprocess.lua

  • -h: This help. [false]
  • -md: Dump help in Markdown format. [false]
  • -config: Read options from config file. []
  • -save_config: Save options to config file. []
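
The two config options can plausibly be paired: save a set of options once, then reload them on a later run. The sketch below assumes Torch's `th` launcher is on the PATH and that it is run from the directory containing preprocess.lua. The file name preprocess.cfg and the corpus paths are placeholders. This help text does not say whether a run with -save_config also performs the preprocessing, so treat that as unknown.

```shell
# Hypothetical file name for the saved options.
CONFIG=preprocess.cfg

if command -v th >/dev/null 2>&1 && [ -f preprocess.lua ]; then
  # Write the options given on this command line to $CONFIG.
  th preprocess.lua -train_src data/src-train.txt -train_tgt data/tgt-train.txt \
     -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt \
     -save_data data/demo -save_config "$CONFIG"
  # Later: reuse the same options without retyping them.
  th preprocess.lua -config "$CONFIG"
else
  echo "th or preprocess.lua not found; commands not run" >&2
fi
```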

Preprocess options

  • -data_type: (bitext, monotext) Type of text to preprocess. Use 'monotext' for monolingual text. This choice affects which of the other options apply. [bitext]
  • -save_data: Output file for the prepared data. []

Data options

  • -train_src: Path to the training source data. []
  • -train_tgt: Path to the training target data. []
  • -valid_src: Path to the validation source data. []
  • -valid_tgt: Path to the validation target data. []
  • -src_vocab: Path to an existing source vocabulary. []
  • -tgt_vocab: Path to an existing target vocabulary. []
  • -src_vocab_size: Comma-separated list of source vocabulary sizes: word[,feat1,feat2,...]. If 0, vocabularies are not pruned. [50000]
  • -tgt_vocab_size: Comma-separated list of target vocabulary sizes: word[,feat1,feat2,...]. If 0, vocabularies are not pruned. [50000]
  • -src_words_min_frequency: Comma-separated list of minimum source word frequencies: word[,feat1,feat2,...]. If 0, vocabularies are pruned by size. [0]
  • -tgt_words_min_frequency: Comma-separated list of minimum target word frequencies: word[,feat1,feat2,...]. If 0, vocabularies are pruned by size. [0]
  • -src_seq_length: Maximum source sequence length. [50]
  • -tgt_seq_length: Maximum target sequence length. [50]
  • -features_vocabs_prefix: Path prefix to existing features vocabularies. []
  • -shuffle: If 1, shuffle data. [1]
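
A typical bitext invocation using the options above might look like the following sketch. All paths are placeholders, and the vocabulary and length values simply restate the defaults listed above so they are easy to adjust. The command runs only if Torch's `th` launcher and preprocess.lua are both present.

```shell
# Placeholder corpus paths; substitute your own files.
TRAIN_SRC=data/src-train.txt
TRAIN_TGT=data/tgt-train.txt
VALID_SRC=data/src-val.txt
VALID_TGT=data/tgt-val.txt
SAVE_DATA=data/demo

if command -v th >/dev/null 2>&1 && [ -f preprocess.lua ]; then
  th preprocess.lua \
     -train_src "$TRAIN_SRC" -train_tgt "$TRAIN_TGT" \
     -valid_src "$VALID_SRC" -valid_tgt "$VALID_TGT" \
     -save_data "$SAVE_DATA" \
     -src_vocab_size 50000 -tgt_vocab_size 50000 \
     -src_seq_length 50 -tgt_seq_length 50 \
     -shuffle 1
else
  echo "th or preprocess.lua not found; command not run" >&2
fi
```

To prune by frequency rather than by size, pass non-zero values to -src_words_min_frequency and -tgt_words_min_frequency. To reuse existing vocabularies, pass their paths to -src_vocab and -tgt_vocab.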

Other options

  • -seed: Random seed. [3425]
  • -report_every: Report status every this many sentences. [100000]
  • -log_file: Output logs to a file under this path instead of stdout. []
  • -disable_logs: When activated, output nothing. [false]
  • -log_level: (DEBUG, INFO, WARNING, ERROR) Output logs at this level and above. [INFO]
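
The logging and reproducibility options can be combined with any run. As a sketch, the command below writes WARNING-and-above messages to a log file, fixes the random seed, and reports progress more often than the default. The log file name and corpus paths are placeholders, and the command runs only when `th` and preprocess.lua are available.

```shell
# Hypothetical log file name.
LOG_FILE=preprocess.log

if command -v th >/dev/null 2>&1 && [ -f preprocess.lua ]; then
  th preprocess.lua \
     -train_src data/src-train.txt -train_tgt data/tgt-train.txt \
     -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt \
     -save_data data/demo \
     -seed 3425 -report_every 50000 \
     -log_file "$LOG_FILE" -log_level WARNING
else
  echo "th or preprocess.lua not found; command not run" >&2
fi
```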