train.lua

  • -h: this help [false]
  • -md: Dump help in Markdown format [false]
  • -config: Read options from this configuration file. []
  • -save_config: Save options to this configuration file; see the example after this list. []
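
For example, a run's option set can be written out with -save_config and reused later with -config. This is only a sketch; the data, model, and config paths below are placeholders:

    # write the options of this run to a file
    th train.lua -data data/demo-train.t7 -save_model demo-model -save_config train_config.txt

    # start another run reusing the saved options
    th train.lua -config train_config.txt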

Data options

  • -data: Path to the training *-train.t7 file from preprocess.lua []
  • -save_model: Model filename (the model will be saved as _epochN_PPL.t7 where PPL is the validation perplexity). []
  • -model_type: (lm, seq2seq) Type of the model to train. This option impacts all other option choices. [seq2seq]
  • -param_init: Parameters are initialized over uniform distribution with support (-param_init, param_init) [0.1]
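
A minimal training command only needs the preprocessed data file and a model prefix (both paths below are placeholders):

    th train.lua -data data/demo-train.t7 -save_model demo-model

With the defaults, this trains a seq2seq model and saves checkpoints following the _epochN_PPL.t7 naming described above.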

Sequence to Sequence with Attention options

  • -layers: Number of layers in the RNN encoder/decoder [2]
  • -rnn_size: Size of RNN hidden states [500]
  • -rnn_type: (LSTM, GRU) Type of RNN cell [LSTM]
  • -word_vec_size: Common word embedding size. If set, this overrides -src_word_vec_size and -tgt_word_vec_size. [0]
  • -src_word_vec_size: Comma-separated list of source embedding sizes: word[,feat1,feat2,...]. [500]
  • -tgt_word_vec_size: Comma-separated list of target embedding sizes: word[,feat1,feat2,...]. [500]
  • -feat_merge: (concat, sum) Merge action for the features embeddings [concat]
  • -feat_vec_exponent: When features embedding sizes are not set and using -feat_merge concat, their dimension will be set to N^exponent where N is the number of values the feature takes. [0.7]
  • -feat_vec_size: When features embedding sizes are not set and using -feat_merge sum, this is the common embedding size of the features [20]
  • -input_feed: (0, 1) Feed the context vector at each time step as additional input (via concatenation with the word embeddings) to the decoder. [1]
  • -residual: Add residual connections between RNN layers. [false]
  • -brnn: Use a bidirectional encoder [false]
  • -brnn_merge: (concat, sum) Merge action for the bidirectional hidden states [sum]
  • -pre_word_vecs_enc: If a valid path is specified, then this will load pretrained word embeddings on the encoder side. See README for specific formatting instructions. []
  • -pre_word_vecs_dec: If a valid path is specified, then this will load pretrained word embeddings on the decoder side. See README for specific formatting instructions. []
  • -fix_word_vecs_enc: Fix word embeddings on the encoder side [false]
  • -fix_word_vecs_dec: Fix word embeddings on the decoder side [false]
  • -dropout: Dropout probability. Dropout is applied between vertical LSTM stacks. [0.3]
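
As an illustration, the sketch below assembles a deeper bidirectional model from these options; every value is an example and the paths are placeholders:

    th train.lua -data data/demo-train.t7 -save_model demo-model \
      -layers 4 -rnn_size 800 -rnn_type LSTM \
      -word_vec_size 600 -brnn -brnn_merge concat \
      -residual -dropout 0.2

Since -word_vec_size is set here, it overrides -src_word_vec_size and -tgt_word_vec_size.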

Optimization options

  • -max_batch_size: Maximum batch size [64]
  • -optim: (sgd, adagrad, adadelta, adam) Optimization method. [sgd]
  • -learning_rate: Starting learning rate. If adagrad or adam is used, then this is the global learning rate. Recommended settings are: sgd = 1, adagrad = 0.1, adam = 0.0002 [1]
  • -max_grad_norm: If the norm of the gradient vector exceeds this, renormalize it to have a norm equal to max_grad_norm [5]
  • -learning_rate_decay: Decay the learning rate by this much if (i) perplexity does not decrease on the validation set or (ii) the epoch has gone past the -start_decay_at limit [0.5]
  • -start_decay_at: Start decay after this epoch [9]
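
For example, the recommended Adam setting from -learning_rate, or the default SGD schedule written out explicitly (paths are placeholders):

    # Adam with the recommended global learning rate
    th train.lua -data data/demo-train.t7 -save_model demo-model \
      -optim adam -learning_rate 0.0002

    # SGD, halving the learning rate starting after epoch 9 (these are the defaults)
    th train.lua -data data/demo-train.t7 -save_model demo-model \
      -optim sgd -learning_rate 1 -learning_rate_decay 0.5 -start_decay_at 9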

Trainer options

  • -save_every: Save intermediate models every this many iterations within an epoch. If = 0, will not save models within an epoch. [0]
  • -report_every: Print stats every this many iterations within an epoch. [50]
  • -async_parallel: Use asynchronous parallel training. [false]
  • -async_parallel_minbatch: For asynchronous parallel training, the minimum number of batches to process before going parallel. [1000]
  • -start_iteration: If loading from a checkpoint, the iteration from which to start [1]
  • -end_epoch: The final epoch of the training [13]
  • -start_epoch: If loading from a checkpoint, the epoch from which to start [1]
  • -curriculum: For this many epochs, order the minibatches based on source sequence length. Sometimes setting this to 1 will increase convergence speed. [0]
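
For instance, to checkpoint every 5000 iterations, report statistics every 100 iterations, and train for 20 epochs (values and paths are placeholders):

    th train.lua -data data/demo-train.t7 -save_model demo-model \
      -save_every 5000 -report_every 100 -end_epoch 20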

Checkpoint options

  • -train_from: If training from a checkpoint then this is the path to the pretrained model. []
  • -continue: If training from a checkpoint, whether to continue the training in the same configuration or not. [false]
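
Resuming from an existing checkpoint with the same configuration could look like the following sketch; the checkpoint filename is a placeholder following the _epochN_PPL.t7 pattern:

    th train.lua -data data/demo-train.t7 -save_model demo-model \
      -train_from demo-model_epoch7_12.34.t7 -continue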

Other options

  • -gpuid: List of comma-separated GPU identifiers (1-indexed). CPU is used when set to 0. [0]
  • -fallback_to_cpu: If the GPU cannot be used, fall back to the CPU. [false]
  • -no_nccl: Disable usage of NCCL in parallel mode. [false]
  • -disable_mem_optimization: Disable the sharing of internal buffers between clones. Sharing is generally safe, except if you want to look inside the clones, for visualization purposes for instance. [false]
  • -log_file: Outputs logs to a file under this path instead of stdout. []
  • -disable_logs: If = true, output nothing. [false]
  • -log_level: (DEBUG, INFO, WARNING, ERROR) Outputs logs at this level and above. [INFO]
  • -profiler: Generate profiling logs. [false]
  • -seed: Seed for random initialization [3435]
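
As a final example, training on two GPUs with logs written to a file and a fixed seed (paths and identifiers are placeholders):

    th train.lua -data data/demo-train.t7 -save_model demo-model \
      -gpuid 1,2 -log_file logs/train.log -seed 1234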