train.lua

  • -h: This help. [false]
  • -md: Dump help in Markdown format. [false]
  • -config: Read options from this config file; see the example below. []
  • -save_config: Save options to this config file. []
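
A common workflow is to collect options in a plain-text configuration file and pass it with -config. The sketch below assumes the one-option-per-line option = value format accepted by this flag; the file name demo.conf and the values are illustrative placeholders. Given a file demo.conf containing:

```
data = data/demo-train.t7
save_model = demo-model
end_epoch = 13
```

training can then be launched with:

```
th train.lua -config demo.conf
```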

Data options

  • -data: Path to the training *-train.t7 file from preprocess.lua []
  • -save_model: Model filename (the model will be saved as <save_model>_epochN_PPL.t7, where PPL is the validation perplexity). []
  • -model_type: (lm, seq2seq) Type of the model to train. This choice determines which of the other options apply. [seq2seq]
  • -param_init: Parameters are initialized over a uniform distribution with support (-param_init, param_init) [0.1]
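
Putting the data options together, a minimal training run needs only the preprocessed data file and an output model prefix. The paths below are illustrative placeholders:

```
th train.lua -data data/demo-train.t7 -save_model demo-model
```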

Sequence to Sequence with Attention options

  • -word_vec_size: Common word embedding size. If set, this overrides -src_word_vec_size and -tgt_word_vec_size. [0]
  • -src_word_vec_size: Comma-separated list of source embedding sizes: word[,feat1,feat2,...]. [500]
  • -tgt_word_vec_size: Comma-separated list of target embedding sizes: word[,feat1,feat2,...]. [500]
  • -pre_word_vecs_enc: If a valid path is specified, then this will load pretrained word embeddings on the encoder side. See README for specific formatting instructions. []
  • -pre_word_vecs_dec: If a valid path is specified, then this will load pretrained word embeddings on the decoder side. See README for specific formatting instructions. []
  • -fix_word_vecs_enc: Fix word embeddings on the encoder side [false]
  • -fix_word_vecs_dec: Fix word embeddings on the decoder side [false]
  • -feat_merge: (concat, sum) Merge action for the feature embeddings [concat]
  • -feat_vec_exponent: When feature embedding sizes are not set and -feat_merge concat is used, their dimension will be set to N^exponent, where N is the number of values the feature takes. [0.7]
  • -feat_vec_size: When feature embedding sizes are not set and -feat_merge sum is used, this is the common embedding size of the features. [20]
  • -input_feed: (0, 1) Feed the context vector at each time step as additional input (via concatenation with the word embeddings) to the decoder. [1]
  • -layers: Number of layers in the RNN encoder/decoder [2]
  • -rnn_size: Size of RNN hidden states [500]
  • -rnn_type: (LSTM, GRU) Type of RNN cell [LSTM]
  • -dropout: Dropout probability. Dropout is applied between vertical LSTM stacks. [0.3]
  • -dropout_input: Also apply dropout to the input. [false]
  • -residual: Add residual connections between RNN layers. [false]
  • -brnn: Use a bidirectional encoder. [false]
  • -dbrnn: Use a deep bidirectional encoder. [false]
  • -pdbrnn: Use a pyramidal deep bidirectional encoder. [false]
  • -brnn_merge: (concat, sum) Merge action for the bidirectional hidden states [sum]
  • -pdbrnn_reduction: Time-reduction factor at each layer. [2]
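
As a sketch of how these options combine on the command line (the sizes are arbitrary example values, not recommendations, and boolean options such as -brnn are assumed to be enabled by passing the bare flag):

```
th train.lua -data data/demo-train.t7 -save_model demo-model \
  -layers 4 -rnn_size 800 -brnn -brnn_merge concat -dropout 0.3
```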

Optimization options

  • -max_batch_size: Maximum batch size [64]
  • -optim: (sgd, adagrad, adadelta, adam) Optimization method. [sgd]
  • -learning_rate: Starting learning rate. If adagrad or adam is used, then this is the global learning rate. Recommended settings are: sgd = 1, adagrad = 0.1, adam = 0.0002 [1]
  • -min_learning_rate: Do not continue training past this learning rate [0]
  • -max_grad_norm: If the norm of the gradient vector exceeds this value, renormalize it to have a norm equal to max_grad_norm [5]
  • -learning_rate_decay: Learning rate decay factor [0.5]
  • -start_decay_at: With 'default' decay mode, start decay after this epoch [9]
  • -start_decay_ppl_delta: Start decay when validation perplexity improvement is lower than this value [0]
  • -decay: (default, perplexity_only) When to apply learning rate decay. 'default': decay after each epoch past start_decay_at, or as soon as the validation perplexity improvement is lower than start_decay_ppl_delta; 'perplexity_only': decay only when the validation perplexity improvement is lower than start_decay_ppl_delta [default]
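
For example, a run following the recommended Adam settings above, with perplexity-driven decay (the delta is an arbitrary illustration):

```
th train.lua -data data/demo-train.t7 -save_model demo-model \
  -optim adam -learning_rate 0.0002 \
  -decay perplexity_only -start_decay_ppl_delta 0.1
```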

Trainer options

  • -save_every: Save intermediate models every this many iterations within an epoch. If set to 0, no intermediate models are saved. [0]
  • -report_every: Print stats every this many iterations within an epoch. [50]
  • -async_parallel: Use asynchronous parallel training. [false]
  • -async_parallel_minbatch: For asynchronous parallel training, the minimal number of batches to process before switching to parallel mode. [1000]
  • -start_iteration: If loading from a checkpoint, the iteration from which to start [1]
  • -end_epoch: The final epoch of the training [13]
  • -start_epoch: If loading from a checkpoint, the epoch from which to start [1]
  • -curriculum: For this many epochs, order the minibatches based on source sequence length. Sometimes setting this to 1 will increase convergence speed. [0]
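
A sketch combining these trainer options, with illustrative values:

```
th train.lua -data data/demo-train.t7 -save_model demo-model \
  -end_epoch 20 -report_every 100 -save_every 5000 -curriculum 1
```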

Checkpoint options

  • -train_from: If training from a checkpoint then this is the path to the pretrained model. []
  • -continue: If training from a checkpoint, whether to continue the training in the same configuration or not. [false]
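
For example, to resume an interrupted run with the same configuration (the checkpoint file name is hypothetical, following the <save_model>_epochN_PPL.t7 pattern described above):

```
th train.lua -data data/demo-train.t7 -save_model demo-model \
  -train_from demo-model_epoch7_10.23.t7 -continue
```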

Crayon options

  • -exp_host: Crayon server IP [127.0.0.1]
  • -exp_port: Crayon server port [8889]
  • -exp: Crayon experiment name []
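
Crayon logging assumes a Crayon server is already running at the given host and port; the experiment name below is an arbitrary example:

```
th train.lua -data data/demo-train.t7 -save_model demo-model \
  -exp_host 127.0.0.1 -exp_port 8889 -exp demo-run
```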

SampledDataset options

  • -sample: Number of instances to sample from the training data in each epoch [0]
  • -sample_w_ppl: Use perplexity as the probability distribution when sampling [false]
  • -sample_w_ppl_init: Start perplexity-based sampling when the average training perplexity per batch falls below this value [15]
  • -sample_w_ppl_max: When greater than 0, instances with a perplexity above this value are considered noise and ignored; when less than 0, mode + (-sample_w_ppl_max) * stdev is used as the threshold [-1.5]
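
For example, to sample a subset of the training data each epoch and weight the sampling by perplexity once training has stabilized (all values are illustrative):

```
th train.lua -data data/demo-train.t7 -save_model demo-model \
  -sample 500000 -sample_w_ppl -sample_w_ppl_init 20
```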

Other options

  • -gpuid: List of comma-separated GPU identifiers (1-indexed). CPU is used when set to 0. [0]
  • -fallback_to_cpu: If the GPU cannot be used, fall back to the CPU. [false]
  • -fp16: Use half-precision float on GPU. [false]
  • -no_nccl: Disable usage of NCCL in parallel mode. [false]
  • -disable_mem_optimization: Disable sharing of internal buffers between clones for visualization or development. [false]
  • -log_file: Outputs logs to a file under this path instead of stdout. []
  • -disable_logs: When activated, output nothing. [false]
  • -log_level: (DEBUG, INFO, WARNING, ERROR) Outputs logs at this level and above. [INFO]
  • -profiler: Generate profiling logs. [false]
  • -seed: Seed for random initialization [3435]
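
Finally, a sketch of a run on two GPUs with logging redirected to a file; the GPU identifiers, log file name, and seed are example values:

```
th train.lua -data data/demo-train.t7 -save_model demo-model \
  -gpuid 1,2 -log_file train.log -seed 1234
```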