train.lua

train.lua options:

  • -h [<boolean>] (default: false)
    This help.
  • -md [<boolean>] (default: false)
    Dump help in Markdown format.
  • -config <string> (default: '')
    Load options from this file.
  • -save_config <string> (default: '')
    Save options to this file.

Data options

  • -data <string> (required)
    Path to the data package *-train.t7 generated by the preprocessing step.

Sampled dataset options

  • -sample <number> (default: 0)
    Number of instances to sample from train data in each epoch.
  • -sample_type <string> (accepted: uniform, perplexity, partition; default: uniform)
    Define the sampling type. uniform draws the sample randomly; perplexity uses perplexity as a probability distribution when sampling (see the -sample_perplexity_init and -sample_perplexity_max options); partition draws a different subset at each epoch.
  • -sample_perplexity_init <number> (default: 15)
    Start perplexity-based sampling when average train perplexity per batch falls below this value.
  • -sample_perplexity_max <number> (default: -1.5)
    When greater than 0, instances with a perplexity above this value are considered noise and ignored; when less than 0, the threshold is computed as mode + -sample_perplexity_max * stdev of the observed perplexities.
  • -sample_tgt_vocab [<boolean>] (default: false)
    Use an importance sampling approach as an approximation of the full softmax: the target vocabulary is built from the sample.
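
The negative -sample_perplexity_max convention above can be made concrete with a small sketch. This is plain illustrative Python, not OpenNMT code, and the sign convention for a negative setting (negating it so the threshold sits above the mode) is an assumption drawn from the option description:

```python
import statistics

def noise_threshold(perplexities, sample_perplexity_max):
    """Illustrative only: derive the noise cut-off described above.

    A positive setting is used directly as the threshold. For a negative
    setting, the docs describe "mode + -sample_perplexity_max * stdev";
    the leading dash belongs to the option name, so here the negative
    value is negated (an assumption) to place the threshold above the mode.
    """
    if sample_perplexity_max > 0:
        return sample_perplexity_max
    mode = statistics.mode(perplexities)
    stdev = statistics.stdev(perplexities)
    return mode + abs(sample_perplexity_max) * stdev

# Instances whose perplexity exceeds the threshold would be ignored as noise.
ppls = [10, 10, 12, 14, 30]
print(noise_threshold(ppls, -1.5))
```

With the default of -1.5, outliers such as the 30 above end up past the threshold while instances near the mode are kept.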

Model options

  • -model_type <string> (accepted: lm, seq2seq, seqtagger; default: seq2seq)
    Type of model to train. This option impacts the choices available for all other options.
  • -param_init <number> (default: 0.1)
    Parameters are initialized over uniform distribution with support (-param_init, param_init).
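
For instance, with the default -param_init 0.1, every parameter is drawn uniformly from (-0.1, 0.1). A minimal sketch of this initialization in plain Python (the default -seed 3435 from the Logger section is reused for illustration):

```python
import random

def init_params(n, param_init=0.1, seed=3435):
    """Draw n parameters uniformly from (-param_init, param_init)."""
    rng = random.Random(seed)
    return [rng.uniform(-param_init, param_init) for _ in range(n)]

w = init_params(500 * 500)  # e.g. one -rnn_size x -rnn_size matrix
assert max(abs(v) for v in w) < 0.1  # all values inside the support
```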

Sequence to Sequence with Attention options

  • -enc_layers <number> (default: 0)
    If > 0, number of layers of the encoder. This overrides the global -layers option.
  • -dec_layers <number> (default: 0)
    If > 0, number of layers of the decoder. This overrides the global -layers option.
  • -word_vec_size <number> (default: 0)
    Shared word embedding size. If set, this overrides -src_word_vec_size and -tgt_word_vec_size.
  • -src_word_vec_size <table> (default: 500)
    List of source embedding sizes: word[ feat1[ feat2[ ...] ] ].
  • -tgt_word_vec_size <table> (default: 500)
    List of target embedding sizes: word[ feat1[ feat2[ ...] ] ].
  • -pre_word_vecs_enc <string> (default: '')
    Path to pretrained word embeddings on the encoder side serialized as a Torch tensor.
  • -pre_word_vecs_dec <string> (default: '')
    Path to pretrained word embeddings on the decoder side serialized as a Torch tensor.
  • -fix_word_vecs_enc [<boolean>/<string>] (accepted: false, true, pretrained; default: false)
    Fix word embeddings on the encoder side.
  • -fix_word_vecs_dec [<boolean>/<string>] (accepted: false, true, pretrained; default: false)
    Fix word embeddings on the decoder side.
  • -feat_merge <string> (accepted: concat, sum; default: concat)
    Merge action for the features embeddings.
  • -feat_vec_exponent <number> (default: 0.7)
    When features embedding sizes are not set and using -feat_merge concat, their dimension will be set to N^feat_vec_exponent where N is the number of values the feature takes.
  • -feat_vec_size <number> (default: 20)
    When features embedding sizes are not set and using -feat_merge sum, this is the common embedding size of the features.
  • -layers <number> (default: 2)
    Number of recurrent layers of the encoder and decoder. See also -enc_layers, -dec_layers and -bridge to assign different layers to the encoder and decoder.
  • -rnn_size <number> (default: 500)
    Hidden size of the recurrent unit.
  • -rnn_type <string> (accepted: LSTM, GRU; default: LSTM)
    Type of recurrent cell.
  • -dropout <number> (default: 0.3)
    Dropout probability applied between recurrent layers.
  • -dropout_input [<boolean>] (default: false)
    Also apply dropout to the input of the recurrent module.
  • -residual [<boolean>] (default: false)
    Add residual connections between recurrent layers.
  • -bridge <string> (accepted: copy, dense, dense_nonlinear, none; default: copy)
    Define how to pass encoder states to the decoder. With copy, the encoder and decoder must have the same number of layers.
  • -input_feed [<boolean>] (default: true)
    Feed the context vector at each time step as additional input (via concatenation with the word embeddings) to the decoder.
  • -brnn [<boolean>] (default: false)
    Use a bidirectional encoder.
  • -dbrnn [<boolean>] (default: false)
    Use a deep bidirectional encoder.
  • -pdbrnn [<boolean>] (default: false)
    Use a pyramidal deep bidirectional encoder.
  • -attention <string> (accepted: none, global; default: global)
    Attention model.
  • -brnn_merge <string> (accepted: concat, sum; default: sum)
    Merge action for the bidirectional states.
  • -pdbrnn_reduction <number> (default: 2)
    Time-reduction factor at each layer.
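
To make the -feat_vec_exponent rule concrete: with the default exponent 0.7 and -feat_merge concat, a feature taking N distinct values gets an embedding of roughly N^0.7 dimensions. A quick illustration in plain Python (rounding up is an assumption here; the exact rounding OpenNMT applies is not stated above):

```python
import math

def feat_embedding_size(n_values, feat_vec_exponent=0.7):
    # Embedding size grows sub-linearly with the number of feature values.
    return math.ceil(n_values ** feat_vec_exponent)

# e.g. a part-of-speech feature with 40 distinct tags
print(feat_embedding_size(40))  # 40^0.7 is about 13.2, rounded up to 14
```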

Global Attention Model options

  • -global_attention <string> (accepted: general, dot, concat; default: general)
    Global attention model type.

Trainer options

  • -save_every <number> (default: 5000)
    Save intermediate models every this many iterations within an epoch. If = 0, will not save intermediate models.
  • -save_every_epochs <number> (default: 1)
    Save a model every this many epochs. If = 0, will not save a model at each epoch.
  • -report_every <number> (default: 50)
    Report progress every this many iterations within an epoch.
  • -async_parallel [<boolean>] (default: false)
    When training on multiple GPUs, update parameters asynchronously.
  • -async_parallel_minbatch <number> (default: 1000)
    In asynchronous training, the minimum number of batches to process sequentially before switching to parallel updates.
  • -start_iteration <number> (default: 1)
    If loading from a checkpoint, the iteration from which to start.
  • -start_epoch <number> (default: 1)
    If loading from a checkpoint, the epoch from which to start.
  • -end_epoch <number> (default: 13)
    The final epoch of the training. If = 0, train forever unless another stopping condition is met (e.g. -min_learning_rate is reached).
  • -curriculum <number> (default: 0)
    For this many epochs, order the minibatches by source length (from shortest to longest). Sometimes setting this to 1 will increase convergence speed.
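
The -curriculum option amounts to sorting minibatches by source length during the first few epochs. A minimal sketch of the idea (plain Python; the (source_length, batch) pairs are an illustrative stand-in, not OpenNMT's actual batch structure):

```python
def order_batches(batches, epoch, curriculum=1):
    """Sort batches by source length during the curriculum epochs.

    `batches` is assumed to be a list of (source_length, batch) pairs.
    """
    if epoch <= curriculum:
        return sorted(batches, key=lambda b: b[0])  # shortest first
    return batches  # past the curriculum, keep the original order

batches = [(12, "b1"), (5, "b2"), (30, "b3")]
print(order_batches(batches, epoch=1))  # ordered shortest to longest
```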

Optimization options

  • -max_batch_size <number> (default: 64)
    Maximum batch size.
  • -uneven_batches [<boolean>] (default: false)
    If set, batches are filled up to -max_batch_size even if the source lengths are different. Slower but needed for some tasks.
  • -optim <string> (accepted: sgd, adagrad, adadelta, adam; default: sgd)
    Optimization method.
  • -learning_rate <number> (default: 1)
    Initial learning rate. If adagrad or adam is used, then this is the global learning rate. Recommended settings are: sgd = 1, adagrad = 0.1, adam = 0.0002.
  • -min_learning_rate <number> (default: 0)
    Do not continue the training past this learning rate value.
  • -max_grad_norm <number> (default: 5)
    Clip the gradients norm to this value.
  • -learning_rate_decay <number> (default: 0.7)
    Learning rate decay factor: learning_rate = learning_rate * learning_rate_decay.
  • -start_decay_at <number> (default: 9)
    In "default" decay mode, start decay after this epoch.
  • -start_decay_ppl_delta <number> (default: 0)
    Start decay when validation perplexity improvement is lower than this value.
  • -decay <string> (accepted: default, epoch_only, perplexity_only; default: default)
    When to apply learning rate decay. default: decay after each epoch past -start_decay_at or as soon as the validation perplexity is not improving more than -start_decay_ppl_delta, epoch_only: only decay after each epoch past -start_decay_at, perplexity_only: only decay when validation perplexity is not improving more than -start_decay_ppl_delta.
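
The interaction of -learning_rate, -learning_rate_decay, and -start_decay_at can be sketched for the epoch-based part of the schedule. This is a simplified plain-Python illustration that ignores the perplexity condition, and it assumes the decay first applies strictly after -start_decay_at (the description says "start decay after this epoch"):

```python
def decayed_lr(initial_lr, epoch, start_decay_at=9, decay=0.7):
    """Learning rate in effect at a given epoch, epoch-based decay only.

    The rate is multiplied by `decay` once for each epoch past
    `start_decay_at`; the perplexity-based condition is not modeled.
    """
    lr = initial_lr
    for e in range(1, epoch + 1):
        if e > start_decay_at:  # decay only past the start epoch
            lr *= decay
    return lr

# With the sgd defaults (lr=1, decay 0.7 after epoch 9):
for epoch in (9, 10, 11):
    print(epoch, decayed_lr(1, epoch))
```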

Saver options

  • -save_model <string> (required)
    Model filename (the model will be saved as <save_model>_epochN_PPL.t7, where PPL is the validation perplexity).
  • -train_from <string> (default: '')
    Path to a checkpoint.
  • -continue [<boolean>] (default: false)
    If set, continue the training where it left off.
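
The checkpoint naming convention from -save_model can be illustrated with a small helper. The two-decimal formatting of the perplexity is an assumption for illustration; only the <save_model>_epochN_PPL.t7 pattern comes from the description above:

```python
def checkpoint_name(save_model, epoch, valid_ppl):
    """Hypothetical helper: build a <save_model>_epochN_PPL.t7 filename."""
    return "%s_epoch%d_%.2f.t7" % (save_model, epoch, valid_ppl)

print(checkpoint_name("demo-model", 13, 7.19))
```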

Crayon options

  • -exp_host <string> (default: 127.0.0.1)
    Crayon server IP.
  • -exp_port <string> (default: 8889)
    Crayon server port.
  • -exp <string> (default: '')
    Crayon experiment name.

Cuda options

  • -gpuid <table> (default: 0)
    List of GPU identifiers (1-indexed). CPU is used when set to 0.
  • -fallback_to_cpu [<boolean>] (default: false)
    If the GPU cannot be used, fall back to the CPU.
  • -fp16 [<boolean>] (default: false)
    Use half-precision float on GPU.
  • -no_nccl [<boolean>] (default: false)
    Disable usage of nccl in parallel mode.

Logger options

  • -log_file <string> (default: '')
    Output logs to a file under this path instead of stdout.
  • -disable_logs [<boolean>] (default: false)
    If set, output nothing.
  • -log_level <string> (accepted: DEBUG, INFO, WARNING, ERROR; default: INFO)
    Output logs at this level and above.

Other options

  • -disable_mem_optimization [<boolean>] (default: false)
    Disable sharing of internal buffers between clones for visualization or development.
  • -profiler [<boolean>] (default: false)
    Generate profiling logs.
  • -seed <number> (default: 3435)
    Random seed.