train.lua

train.lua options:

  • -h [<boolean>] (default: false)
    This help.
  • -md [<boolean>] (default: false)
    Dump help in Markdown format.
  • -config <string> (default: '')
    Load options from this file.
  • -save_config <string> (default: '')
    Save options to this file.
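
For example, frequently used options can be kept in a configuration file and reused across runs. The sketch below assumes the plain key = value syntax for configuration files; the file name, paths, and values are placeholders:

    data = data/demo-train.t7
    save_model = demo-model
    layers = 2
    rnn_size = 500

Saved as demo-config.txt, the file can then be loaded with:

    th train.lua -config demo-config.txt

Conversely, -save_config dumps the effective options of a run to a file in the same format.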

Data options

  • -data <string> (required)
    Path to the data package *-train.t7 generated by the preprocessing step.

Sampled dataset options

  • -sample <number> (default: 0)
    Number of instances to sample from the training data at each epoch.
  • -sample_type <string> (accepted: uniform, perplexity, partition; default: uniform)
    Define the sampling type: uniform draws the sample at random, perplexity uses perplexity as a probability distribution when sampling (see the -sample_perplexity_init and -sample_perplexity_max options), and partition draws a different subset at each epoch.
  • -sample_perplexity_init <number> (default: 15)
    Start perplexity-based sampling when average train perplexity per batch falls below this value.
  • -sample_perplexity_max <number> (default: -1.5)
    When greater than 0, instances with a perplexity above this value are considered noise and ignored; when less than 0, the threshold is set to mode + -sample_perplexity_max * stdev of the perplexity distribution.
  • -sample_tgt_vocab [<boolean>] (default: false)
    Use an importance sampling approach to approximate the full softmax: the target vocabulary is built from the sample.
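
As an illustration, the hypothetical command below samples 100000 training instances per epoch with perplexity-based sampling, using the thresholds described above (paths are placeholders):

    th train.lua -data data/demo-train.t7 -save_model demo-model \
        -sample 100000 -sample_type perplexity \
        -sample_perplexity_init 15 -sample_perplexity_max -1.5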

Model options

  • -model_type <string> (accepted: lm, seq2seq, seqtagger; default: seq2seq)
    Type of model to train. This option determines which of the other options apply.
  • -param_init <number> (default: 0.1)
    Parameters are initialized over a uniform distribution with support (-param_init, param_init).

Sequence to Sequence with Attention options

  • -enc_layers <number> (default: 0)
    If > 0, number of layers of the encoder. This overrides the global -layers option.
  • -dec_layers <number> (default: 0)
    If > 0, number of layers of the decoder. This overrides the global -layers option.
  • -word_vec_size <number> (default: 0)
    Shared word embedding size. If set, this overrides -src_word_vec_size and -tgt_word_vec_size.
  • -src_word_vec_size <table> (default: 500)
    List of source embedding sizes: word[ feat1[ feat2[ ...] ] ].
  • -tgt_word_vec_size <table> (default: 500)
    List of target embedding sizes: word[ feat1[ feat2[ ...] ] ].
  • -pre_word_vecs_enc <string> (default: '')
    Path to pretrained word embeddings on the encoder side serialized as a Torch tensor.
  • -pre_word_vecs_dec <string> (default: '')
    Path to pretrained word embeddings on the decoder side serialized as a Torch tensor.
  • -fix_word_vecs_enc [<boolean>/<string>] (accepted: false, true, pretrained; default: false)
    Fix word embeddings on the encoder side.
  • -fix_word_vecs_dec [<boolean>/<string>] (accepted: false, true, pretrained; default: false)
    Fix word embeddings on the decoder side.
  • -feat_merge <string> (accepted: concat, sum; default: concat)
    Merge action for the feature embeddings.
  • -feat_vec_exponent <number> (default: 0.7)
    When feature embedding sizes are not set and -feat_merge concat is used, their dimension is set to N^feat_vec_exponent, where N is the number of values the feature takes.
  • -feat_vec_size <number> (default: 20)
    When feature embedding sizes are not set and -feat_merge sum is used, this is the common embedding size of the features.
  • -layers <number> (default: 2)
    Number of recurrent layers of the encoder and decoder. See also -enc_layers, -dec_layers and -bridge to assign different layers to the encoder and decoder.
  • -rnn_size <number> (default: 500)
    Hidden size of the recurrent unit.
  • -rnn_type <string> (accepted: LSTM, GRU; default: LSTM)
    Type of recurrent cell.
  • -dropout <number> (default: 0.3)
    Dropout probability applied between recurrent layers.
  • -dropout_input [<boolean>] (default: false)
    If set, apply dropout to the input of the recurrent module.
  • -dropout_words <number> (default: 0)
    Dropout probability applied to the source sequence.
  • -dropout_type <string> (accepted: naive, variational; default: naive)
    Dropout type.
  • -residual [<boolean>] (default: false)
    Add residual connections between recurrent layers.
  • -bridge <string> (accepted: copy, dense, dense_nonlinear, none; default: copy)
    Define how to pass encoder states to the decoder. With copy, the encoder and decoder must have the same number of layers.
  • -input_feed [<boolean>] (default: true)
    Feed the context vector at each time step as additional input (via concatenation with the word embeddings) to the decoder.
  • -encoder_type <string> (accepted: rnn, brnn, dbrnn, pdbrnn, gnmt; default: rnn)
    Encoder type: rnn (unidirectional), brnn (bidirectional), dbrnn (deep bidirectional), pdbrnn (pyramidal deep bidirectional), or gnmt (GNMT-style).
  • -attention <string> (accepted: none, global; default: global)
    Attention model.
  • -brnn_merge <string> (accepted: concat, sum; default: sum)
    Merge action for the bidirectional states.
  • -pdbrnn_reduction <number> (default: 2)
    Time-reduction factor at each layer.
  • -pdbrnn_merge <string> (accepted: concat, sum; default: concat)
    Merge action when reducing time.
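
Taken together, these options describe architectures such as a 2-layer LSTM encoder-decoder with a bidirectional encoder and global attention. The command below is a sketch with placeholder paths, not a recommended setting:

    th train.lua -data data/demo-train.t7 -save_model demo-model \
        -layers 2 -rnn_size 500 -rnn_type LSTM \
        -encoder_type brnn -brnn_merge sum -dropout 0.3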

Global Attention Model options

  • -global_attention <string> (accepted: general, dot, concat; default: general)
    Global attention model type.
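
These values correspond to the score functions of Luong et al. (2015): the current decoder state h_t is compared with each encoder state \bar{h}_s, and the scores are normalized with a softmax over source positions to produce the attention weights. In LaTeX notation:

    \mathrm{score}(h_t, \bar{h}_s) =
      \begin{cases}
        h_t^\top \bar{h}_s                    & \text{(dot)} \\
        h_t^\top W_a \bar{h}_s                & \text{(general)} \\
        v_a^\top \tanh(W_a [h_t ; \bar{h}_s]) & \text{(concat)}
      \end{cases}

where W_a and v_a are learned parameters.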

Trainer options

  • -save_every <number> (default: 5000)
    Save intermediate models every this many iterations within an epoch. If = 0, will not save intermediate models.
  • -save_every_epochs <number> (default: 1)
    Save a model every this many epochs. If = 0, will not save a model at each epoch.
  • -report_every <number> (default: 50)
    Report progress every this many iterations within an epoch.
  • -async_parallel [<boolean>] (default: false)
    When training on multiple GPUs, update parameters asynchronously.
  • -async_parallel_minbatch <number> (default: 1000)
    In asynchronous training, the minimum number of batches to process sequentially before switching to parallel mode.
  • -start_iteration <number> (default: 1)
    If loading from a checkpoint, the iteration from which to start.
  • -start_epoch <number> (default: 1)
    If loading from a checkpoint, the epoch from which to start.
  • -end_epoch <number> (default: 13)
    The final epoch of the training. If = 0, train forever unless another stopping condition is met (e.g. -min_learning_rate is reached).
  • -curriculum <number> (default: 0)
    For this many epochs, order the minibatches based on source length (from smaller to longer). Sometimes setting this to 1 will increase convergence speed.
  • -validation_metric <string> (accepted: perplexity, loss, bleu; default: perplexity)
    Metric to use for validation.
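
For example, a hypothetical run that trains for 20 epochs, reports progress every 100 iterations, disables mid-epoch checkpoints, and sorts minibatches by source length during the first epoch:

    th train.lua -data data/demo-train.t7 -save_model demo-model \
        -end_epoch 20 -report_every 100 -save_every 0 -curriculum 1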

Optimization options

  • -max_batch_size <number> (default: 64)
    Maximum batch size.
  • -uneven_batches [<boolean>] (default: false)
    If set, batches are filled up to -max_batch_size even if the source lengths are different. Slower but needed for some tasks.
  • -optim <string> (accepted: sgd, adagrad, adadelta, adam; default: sgd)
    Optimization method.
  • -learning_rate <number> (default: 1)
    Initial learning rate. If adagrad or adam is used, then this is the global learning rate. Recommended settings are: sgd = 1, adagrad = 0.1, adam = 0.0002.
  • -min_learning_rate <number> (default: 0)
    Do not continue the training past this learning rate value.
  • -max_grad_norm <number> (default: 5)
    Clip the gradient norm to this value.
  • -learning_rate_decay <number> (default: 0.7)
    Learning rate decay factor: learning_rate = learning_rate * learning_rate_decay.
  • -start_decay_at <number> (default: 9)
    In "default" decay mode, start decay after this epoch.
  • -start_decay_score_delta <number> (default: 0)
    Start decay when validation score improvement is lower than this value.
  • -decay <string> (accepted: default, epoch_only, score_only; default: default)
    When to apply learning rate decay. default: decay after each epoch past -start_decay_at or as soon as the validation score improves by less than -start_decay_score_delta; epoch_only: only decay after each epoch past -start_decay_at; score_only: only decay when the validation score improves by less than -start_decay_score_delta.
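
To make the interaction of these options concrete, here is the schedule implied by the defaults (-learning_rate 1, -learning_rate_decay 0.7, -start_decay_at 9, -decay default) and the descriptions above, assuming the score-based condition never triggers:

    epochs 1-9:  learning_rate = 1
    epoch 10:    1     * 0.7 = 0.7
    epoch 11:    0.7   * 0.7 = 0.49
    epoch 12:    0.49  * 0.7 = 0.343
    epoch 13:    0.343 * 0.7 = 0.2401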

Saver options

  • -save_model <string> (required)
    Model filename (the model will be saved as <save_model>_epochN_PPL.t7, where N is the epoch number and PPL is the validation perplexity).
  • -train_from <string> (default: '')
    Path to a checkpoint.
  • -continue [<boolean>] (default: false)
    If set, continue the training where it left off.
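
For example, to resume an interrupted run from a saved checkpoint (the checkpoint name below follows the pattern described for -save_model but is hypothetical):

    th train.lua -data data/demo-train.t7 -save_model demo-model \
        -train_from demo-model_epoch7_10.23.t7 -continue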

Translator options

  • -model <string> (default: '')
    Path to the serialized model file.
  • -beam_size <number> (default: 5)
    Beam size.
  • -max_sent_length <number> (default: 250)
    Maximum output sentence length.
  • -replace_unk [<boolean>] (default: false)
    Replace generated <unk> tokens with the source token that has the highest attention weight. If -phrase_table is provided, the identified source token is looked up and the corresponding target token is used. If it is not provided (or the identified source token does not exist in the table), the source token is copied.
  • -phrase_table <string> (default: '')
    Path to source-target dictionary to replace <unk> tokens.
  • -n_best <number> (default: 1)
    If > 1, it will also output an n-best list of decoded sentences.
  • -max_num_unks <number> (default: inf)
    All sequences with more <unk>s than this will be ignored during beam search.
  • -target_subdict <string> (default: '')
    Path to target words dictionary corresponding to the source.
  • -pre_filter_factor <number> (default: 1)
    Optional; set this only if a filter is being used. Before applying filters, the hypotheses with the top beam_size * pre_filter_factor scores are considered. If the returned hypotheses violate the filters, set this to a larger value to consider more hypotheses.
  • -length_norm <number> (default: 0)
    Length normalization coefficient (alpha). If set to 0, no length normalization.
  • -coverage_norm <number> (default: 0)
    Coverage normalization coefficient (beta). An extra coverage term multiplied by beta is added to the hypothesis scores. If set to 0, no coverage normalization.
  • -eos_norm <number> (default: 0)
    End of sentence normalization coefficient (gamma). If set to 0, no EOS normalization.
  • -dump_input_encoding [<boolean>] (default: false)
    Instead of generating target tokens conditioned on the source tokens, print the representation (encoding/embedding) of the input.
  • -save_beam_to <string> (default: '')
    Path to a file where the beam search exploration will be saved in JSON format. Requires the dkjson package.
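
These options configure the decoding performed during training, for example when -validation_metric bleu requires translating the validation set. A hypothetical run scoring validation with BLEU could look like:

    th train.lua -data data/demo-train.t7 -save_model demo-model \
        -validation_metric bleu -beam_size 5 -replace_unk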

Crayon options

  • -exp_host <string> (default: 127.0.0.1)
    Crayon server IP.
  • -exp_port <string> (default: 8889)
    Crayon server port.
  • -exp <string> (default: '')
    Crayon experiment name.
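
Assuming a Crayon server is running locally on the default port, a hypothetical run could report to it with:

    th train.lua -data data/demo-train.t7 -save_model demo-model \
        -exp_host 127.0.0.1 -exp_port 8889 -exp demo-experiment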

Cuda options

  • -gpuid <table> (default: 0)
    List of GPU identifiers (1-indexed). CPU is used when set to 0.
  • -fallback_to_cpu [<boolean>] (default: false)
    If the GPU cannot be used, fall back to the CPU.
  • -fp16 [<boolean>] (default: false)
    Use half-precision floats on the GPU.
  • -no_nccl [<boolean>] (default: false)
    Disable the use of NCCL in parallel mode.
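
For example, a hypothetical run using data parallelism on the first two GPUs, falling back to the CPU if they are unavailable:

    th train.lua -data data/demo-train.t7 -save_model demo-model \
        -gpuid 1 2 -fallback_to_cpu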

Logger options

  • -log_file <string> (default: '')
    Output logs to a file under this path instead of stdout.
  • -disable_logs [<boolean>] (default: false)
    If set, output nothing.
  • -log_level <string> (accepted: DEBUG, INFO, WARNING, ERROR; default: INFO)
    Output logs at this level and above.
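
For instance, to keep the console quiet and write detailed logs to a file instead (the path is a placeholder):

    th train.lua -data data/demo-train.t7 -save_model demo-model \
        -log_file train.log -log_level DEBUG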

Other options

  • -disable_mem_optimization [<boolean>] (default: false)
    Disable sharing of internal buffers between clones for visualization or development.
  • -profiler [<boolean>] (default: false)
    Generate profiling logs.
  • -seed <number> (default: 3435)
    Random seed.