train.lua

train.lua options:

  • -h [<boolean>] (default: false)
    This help.
  • -md [<boolean>] (default: false)
    Dump help in Markdown format.
  • -config <string> (default: '')
    Load options from this file.
  • -save_config <string> (default: '')
    Save options to this file.
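
Options can be kept in a configuration file for reproducibility. A minimal sketch, assuming a one `key = value` per line file format (an assumption here) and hypothetical paths:

```bash
# train.cfg (illustrative) would contain lines such as:
#   data = demo-train.t7
#   save_model = demo-model

# Load options from the file; options given on the command line still apply.
th train.lua -config train.cfg

# Conversely, snapshot the effective options of a run for later reuse.
th train.lua -data demo-train.t7 -save_model demo-model -save_config train.cfg
```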

Data options

  • -data <string> (default: '')
    Path to the data package *-train.t7 generated by the preprocessing step.

Sampled dataset options

  • -sample <number> (default: 0)
    Number of instances to sample from train data in each epoch.
  • -sample_type <string> (accepted: uniform, perplexity, partition; default: uniform)
    Define the sampling type: uniform draws the sample randomly; perplexity uses perplexity as a probability distribution when sampling (with the -sample_perplexity_init and -sample_perplexity_max options); partition draws a different subset at each epoch.
  • -sample_perplexity_init <number> (default: 15)
    Start perplexity-based sampling when average train perplexity per batch falls below this value.
  • -sample_perplexity_max <number> (default: -1.5)
    When greater than 0, instances with a perplexity above this value are considered noise and ignored; when less than 0, mode + -sample_perplexity_max * stdev is used as the threshold.
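
For example, to draw a fresh perplexity-weighted sample of 100,000 instances at each epoch (sample size and paths hypothetical):

```bash
th train.lua -data demo-train.t7 -save_model demo-model \
  -sample 100000 -sample_type perplexity \
  -sample_perplexity_init 15 -sample_perplexity_max -1.5
```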

Data options (raw text files)

  • -train_dir <string> (default: '')
    Path to training files directory.
  • -train_src <string> (default: '')
    Path to the training source data.
  • -train_tgt <string> (default: '')
    Path to the training target data.
  • -valid_src <string> (default: '')
    Path to the validation source data.
  • -valid_tgt <string> (default: '')
    Path to the validation target data.
  • -src_vocab <string> (default: '')
    Path to an existing source vocabulary.
  • -src_suffix <string> (default: .src)
    Suffix for source files in train/valid directories.
  • -src_vocab_size <table> (default: 50000)
    List of source vocabulary sizes: word[ feat1[ feat2[ ...] ] ]. If = 0, vocabularies are not pruned.
  • -src_words_min_frequency <table> (default: 0)
    List of source word minimum frequencies: word[ feat1[ feat2[ ...] ] ]. If = 0, vocabularies are pruned by size.
  • -tgt_vocab <string> (default: '')
    Path to an existing target vocabulary.
  • -tgt_suffix <string> (default: .tgt)
    Suffix for target files in train/valid directories.
  • -tgt_vocab_size <table> (default: 50000)
    List of target vocabulary sizes: word[ feat1[ feat2[ ...] ] ]. If = 0, vocabularies are not pruned.
  • -tgt_words_min_frequency <table> (default: 0)
    List of target word minimum frequencies: word[ feat1[ feat2[ ...] ] ]. If = 0, vocabularies are pruned by size.
  • -src_seq_length <number> (default: 50)
    Maximum source sequence length.
  • -tgt_seq_length <number> (default: 50)
    Maximum target sequence length.
  • -check_plength [<boolean>] (default: false)
    Check that source and target have the same length (for sequence tagging).
  • -features_vocabs_prefix <string> (default: '')
    Path prefix to existing features vocabularies.
  • -time_shift_feature [<boolean>] (default: true)
    Time shift features on the decoder side.
  • -keep_frequency [<boolean>] (default: false)
    Keep frequency of words in dictionary.
  • -gsample <number> (default: 0)
    If not zero, extract a new sample from the corpus. In training mode, file sampling is done at each epoch. Values between 0 and 1 indicate a sampling ratio; values greater than 1 indicate an absolute sample size.
  • -gsample_dist <string> (default: '')
    Configuration file with the data class distribution to use for sampling the training corpus. If not set, sampling is uniform.
  • -sort [<boolean>] (default: true)
    If set, sort the sequences by size to build batches without source padding.
  • -shuffle [<boolean>] (default: true)
    If set, shuffle the data (prior to sorting).
  • -idx_files [<boolean>] (default: false)
    If set, source and target files are in 'key value' format, with keys used to match source and target entries.
  • -report_progress_every <number> (default: 100000)
    Report status every this many sentences.
  • -preprocess_pthreads <number> (default: 4)
    Number of parallel threads for preprocessing.
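
A sketch of training directly from raw text files instead of a preprocessed -data package (all paths hypothetical):

```bash
th train.lua -save_model demo-model \
  -train_src data/train.src -train_tgt data/train.tgt \
  -valid_src data/valid.src -valid_tgt data/valid.tgt \
  -src_vocab_size 50000 -tgt_vocab_size 50000 \
  -src_seq_length 50 -tgt_seq_length 50
```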

Tokenizer options

  • -tok_{src,tgt}_mode <string> (accepted: conservative, aggressive, space; default: space)
    Define how aggressive the tokenization should be. space means space-only tokenization.
  • -tok_{src,tgt}_joiner_annotate [<boolean>] (default: false)
    Include joiner annotation using the -joiner character.
  • -tok_{src,tgt}_joiner <string> (default: ■)
    Character used to annotate joiners.
  • -tok_{src,tgt}_joiner_new [<boolean>] (default: false)
    In -joiner_annotate mode, -joiner is an independent token.
  • -tok_{src,tgt}_case_feature [<boolean>] (default: false)
    Generate case feature.
  • -tok_{src,tgt}_segment_case [<boolean>] (default: false)
    Further segment the case feature: splits AbC into Ab C so that case can be restored.
  • -tok_{src,tgt}_segment_alphabet <table> (accepted: Tagalog, Hanunoo, Limbu, Yi, Hebrew, Latin, Devanagari, Thaana, Lao, Sinhala, Georgian, Kannada, Cherokee, Kanbun, Buhid, Malayalam, Han, Thai, Katakana, Telugu, Greek, Myanmar, Armenian, Hangul, Cyrillic, Ethiopic, Tagbanwa, Gurmukhi, Ogham, Khmer, Arabic, Oriya, Hiragana, Mongolian, Kangxi, Syriac, Gujarati, Braille, Bengali, Tamil, Bopomofo, Tibetan)
    Segment all letters from indicated alphabet.
  • -tok_{src,tgt}_segment_numbers [<boolean>] (default: false)
    Segment numbers into single digits.
  • -tok_{src,tgt}_segment_alphabet_change [<boolean>] (default: false)
    Segment when the alphabet changes between two letters.
  • -tok_{src,tgt}_bpe_model <string> (default: '')
    Apply Byte Pair Encoding if a BPE model path is given. BPE-related options will be overridden/set automatically if the model specified by -bpe_model was learnt using learn_bpe.lua.
  • -tok_{src,tgt}_bpe_EOT_marker <string> (default: </w>)
    Marker used to mark the End of Token while applying BPE in mode 'prefix' or 'both'.
  • -tok_{src,tgt}_bpe_BOT_marker <string> (default: <w>)
    Marker used to mark the Beginning of Token while applying BPE in mode 'suffix' or 'both'.
  • -tok_{src,tgt}_bpe_case_insensitive [<boolean>] (default: false)
    Apply BPE internally in lowercase, but still output the truecase units. This option will be overridden/set automatically if the BPE model specified by -bpe_model is learnt using learn_bpe.lua.
  • -tok_{src,tgt}_bpe_mode <string> (accepted: suffix, prefix, both, none; default: suffix)
    Define the BPE mode. This option will be overridden/set automatically if the BPE model specified by -bpe_model was learnt using learn_bpe.lua. prefix: append -bpe_BOT_marker to the beginning of each word to learn prefix-oriented pair statistics; suffix: append -bpe_EOT_marker to the end of each word to learn suffix-oriented pair statistics, as in the original Python script; both: suffix and prefix; none: neither suffix nor prefix.
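
A sketch combining tokenization and BPE options for both sides (BPE model path hypothetical):

```bash
th train.lua -save_model demo-model \
  -train_src data/train.src -train_tgt data/train.tgt \
  -valid_src data/valid.src -valid_tgt data/valid.tgt \
  -tok_src_mode aggressive -tok_tgt_mode aggressive \
  -tok_src_joiner_annotate -tok_tgt_joiner_annotate \
  -tok_src_case_feature -tok_tgt_case_feature \
  -tok_src_bpe_model bpe-models/codes.src
```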

Sampled Vocabulary options

  • -sample_vocab [<boolean>] (default: false)
    Use importance sampling as an approximation of the full output vocabulary softmax.

Model options

  • -model_type <string> (accepted: lm, seq2seq, seqtagger; default: seq2seq)
    Type of model to train. This option impacts all other option choices.
  • -param_init <number> (default: 0.1)
    Parameters are initialized over a uniform distribution with support (-param_init, param_init). Set to 0 to rely on each module's default initialization.

Sequence to Sequence with Attention options

  • -enc_layers <number> (default: 0)
    If > 0, number of layers of the encoder. This overrides the global -layers option.
  • -dec_layers <number> (default: 0)
    If > 0, number of layers of the decoder. This overrides the global -layers option.
  • -word_vec_size <number> (default: 0)
    Shared word embedding size. If set, this overrides -src_word_vec_size and -tgt_word_vec_size.
  • -src_word_vec_size <table> (default: 500)
    List of source embedding sizes: word[ feat1[ feat2[ ...] ] ].
  • -tgt_word_vec_size <table> (default: 500)
    List of target embedding sizes: word[ feat1[ feat2[ ...] ] ].
  • -pre_word_vecs_enc <string> (default: '')
    Path to pretrained word embeddings on the encoder side serialized as a Torch tensor.
  • -pre_word_vecs_dec <string> (default: '')
    Path to pretrained word embeddings on the decoder side serialized as a Torch tensor.
  • -fix_word_vecs_enc [<boolean>/<string>] (accepted: false, true, pretrained; default: false)
    Fix word embeddings on the encoder side.
  • -fix_word_vecs_dec [<boolean>/<string>] (accepted: false, true, pretrained; default: false)
    Fix word embeddings on the decoder side.
  • -feat_merge <string> (accepted: concat, sum; default: concat)
    Merge action for the feature embeddings.
  • -feat_vec_exponent <number> (default: 0.7)
    When feature embedding sizes are not set and -feat_merge concat is used, their dimension is set to N^feat_vec_exponent where N is the number of values the feature takes; e.g. a feature with 20 possible values gets an embedding of size 20^0.7 ≈ 8.
  • -feat_vec_size <number> (default: 20)
    When feature embedding sizes are not set and -feat_merge sum is used, this is the common embedding size of the features.
  • -layers <number> (default: 2)
    Number of recurrent layers of the encoder and decoder. See also -enc_layers, -dec_layers and -bridge to assign different layers to the encoder and decoder.
  • -rnn_size <number> (default: 500)
    Hidden size of the recurrent unit.
  • -rnn_type <string> (accepted: LSTM, GRU; default: LSTM)
    Type of recurrent cell.
  • -dropout <number> (default: 0.3)
    Dropout probability applied between recurrent layers.
  • -dropout_input [<boolean>] (default: false)
    If set, also apply dropout to the input of the recurrent module.
  • -dropout_words <number> (default: 0)
    Dropout probability applied to the source sequence.
  • -dropout_type <string> (accepted: naive, variational; default: naive)
    Dropout type.
  • -residual [<boolean>] (default: false)
    Add residual connections between recurrent layers.
  • -bridge <string> (accepted: copy, dense, dense_nonlinear, none; default: copy)
    Define how to pass encoder states to the decoder. With copy, the encoder and decoder must have the same number of layers.
  • -input_feed [<boolean>] (default: true)
    Feed the context vector at each time step as additional input (via concatenation with the word embeddings) to the decoder.
  • -scheduled_sampling <number> (default: 1)
    Probability of feeding the true (vs. generated) previous token to the decoder.
  • -scheduled_sampling_scope <string> (accepted: token, sentence; default: token)
    Apply scheduled sampling at token or sentence level.
  • -scheduled_sampling_decay_type <string> (accepted: linear, invsigmoid; default: linear)
    Scheduled Sampling decay type.
  • -scheduled_sampling_decay_rate <number> (default: 0)
    Scheduled Sampling decay rate.
  • -encoder_type <string> (accepted: rnn, brnn, dbrnn, pdbrnn, gnmt, cnn; default: rnn)
    Encoder type.
  • -attention <string> (accepted: none, global; default: global)
    Attention model.
  • -brnn_merge <string> (accepted: concat, sum; default: sum)
    Merge action for the bidirectional states.
  • -pdbrnn_reduction <number> (default: 2)
    Time-reduction factor at each layer.
  • -pdbrnn_merge <string> (accepted: concat, sum; default: concat)
    Merge action when reducing time.
  • -cnn_layers <number> (default: 2)
    Number of convolutional layers in the encoder.
  • -cnn_kernel <number> (default: 3)
    Kernel size for convolutions. Same in each layer.
  • -cnn_size <number> (default: 500)
    Number of output units per convolutional layer. Same in each layer.
  • -use_pos_emb [<boolean>] (default: true)
    Add positional embeddings to word embeddings.
  • -max_pos <number> (default: 50)
    Maximum value for positional indexes.
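
For instance, a deeper bidirectional encoder-decoder could be configured as follows (all values illustrative):

```bash
th train.lua -data demo-train.t7 -save_model demo-model \
  -model_type seq2seq -encoder_type brnn -brnn_merge concat \
  -layers 4 -rnn_size 800 -rnn_type LSTM \
  -dropout 0.3 -residual -bridge dense
```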

Global Attention Model options

  • -global_attention <string> (accepted: general, dot, concat; default: general)
    Global attention model type.

Trainer options

  • -save_every <number> (default: 5000)
    Save intermediate models every this many iterations within an epoch. If = 0, will not save intermediate models.
  • -save_every_epochs <number> (default: 1)
    Save a model every this many epochs. If = 0, will not save a model at each epoch.
  • -report_every <number> (default: 50)
    Report progress every this many iterations within an epoch.
  • -async_parallel [<boolean>] (default: false)
    When training on multiple GPUs, update parameters asynchronously.
  • -async_parallel_minbatch <number> (default: 1000)
    In asynchronous training, the minimum number of batches to process sequentially before switching to parallel updates.
  • -start_iteration <number> (default: 1)
    If loading from a checkpoint, the iteration from which to start.
  • -start_epoch <number> (default: 1)
    If loading from a checkpoint, the epoch from which to start.
  • -end_epoch <number> (default: 13)
    The final epoch of the training. If = 0, train forever unless another stopping condition is met (e.g. -min_learning_rate is reached).
  • -curriculum <number> (default: 0)
    For this many epochs, order the minibatches by source length (from shorter to longer). Sometimes setting this to 1 will increase convergence speed.
  • -validation_metric <string> (accepted: perplexity, loss, bleu, ter, dlratio; default: perplexity)
    Metric to use for validation.
  • -save_validation_translation_every <number> (default: 0)
    When using translation-based validation metrics (e.g. BLEU or TER), also save the translation every this many epochs to the file <save_model>_epochN_validation_translation.txt. If = 0, will not save validation translation.
  • -update_vocab <string> (accepted: none, replace, merge; default: none)
    When training on a new training set with a different vocabulary, update the vocabulary and preserve the parameters (embeddings, generator, ...) of words common to both vocabularies.
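
For example, to train for 20 epochs, validate with BLEU, and save the validation translation after every epoch (values illustrative):

```bash
th train.lua -data demo-train.t7 -save_model demo-model \
  -end_epoch 20 -report_every 100 -save_every 0 \
  -validation_metric bleu -save_validation_translation_every 1
```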

Optimization options

  • -max_batch_size <number> (default: 160)
    Maximum batch size.
  • -max_tokens <number> (default: 1800)
    Maximum number of tokens per batch.
  • -uneven_batches [<boolean>] (default: false)
    If set, batches are filled up to -max_batch_size even if the source lengths are different. Slower but needed for some tasks.
  • -optim <string> (accepted: sgd, adagrad, adadelta, adam; default: sgd)
    Optimization method.
  • -learning_rate <number> (default: 1)
    Initial learning rate. If adagrad or adam is used, then this is the global learning rate. Recommended settings are: sgd = 1, adagrad = 0.1, adam = 0.0002.
  • -min_learning_rate <number> (default: 0)
    Do not continue the training past this learning rate value.
  • -max_grad_norm <number> (default: 5)
    Clip the gradients L2-norm to this value. Set to 0 to disable.
  • -learning_rate_decay <number> (default: 0.7)
    Learning rate decay factor: learning_rate = learning_rate * learning_rate_decay.
  • -start_decay_at <number> (default: 9)
    In "default" decay mode, start decay after this epoch.
  • -start_decay_score_delta <number> (default: 0)
    Start decay when validation score improvement is lower than this value.
  • -decay <string> (accepted: default, epoch_only, score_only; default: default)
    When to apply learning rate decay. default: decay after each epoch past -start_decay_at or as soon as the validation score improves by less than -start_decay_score_delta; epoch_only: only decay after each epoch past -start_decay_at; score_only: only decay when the validation score improves by less than -start_decay_score_delta.
  • -decay_method <string> (accepted: default, restart; default: default)
    If restart is set, the optimizer states (if any) will be reset when the decay condition is met.
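
Two common setups following the recommended learning rates above (values otherwise illustrative):

```bash
# SGD with the default epoch/score-based decay schedule.
th train.lua -data demo-train.t7 -save_model demo-model \
  -optim sgd -learning_rate 1 -learning_rate_decay 0.7 -start_decay_at 9

# Adam with its recommended global learning rate.
th train.lua -data demo-train.t7 -save_model demo-model \
  -optim adam -learning_rate 0.0002
```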

Saver options

  • -save_model <string> (required)
    Model filename (the model will be saved as <save_model>_epochN_PPL.t7, where PPL is the validation perplexity).
  • -train_from <string> (default: '')
    Path to a checkpoint.
  • -continue [<boolean>] (default: false)
    If set, continue the training where it left off.
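
To resume an interrupted run from a checkpoint while keeping its options and training state (checkpoint name hypothetical):

```bash
th train.lua -train_from demo-model_checkpoint.t7 -continue \
  -data demo-train.t7 -save_model demo-model
```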

Translator options

  • -model <string> (default: '')
    Path to the serialized model file.
  • -lm_model <string> (default: '')
    Path to serialized language model file.
  • -lm_weight <number> (default: 0.1)
    Relative weight of language model.
  • -beam_size <number> (default: 5)
    Beam size.
  • -max_sent_length <number> (default: 250)
    Maximum output sentence length.
  • -replace_unk [<boolean>] (default: false)
    Replace the generated <unk> tokens with the source token that has the highest attention weight. If -phrase_table is provided, the identified source token will be looked up and the corresponding target token used; if it is not provided (or the identified source token does not exist in the table), the source token will be copied.
  • -replace_unk_tagged [<boolean>] (default: false)
    The same as -replace_unk, but wrap the replaced token in ⦅unk:xxxxx⦆ if it is not found in the phrase table.
  • -lexical_constraints [<boolean>] (default: false)
    Force the beam search to apply the translations from the phrase table.
  • -limit_lexical_constraints [<boolean>] (default: false)
    Prevent each lexical constraint from being produced more often than required.
  • -placeholder_constraints [<boolean>] (default: false)
    Force the beam search to reproduce placeholders in the translation.
  • -phrase_table <string> (default: '')
    Path to source-target dictionary to replace <unk> tokens.
  • -n_best <number> (default: 1)
    If > 1, it will also output an n-best list of decoded sentences.
  • -max_num_unks <number> (default: inf)
    All sequences with more <unk>s than this will be ignored during beam search.
  • -target_subdict <string> (default: '')
    Path to target words dictionary corresponding to the source.
  • -pre_filter_factor <number> (default: 1)
    Optional; set this only if a filter is being used. Before applying filters, hypotheses with the top beam_size * pre_filter_factor scores will be considered. If the returned hypotheses violate the filters, set this to a larger value to consider more hypotheses.
  • -length_norm <number> (default: 0)
    Length normalization coefficient (alpha). If set to 0, no length normalization.
  • -coverage_norm <number> (default: 0)
    Coverage normalization coefficient (beta). An extra coverage term multiplied by beta is added to hypothesis scores. If set to 0, no coverage normalization.
  • -eos_norm <number> (default: 0)
    End of sentence normalization coefficient (gamma). If set to 0, no EOS normalization.
  • -dump_input_encoding [<boolean>] (default: false)
    Instead of generating target tokens conditioned on the source tokens, print the representation (encoding/embedding) of the input.
  • -save_beam_to <string> (default: '')
    Path to a file where the beam search exploration will be saved in JSON format. Requires the dkjson package.
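
These options also drive translation-based validation during training (e.g. -validation_metric bleu); a sketch with a hypothetical phrase table:

```bash
th train.lua -data demo-train.t7 -save_model demo-model \
  -validation_metric bleu -save_validation_translation_every 1 \
  -beam_size 5 -max_sent_length 250 \
  -replace_unk -phrase_table data/phrase-table.txt
```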

Crayon options

  • -exp_host <string> (default: 127.0.0.1)
    Crayon server IP.
  • -exp_port <string> (default: 8889)
    Crayon server port.
  • -exp <string> (default: '')
    Crayon experiment name.

Cuda options

  • -gpuid <table> (default: 0)
    List of GPU identifiers (1-indexed). CPU is used when set to 0.
  • -fallback_to_cpu [<boolean>] (default: false)
    If the GPU can't be used, fall back to the CPU.
  • -fp16 [<boolean>] (default: false)
    Use half-precision float on GPU.
  • -no_nccl [<boolean>] (default: false)
    Disable use of NCCL in parallel mode.
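
For example, synchronous training on four GPUs with a CPU fallback (GPU identifiers illustrative):

```bash
th train.lua -data demo-train.t7 -save_model demo-model \
  -gpuid 1 2 3 4 -fallback_to_cpu

# Or update parameters asynchronously across the same GPUs.
th train.lua -data demo-train.t7 -save_model demo-model \
  -gpuid 1 2 3 4 -async_parallel
```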

Logger options

  • -log_file <string> (default: '')
    Output logs to a file at this path instead of stdout. If the file name ends with json, output structured JSON.
  • -disable_logs [<boolean>] (default: false)
    If set, output nothing.
  • -log_level <string> (accepted: DEBUG, INFO, WARNING, ERROR, NONE; default: INFO)
    Output logs at this level and above.

HookManager options

  • -hook_file <string> (default: '')
    Path to a Lua file registering hooks for the current process.

Other options

  • -disable_mem_optimization [<boolean>] (default: false)
    Disable sharing of internal buffers between clones for visualization or development.
  • -profiler [<boolean>] (default: false)
    Generate profiling logs.
  • -seed <number> (default: 3435)
    Random seed.