Translate¶

translate.py

usage: translate.py [-h] [-config CONFIG] [-save_config SAVE_CONFIG] --model
                    MODEL [MODEL ...] [--precision {,fp32,fp16,int8}] [--fp32]
                    [--int8] [--avg_raw_probs]
                    [--self_attn_type SELF_ATTN_TYPE] [--data_type DATA_TYPE]
                    --src SRC [--tgt TGT] [--tgt_file_prefix]
                    [--output OUTPUT] [--report_align] [--gold_align]
                    [--report_time] [--profile] [-n_src_feats N_SRC_FEATS]
                    [-src_feats_defaults SRC_FEATS_DEFAULTS]
                    [--beam_size BEAM_SIZE] [--ratio RATIO]
                    [--random_sampling_topk RANDOM_SAMPLING_TOPK]
                    [--random_sampling_topp RANDOM_SAMPLING_TOPP]
                    [--random_sampling_temp RANDOM_SAMPLING_TEMP]
                    [--seed SEED] [--length_penalty {none,wu,avg}]
                    [--alpha ALPHA] [--coverage_penalty {none,wu,summary}]
                    [--beta BETA] [--stepwise_penalty]
                    [--min_length MIN_LENGTH] [--max_length MAX_LENGTH]
                    [--max_length_ratio MAX_LENGTH_RATIO]
                    [--block_ngram_repeat BLOCK_NGRAM_REPEAT]
                    [--ignore_when_blocking IGNORE_WHEN_BLOCKING [IGNORE_WHEN_BLOCKING ...]]
                    [--replace_unk] [--ban_unk_token]
                    [--phrase_table PHRASE_TABLE] [--log_file LOG_FILE]
                    [--log_file_level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET,50,40,30,20,10,0}]
                    [--verbose] [--attn_debug] [--align_debug]
                    [--dump_beam DUMP_BEAM] [--n_best N_BEST] [--with_score]
                    [--gpu_ranks [GPU_RANKS ...]] [--world_size WORLD_SIZE]
                    [--parallel_mode {tensor_parallel,data_parallel}]
                    [--gpu_backend GPU_BACKEND]
                    [--gpu_verbose_level GPU_VERBOSE_LEVEL]
                    [--master_ip MASTER_IP] [--master_port MASTER_PORT]
                    [--timeout TIMEOUT] [--batch_size BATCH_SIZE]
                    [--batch_type {sents,tokens}] [--gpu GPU]
                    [-transforms {insert_mask_before_placeholder,uppercase,inlinetags,bart,terminology,docify,inferfeats,filtertoolong,prefix,suffix,fuzzymatch,clean,switchout,tokendrop,tokenmask,sentencepiece,bpe,onmt_tokenize,normalize} [{insert_mask_before_placeholder,uppercase,inlinetags,bart,terminology,docify,inferfeats,filtertoolong,prefix,suffix,fuzzymatch,clean,switchout,tokendrop,tokenmask,sentencepiece,bpe,onmt_tokenize,normalize} ...]]
                    [--response_patterns RESPONSE_PATTERNS [RESPONSE_PATTERNS ...]]
                    [--upper_corpus_ratio UPPER_CORPUS_RATIO]
                    [--tags_dictionary_path TAGS_DICTIONARY_PATH]
                    [--tags_corpus_ratio TAGS_CORPUS_RATIO]
                    [--max_tags MAX_TAGS] [--paired_stag PAIRED_STAG]
                    [--paired_etag PAIRED_ETAG] [--isolated_tag ISOLATED_TAG]
                    [--src_delimiter SRC_DELIMITER]
                    [--permute_sent_ratio PERMUTE_SENT_RATIO]
                    [--rotate_ratio ROTATE_RATIO]
                    [--insert_ratio INSERT_RATIO]
                    [--random_ratio RANDOM_RATIO] [--mask_ratio MASK_RATIO]
                    [--mask_length {subword,word,span-poisson}]
                    [--poisson_lambda POISSON_LAMBDA]
                    [--replace_length {-1,0,1}]
                    [--termbase_path TERMBASE_PATH]
                    [--src_spacy_language_model SRC_SPACY_LANGUAGE_MODEL]
                    [--tgt_spacy_language_model TGT_SPACY_LANGUAGE_MODEL]
                    [--term_corpus_ratio TERM_CORPUS_RATIO]
                    [--term_example_ratio TERM_EXAMPLE_RATIO]
                    [--src_term_stoken SRC_TERM_STOKEN]
                    [--tgt_term_stoken TGT_TERM_STOKEN]
                    [--tgt_term_etoken TGT_TERM_ETOKEN]
                    [--term_source_delimiter TERM_SOURCE_DELIMITER]
                    [--doc_length DOC_LENGTH] [--max_context MAX_CONTEXT]
                    [--reversible_tokenization {joiner,spacer}]
                    [--src_seq_length SRC_SEQ_LENGTH]
                    [--tgt_seq_length TGT_SEQ_LENGTH]
                    [--src_prefix SRC_PREFIX] [--tgt_prefix TGT_PREFIX]
                    [--src_suffix SRC_SUFFIX] [--tgt_suffix TGT_SUFFIX]
                    [--tm_path TM_PATH]
                    [--fuzzy_corpus_ratio FUZZY_CORPUS_RATIO]
                    [--fuzzy_threshold FUZZY_THRESHOLD]
                    [--tm_delimiter TM_DELIMITER] [--fuzzy_token FUZZY_TOKEN]
                    [--fuzzymatch_min_length FUZZYMATCH_MIN_LENGTH]
                    [--fuzzymatch_max_length FUZZYMATCH_MAX_LENGTH]
                    [--src_eq_tgt] [--same_char] [--same_word]
                    [--scripts_ok [SCRIPTS_OK ...]]
                    [--scripts_nok [SCRIPTS_NOK ...]]
                    [--src_tgt_ratio SRC_TGT_RATIO]
                    [--avg_tok_min AVG_TOK_MIN] [--avg_tok_max AVG_TOK_MAX]
                    [--langid [LANGID ...]]
                    [-switchout_temperature SWITCHOUT_TEMPERATURE]
                    [-tokendrop_temperature TOKENDROP_TEMPERATURE]
                    [-tokenmask_temperature TOKENMASK_TEMPERATURE]
                    [-src_subword_model SRC_SUBWORD_MODEL]
                    [-tgt_subword_model TGT_SUBWORD_MODEL]
                    [-src_subword_nbest SRC_SUBWORD_NBEST]
                    [-tgt_subword_nbest TGT_SUBWORD_NBEST]
                    [-src_subword_alpha SRC_SUBWORD_ALPHA]
                    [-tgt_subword_alpha TGT_SUBWORD_ALPHA]
                    [-src_subword_vocab SRC_SUBWORD_VOCAB]
                    [-tgt_subword_vocab TGT_SUBWORD_VOCAB]
                    [-src_vocab_threshold SRC_VOCAB_THRESHOLD]
                    [-tgt_vocab_threshold TGT_VOCAB_THRESHOLD]
                    [-src_subword_type {none,sentencepiece,bpe}]
                    [-tgt_subword_type {none,sentencepiece,bpe}]
                    [-src_onmttok_kwargs SRC_ONMTTOK_KWARGS]
                    [-tgt_onmttok_kwargs TGT_ONMTTOK_KWARGS] [--gpt2_pretok]
                    [--src_lang SRC_LANG] [--tgt_lang TGT_LANG] [--penn PENN]
                    [--norm_quote_commas NORM_QUOTE_COMMAS]
                    [--norm_numbers NORM_NUMBERS]
                    [--pre_replace_unicode_punct PRE_REPLACE_UNICODE_PUNCT]
                    [--post_remove_control_chars POST_REMOVE_CONTROL_CHARS]
                    [--quant_layers QUANT_LAYERS [QUANT_LAYERS ...]]
                    [--quant_type {,bnb_8bit,bnb_FP4,bnb_NF4,awq_gemm,awq_gemv}]
                    [--w_bit {4}] [--group_size {128}]

Configuration¶

-config, --config: Path of the main YAML config file.
-save_config, --save_config: Path where to save the config.

Model¶

--model, -model

Path to model .pt file(s). Multiple models can be specified, for ensemble decoding.

Default: []

--precision, -precision

Possible choices: , fp32, fp16, int8

Precision to run inference. Default is model.dtype. fp32 to force slow FP16 model on GTX1080. int8 enables pytorch native 8-bit quantization (CPU only).

Default: “”

--fp32, -fp32

Deprecated; use ‘precision’ instead.

--int8, -int8

Deprecated; use ‘precision’ instead.

--avg_raw_probs, -avg_raw_probs

If this is set, during ensembling scores from different models will be combined by averaging their raw probabilities and then taking the log. Otherwise, the log probabilities will be averaged directly. Necessary for models whose output layers can assign zero probability.

Default: False
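
The difference between the two ensembling modes can be illustrated with a small sketch. This is plain Python for illustration only, not the code used by translate.py; the function names and the toy two-model distributions are hypothetical.

```python
import math

NEG_INF = float("-inf")


def combine_raw_probs(per_model_probs):
    """Average raw probabilities across models, then take the log."""
    n_models = len(per_model_probs)
    vocab_size = len(per_model_probs[0])
    combined = []
    for i in range(vocab_size):
        mean_p = sum(p[i] for p in per_model_probs) / n_models
        combined.append(math.log(mean_p) if mean_p > 0 else NEG_INF)
    return combined


def combine_log_probs(per_model_probs):
    """Average log probabilities directly; log(0) is treated as -inf."""
    n_models = len(per_model_probs)
    vocab_size = len(per_model_probs[0])
    combined = []
    for i in range(vocab_size):
        logs = [math.log(p[i]) if p[i] > 0 else NEG_INF for p in per_model_probs]
        combined.append(sum(logs) / n_models)
    return combined


# Toy distributions over a 3-token vocabulary (hypothetical values).
model_a = [0.5, 0.5, 0.0]  # assigns zero probability to the last token
model_b = [0.2, 0.3, 0.5]

raw = combine_raw_probs([model_a, model_b])
log_avg = combine_log_probs([model_a, model_b])
```

In this toy case the last token keeps a finite score when raw probabilities are averaged, but gets a score of minus infinity when log probabilities are averaged, which is why the option matters for models that can assign zero probability.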

--self_attn_type, -self_attn_type

Self attention type in Transformer decoder layer – currently “scaled-dot”, “scaled-dot-flash” or “average”

Default: “scaled-dot-flash”

Data¶

--data_type, -data_type

Type of the source input. Options: [text].

Default: “text”

--src, -src

Source sequence to decode (one line per sequence)

--tgt, -tgt

True target sequence (optional)

--tgt_file_prefix, -tgt_file_prefix

Generate predictions using provided -tgt as prefix.

Default: False

--output, -output

Path to output the predictions (each line will be the decoded sequence).

Default: “pred.txt”
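
As a rough illustration of how the required and most common options fit together, the sketch below assembles a command line from Python. The file names are placeholders, the helper name is hypothetical, and invoking the script as "python translate.py" is an assumption about the local setup; only flags listed in the usage above are used.

```python
import shlex


def build_translate_command(model, src, output="pred.txt", beam_size=5):
    """Build an argument list for translate.py (hypothetical helper)."""
    return [
        "python", "translate.py",      # assumed way of launching the script
        "--model", model,              # required: path to the model .pt file
        "--src", src,                  # required: one source sequence per line
        "--output", output,            # where predictions are written
        "--beam_size", str(beam_size),
    ]


# Placeholder paths; replace with real files before running the command.
cmd = build_translate_command("model.pt", "src.txt")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run once the paths point at real files.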

--report_align, -report_align

Report alignment for each translation.

Default: False

--gold_align, -gold_align

Report alignment between source and gold target. Useful to test the performance of learnt alignments.

Default: False

--report_time, -report_time

Report some translation time metrics

Default: False

--profile, -profile

Report pytorch profiling stats

Default: False

Features¶

-n_src_feats, --n_src_feats

Number of source feats.

Default: 0

-src_feats_defaults, --src_feats_defaults

Default features to apply to the source in case they are not annotated.

Beam Search¶

--beam_size, -beam_size

Beam size

Default: 5

--ratio, -ratio

Ratio based beam stop condition

Default: -0.0

Random Sampling¶

--random_sampling_topk, -random_sampling_topk

Set this to -1 to do random sampling from full distribution. Set this to value k>1 to do random sampling restricted to the k most likely next tokens. Set this to 1 to use argmax.

Default: 0

--random_sampling_topp, -random_sampling_topp

Probability for top-p/nucleus sampling. Restrict tokens to the most likely until the cumulated probability is over p. In range [0, 1]. https://arxiv.org/abs/1904.09751

Default: 0.0

--random_sampling_temp, -random_sampling_temp

If doing random sampling, divide the logits by this before computing softmax during decoding.

Default: 1.0
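
The interaction of the three sampling options can be sketched as follows. This is a minimal pure-Python illustration, not the implementation in translate.py; the function name is hypothetical, and treating any top-k value at or below 0 as "no restriction" is an assumption of this sketch.

```python
import math


def filtered_distribution(logits, temp=1.0, topk=0, topp=0.0):
    """Return the renormalized distribution after temperature, top-k and top-p."""
    # Divide the logits by the temperature, then apply a stable softmax.
    scaled = [x / temp for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order)

    if topk > 0:
        # Keep only the k most likely tokens (k = 1 reduces to argmax).
        keep &= set(order[:topk])

    if topp > 0.0:
        # Keep the most likely tokens until their cumulative mass reaches p.
        cumulative, nucleus = 0.0, []
        for i in order:
            nucleus.append(i)
            cumulative += probs[i]
            if cumulative >= topp:
                break
        keep &= set(nucleus)

    kept_mass = sum(probs[i] for i in keep)
    return [probs[i] / kept_mass if i in keep else 0.0 for i in range(len(probs))]


greedy = filtered_distribution([2.0, 1.0, 0.0], topk=1)
nucleus = filtered_distribution([2.0, 1.0, 0.0], topp=0.7)
```

A token would then be drawn from the returned distribution, for example with random.choices.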

Reproducibility¶

--seed, -seed

Set random seed used for better reproducibility between experiments.

Default: -1

Penalties¶

Note

Coverage Penalty is not available in sampling.

--length_penalty, -length_penalty

Possible choices: none, wu, avg

Length Penalty to use.

Default: “avg”

--alpha, -alpha

Length penalty parameter (higher = longer generation).

Default: 1.0
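
To see why a higher alpha favors longer output, here is a sketch of the length normalization published by Wu et al. (2016) for GNMT, assuming that this is what the "wu" choice refers to. The function names are hypothetical and the code is not taken from translate.py.

```python
def wu_length_penalty(length, alpha):
    """Length penalty from Wu et al. (2016): ((5 + length) / 6) ** alpha."""
    return ((5.0 + length) / 6.0) ** alpha


def normalized_score(sum_log_prob, length, alpha):
    """Divide the summed log probability of a hypothesis by its penalty."""
    return sum_log_prob / wu_length_penalty(length, alpha)


# Two hypothetical hypotheses: a short one and a longer, less probable one.
short_raw, short_len = -4.0, 4
long_raw, long_len = -6.0, 13

# With alpha = 0 the penalty is 1 and the raw scores are compared directly.
short_a0 = normalized_score(short_raw, short_len, alpha=0.0)
long_a0 = normalized_score(long_raw, long_len, alpha=0.0)

# With alpha = 1 the longer hypothesis is divided by a larger penalty.
short_a1 = normalized_score(short_raw, short_len, alpha=1.0)
long_a1 = normalized_score(long_raw, long_len, alpha=1.0)
```

Because log probabilities are negative, dividing by a larger penalty moves the score of a long hypothesis closer to zero, so in this toy case the ranking flips in favor of the longer hypothesis once alpha is raised from 0 to 1.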

--coverage_penalty, -coverage_penalty

Possible choices: none, wu, summary

Coverage Penalty to use. Only available in beam search.

Default: “none”

--beta, -beta

Coverage penalty parameter

Default: -0.0

--stepwise_penalty, -stepwise_penalty

Apply coverage penalty at every decoding step. Helpful for summary penalty.

Default: False

Decoding tricks¶

Tip

Following options can be used to limit the decoding length or content.

--min_length, -min_length

Minimum prediction length

Default: 0

--max_length, -max_length

Maximum prediction length.

Default: 250

--max_length_ratio, -max_length_ratio

Maximum prediction length ratio. For European languages, 1.25 is large enough. For target Asian characters, it needs to be increased to 2-3. For special languages (Burmese, Amharic), it needs to be increased to 10.

Default: 1.25

--block_ngram_repeat, -block_ngram_repeat

Block repetition of ngrams during decoding.

Default: 0

--ignore_when_blocking, -ignore_when_blocking

Ignore these strings when blocking repeats. You want to block sentence delimiters.

Default: []
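
The check behind n-gram blocking can be sketched as below. This is an illustrative pure-Python version, not the decoder code; the function name is hypothetical, and the rule that an n-gram containing an ignored string is never blocked is an assumption of this sketch.

```python
def would_repeat_ngram(tokens, candidate, n, ignore=()):
    """Return True if appending `candidate` would repeat an n-gram in `tokens`."""
    if n <= 0:
        return False  # a value of 0 disables blocking
    # The n-gram that would be formed by appending the candidate token.
    context = tokens[-(n - 1):] if n > 1 else []
    new_ngram = tuple(context + [candidate])
    if len(new_ngram) < n:
        return False  # not enough history yet
    if any(tok in ignore for tok in new_ngram):
        return False  # assumed: n-grams with ignored strings are exempt
    # All n-grams already present in the hypothesis.
    seen = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return new_ngram in seen


history = ["the", "cat", "sat", "the"]
blocked = would_repeat_ngram(history, "cat", n=2)   # "the cat" already occurred
allowed = would_repeat_ngram(history, "dog", n=2)   # "the dog" is new
```

During decoding, a candidate for which the check returns True would have its score set so that it cannot be selected.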

--replace_unk, -replace_unk

Replace the generated UNK tokens with the source token that had highest attention weight. If phrase_table is provided, it will look up the identified source token and give the corresponding target token. If it is not provided (or the identified source token does not exist in the table), then it will copy the source token.

Default: False

--ban_unk_token, -ban_unk_token

Prevent unk token generation by setting the unk probability to 0.

Default: False

--phrase_table, -phrase_table

If phrase_table is provided (with replace_unk), it will look up the identified source token and give the corresponding target token. If it is not provided (or the identified source token does not exist in the table), then it will copy the source token.

Default: “”
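
The replacement logic described for replace_unk and phrase_table can be sketched like this. It is an illustration under stated assumptions, not the library code: the function name, the "<unk>" token string, and the use of a plain dict for the phrase table are all hypothetical.

```python
def replace_unk(pred_tokens, src_tokens, attn, phrase_table=None, unk="<unk>"):
    """Replace each UNK with the source token that received the most attention.

    attn[t][s] is the attention weight of target step t on source position s.
    """
    phrase_table = phrase_table or {}
    output = []
    for step, token in enumerate(pred_tokens):
        if token == unk:
            weights = attn[step]
            best_pos = max(range(len(weights)), key=weights.__getitem__)
            source_token = src_tokens[best_pos]
            # Look up the source token; fall back to copying it unchanged.
            token = phrase_table.get(source_token, source_token)
        output.append(token)
    return output


src = ["bonjour", "monde"]
pred = ["hello", "<unk>"]
attention = [[0.9, 0.1], [0.2, 0.8]]

copied = replace_unk(pred, src, attention)
translated = replace_unk(pred, src, attention, phrase_table={"monde": "world"})
```

Without a phrase table the source token is copied as is; with one, the table entry is used when it exists.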

Logging¶

--log_file, -log_file

Output logs to a file under this path.

Default: “”

--log_file_level, -log_file_level

Possible choices: CRITICAL, ERROR, WARNING, INFO, DEBUG, NOTSET, 50, 40, 30, 20, 10, 0

Default: “0”
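
The named and numeric choices correspond to the standard levels of Python's logging module. The sketch below shows one way to resolve either form to a numeric level; the helper name is hypothetical and this is not necessarily how translate.py parses the option.

```python
import logging

# Names accepted by the option, mapped to the standard numeric levels.
LEVELS = {
    name: getattr(logging, name)
    for name in ("CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", "NOTSET")
}


def resolve_level(value):
    """Accept a level name or its numeric string and return the integer level."""
    return LEVELS[value] if value in LEVELS else int(value)


info_level = resolve_level("INFO")
default_level = resolve_level("0")  # the default, equivalent to NOTSET
```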

--verbose, -verbose

Print scores and predictions for each sentence

Default: False

--attn_debug, -attn_debug

Print best attn for each word

Default: False

--align_debug, -align_debug

Print best align for each word

Default: False

--dump_beam, -dump_beam

File to dump beam information to.

Default: “”

--n_best, -n_best

If verbose is set, will output the n_best decoded sentences

Default: 1

--with_score, -with_score

Add a tab-separated score to the translation.

Default: False

Distributed¶

--gpu_ranks, -gpu_ranks

List of ranks of each process.

Default: []

--world_size, -world_size

Total number of distributed processes.

Default: 1

--parallel_mode, -parallel_mode

Possible choices: tensor_parallel, data_parallel

Distributed mode.

Default: “data_parallel”

--gpu_backend, -gpu_backend

Type of torch distributed backend

Default: “nccl”

--gpu_verbose_level, -gpu_verbose_level

Gives more info on each process per GPU.

Default: 0

--master_ip, -master_ip

IP of master for torch.distributed training.

Default: “localhost”

--master_port, -master_port

Port of master for torch.distributed training.

Default: 10000

--timeout, -timeout

Timeout for one GPU to wait for the others.

Default: 60

Efficiency¶

--batch_size, -batch_size

Batch size

Default: 30

--batch_type, -batch_type

Possible choices: sents, tokens

Batch grouping for batch_size. The standard is sents; tokens enables dynamic batching.

Default: “sents”
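
The difference between the two batch types can be sketched as follows. This is a simplified illustration, not the library's batching code: the function name is hypothetical, and the token budget here counts raw tokens, whereas a real implementation may also account for padding.

```python
def make_batches(examples, batch_size, batch_type="sents"):
    """Group tokenized examples into batches by sentence count or token count."""
    batches, current, current_cost = [], [], 0
    for example in examples:
        # Each example costs 1 in "sents" mode, or its length in "tokens" mode.
        cost = 1 if batch_type == "sents" else len(example)
        if current and current_cost + cost > batch_size:
            batches.append(current)
            current, current_cost = [], 0
        current.append(example)
        current_cost += cost
    if current:
        batches.append(current)
    return batches


examples = [["a"], ["b", "c"], ["d", "e", "f"], ["g"]]
by_sents = make_batches(examples, batch_size=2, batch_type="sents")
by_tokens = make_batches(examples, batch_size=3, batch_type="tokens")
```

With sents, every batch holds the same number of sequences; with tokens, the number of sequences per batch varies with their length.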

--gpu, -gpu

Device to run on

Default: -1

-transforms, --transforms

Possible choices: insert_mask_before_placeholder, uppercase, inlinetags, bart, terminology, docify, inferfeats, filtertoolong, prefix, suffix, fuzzymatch, clean, switchout, tokendrop, tokenmask, sentencepiece, bpe, onmt_tokenize, normalize

Default transform pipeline to apply to data.

Default: []

Transform/InsertMaskBeforePlaceholdersTransform¶

--response_patterns, -response_patterns

Response pattern to locate the end of the prompt.

Default: [‘Response : ｟newline｠’]

Transform/Uppercase¶

--upper_corpus_ratio, -upper_corpus_ratio

Corpus ratio to apply uppercasing.

Default: 0.01

Transform/InlineTags¶

--tags_dictionary_path, -tags_dictionary_path

Path to a flat term dictionary.

--tags_corpus_ratio, -tags_corpus_ratio

Ratio of corpus to augment with tags.

Default: 0.1

--max_tags, -max_tags

Maximum number of tags that can be added to a single sentence.

Default: 12

--paired_stag, -paired_stag

The format of an opening paired inline tag. Must include the character #.

Default: “｟ph_#_beg｠”

--paired_etag, -paired_etag

The format of a closing paired inline tag. Must include the character #.

Default: “｟ph_#_end｠”

--isolated_tag, -isolated_tag

The format of an isolated inline tag. Must include the character #.

Default: “｟ph_#_std｠”
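
The three tag formats all require the character #. A plausible reading, assumed here and not confirmed by the text above, is that # is a placeholder that gets replaced by a tag index. The helper below is a hypothetical sketch of that substitution.

```python
def render_tag(template, index):
    """Fill the # placeholder of a tag format with an index (assumed behavior)."""
    if "#" not in template:
        raise ValueError("tag format must include the character #")
    return template.replace("#", str(index))


# Default formats taken from the options above.
opening = render_tag("｟ph_#_beg｠", 1)
closing = render_tag("｟ph_#_end｠", 1)
isolated = render_tag("｟ph_#_std｠", 2)
```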

--src_delimiter, -src_delimiter

Any special token used for augmented src sentences. The default is the fuzzy token used in the FuzzyMatch transform.

Default: “｟fuzzy｠”

Transform/BART¶

--permute_sent_ratio, -permute_sent_ratio

Permute this proportion of sentences (boundaries defined by [‘.’, ‘?’, ‘!’]) in all inputs.

Default: 0.0

--rotate_ratio, -rotate_ratio

Rotate this proportion of inputs.

Default: 0.0

--insert_ratio, -insert_ratio

Insert this percentage of additional random tokens.

Default: 0.0

--random_ratio, -random_ratio

Instead of using <mask>, use random token this often.

Default: 0.0

--mask_ratio, -mask_ratio

Fraction of words/subwords that will be masked.

Default: 0.0

--mask_length, -mask_length

Possible choices: subword, word, span-poisson

Length of masking window to apply.

Default: “subword”

--poisson_lambda, -poisson_lambda

Lambda for Poisson distribution to sample span length if -mask_length set to span-poisson.

Default: 3.0

--replace_length, -replace_length

Possible choices: -1, 0, 1

When masking N tokens, replace with 0, 1, or N tokens. (use -1 for N)

Default: -1
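
The effect of replace_length on a masked span can be sketched as below. This is an illustration only; the function name is hypothetical and the code is not the BART transform itself. The mask token string follows the <mask> mentioned above.

```python
def replace_span(tokens, start, n, replace_length, mask="<mask>"):
    """Replace `n` tokens starting at `start` with 0, 1, or n mask tokens."""
    if replace_length == -1:
        fill = [mask] * n               # one mask per masked token
    elif replace_length in (0, 1):
        fill = [mask] * replace_length  # delete the span, or use a single mask
    else:
        raise ValueError("replace_length must be -1, 0 or 1")
    return tokens[:start] + fill + tokens[start + n:]


sentence = ["the", "quick", "brown", "fox"]
same_length = replace_span(sentence, 1, 2, replace_length=-1)
single_mask = replace_span(sentence, 1, 2, replace_length=1)
deleted = replace_span(sentence, 1, 2, replace_length=0)
```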

Transform/Terminology¶

--termbase_path, -termbase_path

Path to a dictionary file with terms.

--src_spacy_language_model, -src_spacy_language_model

Name of the spacy language model for the source corpus.

--tgt_spacy_language_model, -tgt_spacy_language_model

Name of the spacy language model for the target corpus.

--term_corpus_ratio, -term_corpus_ratio

Ratio of corpus to augment with terms.

Default: 0.3

--term_example_ratio, -term_example_ratio

Max terms allowed in an example.

Default: 0.2

--src_term_stoken, -src_term_stoken

The source term start token.

Default: “｟src_term_start｠”

--tgt_term_stoken, -tgt_term_stoken

The target term start token.

Default: “｟tgt_term_start｠”

--tgt_term_etoken, -tgt_term_etoken

The target term end token.

Default: “｟tgt_term_end｠”

--term_source_delimiter, -term_source_delimiter

Any special token used for augmented source sentences. The default is the fuzzy token used in the FuzzyMatch transform.

Default: “｟fuzzy｠”

Transform/Docify¶

--doc_length, -doc_length

Number of tokens per doc.

Default: 200

--max_context, -max_context

Max context segments.

Default: 1

Transform/InferFeats¶

--reversible_tokenization, -reversible_tokenization

Possible choices: joiner, spacer

Type of reversible tokenization applied on the tokenizer.

Default: “joiner”

Transform/Filter¶

--src_seq_length, -src_seq_length

Maximum source sequence length.

Default: 192

--tgt_seq_length, -tgt_seq_length

Maximum target sequence length.

Default: 192

Transform/Prefix¶

--src_prefix, -src_prefix

String to prepend to all source examples.

Default: “”

--tgt_prefix, -tgt_prefix

String to prepend to all target examples.

Default: “”

Transform/Suffix¶

--src_suffix, -src_suffix

String to append to all source examples.

Default: “”

--tgt_suffix, -tgt_suffix

String to append to all target examples.

Default: “”

Transform/FuzzyMatching¶

--tm_path, -tm_path

Path to a flat text TM.

--fuzzy_corpus_ratio, -fuzzy_corpus_ratio

Ratio of corpus to augment with fuzzy matches.

Default: 0.1

--fuzzy_threshold, -fuzzy_threshold

The fuzzy matching threshold.

Default: 70

--tm_delimiter, -tm_delimiter

The delimiter used in the flat text TM.

Default: “ “

--fuzzy_token, -fuzzy_token

The fuzzy token to be added with the matches.

Default: “｟fuzzy｠”

--fuzzymatch_min_length, -fuzzymatch_min_length

Min length for TM entries and examples to match.

Default: 4

--fuzzymatch_max_length, -fuzzymatch_max_length

Max length for TM entries and examples to match.

Default: 70

Transform/Clean¶

--src_eq_tgt, -src_eq_tgt

Remove examples where src==tgt.

Default: False

--same_char, -same_char

Remove examples with the same char more than 4 times.

Default: False

--same_word, -same_word

Remove examples with the same word more than 3 times.

Default: False

--scripts_ok, -scripts_ok

List of Unicode scripts accepted.

Default: [‘Latin’, ‘Common’]

--scripts_nok, -scripts_nok

List of Unicode scripts not accepted.

Default: []

--src_tgt_ratio, -src_tgt_ratio

Ratio between src and tgt.

Default: 2

--avg_tok_min, -avg_tok_min

Minimum average token length.

Default: 3

--avg_tok_max, -avg_tok_max

Maximum average token length.

Default: 20

--langid, -langid

List of languages accepted.

Default: []

Transform/SwitchOut¶

-switchout_temperature, --switchout_temperature

Sampling temperature for SwitchOut. \(\tau^{-1}\) in [WPDN18]. Smaller value makes data more diverse.

Default: 1.0

Transform/Token_Drop¶

-tokendrop_temperature, --tokendrop_temperature

Sampling temperature for token deletion.

Default: 1.0

Transform/Token_Mask¶

-tokenmask_temperature, --tokenmask_temperature

Sampling temperature for token masking.

Default: 1.0

Transform/Subword/Common¶

Attention

Common options shared by all subword transforms, including options for the subword model path, Subword Regularization/BPE-Dropout, and Vocabulary Restriction.

-src_subword_model, --src_subword_model

Path of subword model for src (or shared).

-tgt_subword_model, --tgt_subword_model

Path of subword model for tgt.

-src_subword_nbest, --src_subword_nbest

Number of candidates in subword regularization. Valid for unigram sampling, invalid for BPE-dropout. (source side)

Default: 1

-tgt_subword_nbest, --tgt_subword_nbest

Number of candidates in subword regularization. Valid for unigram sampling, invalid for BPE-dropout. (target side)

Default: 1

-src_subword_alpha, --src_subword_alpha

Smoothing parameter for sentencepiece unigram sampling, and dropout probability for BPE-dropout. (source side)

Default: 0

-tgt_subword_alpha, --tgt_subword_alpha

Smoothing parameter for sentencepiece unigram sampling, and dropout probability for BPE-dropout. (target side)

Default: 0

-src_subword_vocab, --src_subword_vocab

Path to the vocabulary file for src subword. Format: <word> <count> per line.

Default: “”

-tgt_subword_vocab, --tgt_subword_vocab

Path to the vocabulary file for tgt subword. Format: <word> <count> per line.

Default: “”

-src_vocab_threshold, --src_vocab_threshold

Only produce src subword in src_subword_vocab with frequency >= src_vocab_threshold.

Default: 0

-tgt_vocab_threshold, --tgt_vocab_threshold

Only produce tgt subword in tgt_subword_vocab with frequency >= tgt_vocab_threshold.

Default: 0
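
Vocabulary restriction combines a vocabulary file with a frequency threshold. The sketch below shows how the set of permitted subwords could be derived from lines in the "<word> <count>" format; the function name is hypothetical and this is not the code used by the subword transforms.

```python
def load_allowed_subwords(lines, threshold):
    """Return the subwords whose count is at least `threshold`."""
    allowed = set()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, count = parts
        if int(count) >= threshold:
            allowed.add(word)
    return allowed


# Hypothetical vocabulary file content, one "<word> <count>" pair per line.
vocab_lines = ["the 1200", "ing 340", "zq 2", ""]
frequent = load_allowed_subwords(vocab_lines, threshold=50)
everything = load_allowed_subwords(vocab_lines, threshold=0)
```

With the default threshold of 0, every entry in the file passes the frequency test.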

Transform/Subword/ONMTTOK¶

-src_subword_type, --src_subword_type

Possible choices: none, sentencepiece, bpe

Type of subword model for src (or shared) in pyonmttok.

Default: “none”

-tgt_subword_type, --tgt_subword_type

Possible choices: none, sentencepiece, bpe

Type of subword model for tgt in pyonmttok.

Default: “none”

-src_onmttok_kwargs, --src_onmttok_kwargs

Other pyonmttok options for src in dict string, except subword related options listed earlier.

Default: “{‘mode’: ‘none’}”

-tgt_onmttok_kwargs, --tgt_onmttok_kwargs

Other pyonmttok options for tgt in dict string, except subword related options listed earlier.

Default: “{‘mode’: ‘none’}”
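
The kwargs options take a Python dict written as a string. One safe way to turn such a string into a dict is ast.literal_eval, shown below as a sketch; the helper name is hypothetical, and translate.py may parse the value differently.

```python
import ast


def parse_kwargs(dict_string):
    """Parse a dict written as a string, accepting Python literals only."""
    value = ast.literal_eval(dict_string)
    if not isinstance(value, dict):
        raise ValueError("expected a dict string such as \"{'mode': 'none'}\"")
    return value


# The documented default value.
default_kwargs = parse_kwargs("{'mode': 'none'}")

# A hypothetical override passing two tokenizer options.
custom_kwargs = parse_kwargs("{'mode': 'aggressive', 'joiner_annotate': True}")
```

On a shell command line the whole dict string needs to be quoted so that it reaches the script as a single argument.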

--gpt2_pretok, -gpt2_pretok

Preprocess sentence with byte-level mapping

Default: False

Transform/Normalize¶

--src_lang, -src_lang

Source language code

Default: “”

--tgt_lang, -tgt_lang

Target language code

Default: “”

--penn, -penn

Penn substitution

Default: True

--norm_quote_commas, -norm_quote_commas

Normalize quotations and commas

Default: True

--norm_numbers, -norm_numbers

Normalize numbers

Default: True

--pre_replace_unicode_punct, -pre_replace_unicode_punct

Replace unicode punct

Default: False

--post_remove_control_chars, -post_remove_control_chars

Remove control chars

Default: False

Quant options¶

--quant_layers, -quant_layers

List of layers to be compressed in 4/8bit.

Default: []

--quant_type, -quant_type

Possible choices: , bnb_8bit, bnb_FP4, bnb_NF4, awq_gemm, awq_gemv

Type of compression.

Default: “”

--w_bit, -w_bit

Possible choices: 4

W_bit quantization.

Default: 4

--group_size, -group_size

Possible choices: 128

Group size quantization.

Default: 128