Translate

translate.py

usage: translate.py [-h] [-config CONFIG] [-save_config SAVE_CONFIG] --model
                    MODEL [MODEL ...] [--fp32] [--int8] [--avg_raw_probs]
                    [--data_type DATA_TYPE] --src SRC [--tgt TGT]
                    [--tgt_prefix] [--shard_size SHARD_SIZE] [--output OUTPUT]
                    [--report_align] [--report_time]
                    [--block_ngram_repeat BLOCK_NGRAM_REPEAT]
                    [--ignore_when_blocking IGNORE_WHEN_BLOCKING [IGNORE_WHEN_BLOCKING ...]]
                    [--replace_unk] [--phrase_table PHRASE_TABLE]
                    [--random_sampling_topk RANDOM_SAMPLING_TOPK]
                    [--random_sampling_temp RANDOM_SAMPLING_TEMP]
                    [--seed SEED] [--beam_size BEAM_SIZE]
                    [--min_length MIN_LENGTH] [--max_length MAX_LENGTH]
                    [--max_sent_length] [--stepwise_penalty]
                    [--length_penalty {none,wu,avg}] [--ratio RATIO]
                    [--coverage_penalty {none,wu,summary}] [--alpha ALPHA]
                    [--beta BETA] [--log_file LOG_FILE]
                    [--log_file_level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET,50,40,30,20,10,0}]
                    [--verbose] [--attn_debug] [--align_debug]
                    [--dump_beam DUMP_BEAM] [--n_best N_BEST]
                    [--batch_size BATCH_SIZE] [--batch_type {sents,tokens}]
                    [--gpu GPU]

Configuration

-config, --config

Path of the main YAML config file.

-save_config, --save_config

Path where the config will be saved.
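
Any option below can also be supplied through the YAML config file, with keys mirroring the option names. A minimal sketch (file paths are placeholders):

    python translate.py -config translate.yaml -save_config resolved.yaml

With this setup, required options such as --model and --src would be read from translate.yaml.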

Model

--model, -model

Path to model .pt file(s). Multiple models can be specified, for ensemble decoding.

Default: []

--fp32, -fp32

Force the model to run in FP32, because FP16 is very slow on GTX 1080 (Ti) GPUs.

Default: False

--int8, -int8

Enable dynamic 8-bit quantization (CPU only).

Default: False

--avg_raw_probs, -avg_raw_probs

If this is set, during ensembling scores from different models will be combined by averaging their raw probabilities and then taking the log. Otherwise, the log probabilities will be averaged directly. Necessary for models whose output layers can assign zero probability.

Default: False
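
For example, an ensemble run might pass several checkpoints to --model and average their raw probabilities (a sketch; checkpoint and data file names are placeholders):

    python translate.py \
        --model ckpt_a.pt ckpt_b.pt \
        --avg_raw_probs \
        --src src-test.txt \
        --output pred.txt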

Data

--data_type, -data_type

Type of the source input. Options: [text].

Default: “text”

--src, -src

Source sequence to decode (one line per sequence).

--tgt, -tgt

True target sequence (optional).

--tgt_prefix, -tgt_prefix

Generate predictions using the provided -tgt as prefix.

Default: False

--shard_size, -shard_size

Divide src and tgt (if applicable) into multiple smaller src and tgt files, then build shards from them. Each shard will have shard_size samples, except possibly the last one. shard_size=0 means no segmentation; shard_size>0 means the dataset is segmented into multiple shards of shard_size samples each.

Default: 10000

--output, -output

Path to output the predictions (each line will be the decoded sequence).

Default: “pred.txt”

--report_align, -report_align

Report alignment for each translation.

Default: False

--report_time, -report_time

Report some translation time metrics.

Default: False
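
Putting the data options together, a minimal run might look like the following (file names are placeholders):

    python translate.py \
        --model model.pt \
        --src src-test.txt \
        --tgt tgt-test.txt \
        --shard_size 5000 \
        --output pred.txt \
        --report_time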

Decoding tricks

--block_ngram_repeat, -block_ngram_repeat

Block repetition of ngrams during decoding.

Default: 0

--ignore_when_blocking, -ignore_when_blocking

Ignore these strings when blocking repeats: n-grams containing them are exempt from blocking. You typically want to exempt sentence delimiters here.

Default: []

--replace_unk, -replace_unk

Replace the generated UNK tokens with the source token that had the highest attention weight. If phrase_table is provided, it will look up the identified source token and output the corresponding target token. If it is not provided (or the identified source token does not exist in the table), then it will copy the source token.

Default: False

--phrase_table, -phrase_table

If phrase_table is provided (together with replace_unk), the identified source token will be looked up in the table and the corresponding target token emitted. If no table is provided (or the identified source token does not exist in it), the source token is copied instead.

Default: “”
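
For example, to block repeated 3-grams while exempting the sentence delimiter, and to replace UNK tokens via a phrase table (file names and the delimiter token are illustrative):

    python translate.py \
        --model model.pt --src src-test.txt \
        --block_ngram_repeat 3 \
        --ignore_when_blocking "." \
        --replace_unk \
        --phrase_table phrase_table.txt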

Random Sampling

--random_sampling_topk, -random_sampling_topk

Set this to -1 to sample from the full distribution. Set this to a value k > 1 to restrict sampling to the k most likely next tokens. Set this to 1 to use argmax (also the correct setting when doing beam search).

Default: 1

--random_sampling_temp, -random_sampling_temp

If doing random sampling, divide the logits by this before computing softmax during decoding.

Default: 1.0
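
For instance, top-10 sampling with a softened distribution might look like this (values are illustrative; --beam_size and --seed appear in the usage synopsis above):

    python translate.py \
        --model model.pt --src src-test.txt \
        --beam_size 1 \
        --random_sampling_topk 10 \
        --random_sampling_temp 0.8 \
        --seed 42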

Reproducibility

--seed, -seed

Set the random seed, for better reproducibility between experiments.

Default: -1

Logging

--log_file, -log_file

Output logs to a file under this path.

Default: “”

--log_file_level, -log_file_level

Possible choices: CRITICAL, ERROR, WARNING, INFO, DEBUG, NOTSET, 50, 40, 30, 20, 10, 0

Default: “0”

--verbose, -verbose

Print scores and predictions for each sentence.

Default: False

--attn_debug, -attn_debug

Print the best attention weights for each word.

Default: False

--align_debug, -align_debug

Print the best alignment for each word.

Default: False

--dump_beam, -dump_beam

File to dump beam information to.

Default: “”

--n_best, -n_best

If verbose is set, output the n_best decoded sentences.

Default: 1
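
For example, to print the 3 best hypotheses per sentence while also logging to a file (paths are placeholders):

    python translate.py \
        --model model.pt --src src-test.txt \
        --verbose --n_best 3 \
        --log_file translate.log --log_file_level INFO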

Efficiency

--batch_size, -batch_size

Batch size.

Default: 30

--batch_type, -batch_type

Possible choices: sents, tokens

Batch grouping for batch_size. The standard is sents; tokens enables dynamic batching.

Default: “sents”

--gpu, -gpu

Device (GPU id) to run on; -1 means CPU.

Default: -1
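
For example, token-based dynamic batching on the first GPU (values are illustrative):

    python translate.py \
        --model model.pt --src src-test.txt \
        --batch_type tokens --batch_size 4096 \
        --gpu 0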