load_pretrained_embeddings
- opennmt.inputters.load_pretrained_embeddings(embedding_file, vocabulary_file, num_oov_buckets=1, with_header=True, case_insensitive_embeddings=True)[source]
Returns pretrained embeddings relative to the vocabulary.
The embedding_file must have the following format:

    N M
    word1 val1 val2 ... valM
    word2 val1 val2 ... valM
    ...
    wordN val1 val2 ... valM
or if with_header is False:

    word1 val1 val2 ... valM
    word2 val1 val2 ... valM
    ...
    wordN val1 val2 ... valM
This function iterates over each embedding in embedding_file and assigns the pretrained vector to the associated word in vocabulary_file if found. Otherwise, the embedding is ignored.

If case_insensitive_embeddings is True, word embeddings are assumed to be trained on lowercased data. In that case, word matching is case insensitive, meaning the pretrained word embedding for "the" is assigned to "the", "The", "THE", or any other case variant included in vocabulary_file.
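The matching behaves roughly as in the following sketch, an illustrative reimplementation rather than the library's actual code; the helper name sketch_load_embeddings and the random initialization of unmatched rows are assumptions:

    import numpy as np

    def sketch_load_embeddings(embedding_file, vocabulary_file,
                               num_oov_buckets=1, with_header=True,
                               case_insensitive_embeddings=True):
        # Read the vocabulary: one word per line, in index order.
        with open(vocabulary_file, encoding="utf-8") as vocab:
            vocabulary = [line.rstrip("\r\n") for line in vocab]

        # Index vocabulary entries by their (optionally lowercased) form so
        # that one pretrained vector can serve every case variant.
        word_to_ids = {}
        for i, word in enumerate(vocabulary):
            key = word.lower() if case_insensitive_embeddings else word
            word_to_ids.setdefault(key, []).append(i)

        pretrained = None
        with open(embedding_file, encoding="utf-8") as embeddings:
            if with_header:
                next(embeddings)  # Skip the "N M" header line.
            for line in embeddings:
                fields = line.rstrip().split(" ")
                # Embeddings trained on lowercased data are assumed to
                # store lowercase words, so the word is looked up as-is.
                word, values = fields[0], fields[1:]
                if pretrained is None:
                    # Allocate once the embedding size M is known; rows that
                    # receive no pretrained vector keep a random
                    # initialization (an assumption, not documented above).
                    pretrained = np.random.normal(
                        size=(len(vocabulary) + num_oov_buckets, len(values)))
                for i in word_to_ids.get(word, []):
                    pretrained[i] = np.asarray(values, dtype=np.float32)
        return pretrained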
- Parameters
  embedding_file – Path to the embedding file. Entries will be matched against vocabulary_file.
  vocabulary_file – The vocabulary file containing one word per line.
  num_oov_buckets – The number of additional unknown tokens.
  with_header – True if the embedding file starts with a header line, as in GloVe embedding files.
  case_insensitive_embeddings – True if embeddings are trained on lowercased data.
- Returns
  A NumPy array of shape [vocabulary_size + num_oov_buckets, embedding_size].
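A minimal usage sketch; the file names embeddings.txt and vocab.txt are hypothetical placeholders:

    import opennmt

    # "embeddings.txt": word2vec/GloVe-style text file as described above.
    # "vocab.txt": one word per line.
    embeddings = opennmt.inputters.load_pretrained_embeddings(
        "embeddings.txt",
        "vocab.txt",
        num_oov_buckets=1,
        with_header=True,
        case_insensitive_embeddings=True)

    print(embeddings.shape)  # (vocabulary_size + 1, embedding_size)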