load_pretrained_embeddings
- opennmt.inputters.load_pretrained_embeddings(embedding_file, vocabulary_file, num_oov_buckets=1, with_header=True, case_insensitive_embeddings=True)
Returns pretrained embeddings relative to the vocabulary.
The `embedding_file` must have the following format:

```text
N M
word1 val1 val2 ... valM
word2 val1 val2 ... valM
...
wordN val1 val2 ... valM
```

or, if `with_header` is `False`:

```text
word1 val1 val2 ... valM
word2 val1 val2 ... valM
...
wordN val1 val2 ... valM
```
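For illustration, a made-up embedding file with N=3 words of dimension M=2 (the header line is present, so `with_header` should stay `True`) could look like:

```text
3 2
the 0.418 0.24968
of -0.1529 0.9281
cat 0.7436 -0.0183
```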
This function iterates over each embedding in `embedding_file` and assigns the pretrained vector to the associated word in `vocabulary_file`, if found; otherwise, the embedding is ignored. If `case_insensitive_embeddings` is `True`, word embeddings are assumed to be trained on lowercase data. In that case, word matching is case insensitive, meaning the pretrained embedding for "the" will be assigned to "the", "The", "THE", or any other case variant included in `vocabulary_file`.
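The lookup logic described above is roughly equivalent to the following sketch. This is a simplified NumPy reimplementation for clarity, not the library's actual code; the function name and the zero initialization of unmatched rows are illustrative assumptions:

```python
import numpy as np

def load_embeddings_sketch(embedding_file, vocabulary_file,
                           num_oov_buckets=1, with_header=True,
                           case_insensitive_embeddings=True):
    # Read the vocabulary: one word per line, row index = line number.
    with open(vocabulary_file, encoding="utf-8") as f:
        vocab = [line.rstrip("\n") for line in f]

    # Map each (possibly lowercased) vocabulary word to its row indices.
    # With case-insensitive matching, "The" and "THE" both map to the
    # pretrained vector of "the".
    word_to_rows = {}
    for i, word in enumerate(vocab):
        key = word.lower() if case_insensitive_embeddings else word
        word_to_rows.setdefault(key, []).append(i)

    embeddings = None
    with open(embedding_file, encoding="utf-8") as f:
        embedding_size = int(f.readline().split()[1]) if with_header else None
        for line in f:
            fields = line.rstrip("\n").split(" ")
            word, values = fields[0], fields[1:]
            if embedding_size is None:
                embedding_size = len(values)  # infer M from the first entry
            if embeddings is None:
                # Unmatched vocabulary rows and the OOV bucket rows keep
                # their initial values; zeros here, for simplicity.
                embeddings = np.zeros(
                    (len(vocab) + num_oov_buckets, embedding_size),
                    dtype=np.float32)
            # Assign the pretrained vector to every matching vocab entry;
            # embeddings for words absent from the vocabulary are ignored.
            for row in word_to_rows.get(word, []):
                embeddings[row] = np.asarray(values, dtype=np.float32)
    return embeddings
```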
- Parameters
  - embedding_file – Path to the embedding file. Entries will be matched against `vocabulary_file`.
  - vocabulary_file – The vocabulary file containing one word per line.
  - num_oov_buckets – The number of additional unknown tokens.
  - with_header – `True` if the embedding file starts with a header line like in GloVe embedding files.
  - case_insensitive_embeddings – `True` if embeddings are trained on lowercase data.
- Returns
  - A NumPy array of shape `[vocabulary_size + num_oov_buckets, embedding_size]`.
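A minimal usage sketch is shown below. The file paths are placeholders, and wrapping the returned array in a `tf.keras.layers.Embedding` is one common pattern for consuming it, not something this function does itself:

```python
import tensorflow as tf
import opennmt

# Load pretrained vectors aligned with the vocabulary.
pretrained = opennmt.inputters.load_pretrained_embeddings(
    "embeddings.txt",  # placeholder path to a pretrained embedding file
    "vocab.txt",       # placeholder vocabulary, one word per line
    num_oov_buckets=1,
    with_header=True,
    case_insensitive_embeddings=True)

# The returned shape already includes the OOV bucket rows.
vocab_size, embedding_size = pretrained.shape
embedding_layer = tf.keras.layers.Embedding(
    vocab_size, embedding_size,
    embeddings_initializer=tf.constant_initializer(pretrained))
```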