opennmt.models.sequence_tagger module

Sequence tagger.

class opennmt.models.sequence_tagger.SequenceTagger(inputter, encoder, labels_vocabulary_file_key, tagging_scheme=None, crf_decoding=False, daisy_chain_variables=False, name='seqtagger')

Bases: opennmt.models.model.Model

A sequence tagger.

__init__(inputter, encoder, labels_vocabulary_file_key, tagging_scheme=None, crf_decoding=False, daisy_chain_variables=False, name='seqtagger')

Initializes a sequence tagger.

Parameters:
  • inputter – An opennmt.inputters.inputter.Inputter to process the input data.
  • encoder – An opennmt.encoders.encoder.Encoder to encode the input.
  • labels_vocabulary_file_key – The data configuration key of the labels vocabulary file containing one label per line.
  • tagging_scheme – The tagging scheme used. For supported schemes (currently only BIOES), additional evaluation metrics such as precision and recall can be computed.
  • crf_decoding – If True, add a CRF layer after the encoder.
  • daisy_chain_variables – If True, copy variables in a daisy chain between devices for this model. Not compatible with RNN-based models.
  • name – The name of this model.
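
To make the BIOES scheme named in tagging_scheme concrete, here is a minimal sketch that recovers labeled chunks from a tag sequence. It assumes tags are written as prefix-label (for example B-PER), which this page does not specify, and the helper name bioes_chunks is hypothetical; it is not part of opennmt.

```python
def bioes_chunks(tags):
    """Return (label, start, end) spans, end inclusive, from BIOES tags.

    Assumes tags look like "B-PER", "I-PER", "E-PER", "S-LOC" or "O".
    """
    chunks = []
    start = None  # index of the currently open B- tag, if any
    label = None  # label of the currently open chunk
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        if prefix == "S":
            # Single-token chunk.
            chunks.append((name, i, i))
            start = label = None
        elif prefix == "B":
            # Begin a new multi-token chunk.
            start, label = i, name
        elif prefix == "E":
            # Close the chunk only if it was opened with the same label.
            if start is not None and name == label:
                chunks.append((name, start, i))
            start = label = None
        elif prefix == "I":
            # An inside tag must continue the open chunk.
            if name != label:
                start = label = None
        else:
            # "O" (outside) or anything unexpected drops any open chunk.
            start = label = None
    return chunks


tags = "B-PER E-PER O S-LOC B-ORG I-ORG E-ORG".split()
chunks = bioes_chunks(tags)
```

Chunk-level metrics such as precision and recall compare spans like these between the gold and predicted sequences rather than comparing individual tags.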
initialize(metadata)

Initializes the model from the data configuration.

Parameters:
  • metadata – A dictionary containing additional data configuration set by the user (e.g. vocabularies, tokenization, pretrained embeddings, etc.).
compute_loss(outputs, labels, training=True, params=None)

Computes the loss.

Parameters:
  • outputs – The model outputs (usually unscaled log probabilities, i.e. logits).
  • labels – The dict of labels, each a tf.Tensor.
  • training – If True, compute the training loss.
  • params – A dictionary of hyperparameters.
Returns:

The loss or a tuple containing the computed loss and the loss to display.
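
As a rough illustration of what a tagging loss over padded batches can look like, the sketch below averages softmax cross-entropy over the non-padded positions only. It is an assumption for illustration, not the library's implementation: the function name masked_cross_entropy, the nested-list inputs and the explicit lengths argument are all hypothetical, and a model built with crf_decoding=True would presumably use a CRF-based loss instead.

```python
import math


def masked_cross_entropy(logits, label_ids, lengths):
    """Average softmax cross-entropy over non-padded positions.

    logits: [batch][time][num_labels] unnormalized scores.
    label_ids: [batch][time] integer label ids.
    lengths: [batch] number of valid time steps per sequence.
    """
    total = 0.0
    count = 0
    for seq_logits, seq_labels, length in zip(logits, label_ids, lengths):
        for t in range(length):  # positions at or beyond length are padding
            scores = seq_logits[t]
            # Numerically stable log-sum-exp over the label scores.
            m = max(scores)
            log_z = m + math.log(sum(math.exp(s - m) for s in scores))
            # Negative log-probability of the reference label.
            total += log_z - scores[seq_labels[t]]
            count += 1
    return total / max(count, 1)


# One sequence of two time steps; only the first step is valid.
logits = [[[0.0, 0.0], [5.0, -5.0]]]
label_ids = [[0, 1]]
loss = masked_cross_entropy(logits, label_ids, lengths=[1])
```

With uniform scores over two labels at the only valid position, the loss is log(2); the padded second position does not contribute.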

compute_metrics(predictions, labels)

Computes additional metrics on the predictions.

Parameters:
  • predictions – The model predictions.
  • labels – The dict of labels, each a tf.Tensor.
Returns:

A dict of metrics. See the eval_metric_ops field of tf.estimator.EstimatorSpec.

print_prediction(prediction, params=None, stream=None)

Prints the model prediction.

Parameters:
  • prediction – The evaluated prediction.
  • params – (optional) Dictionary of formatting parameters.
  • stream – (optional) The stream to print to.
class opennmt.models.sequence_tagger.TagsInputter(vocabulary_file_key)

Bases: opennmt.inputters.text_inputter.TextInputter

Reads space-separated tags.

make_features(element=None, features=None, training=None)

Tokenizes raw text.
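
The input format can be sketched in plain Python: a labels vocabulary with one label per line, and a line of space-separated tags mapped to integer ids. The helper names load_label_vocabulary and tags_to_ids are hypothetical and only illustrate the data format; they are not how TagsInputter is implemented.

```python
def load_label_vocabulary(lines):
    """Map each non-empty label line to its zero-based position."""
    labels = [line.strip() for line in lines if line.strip()]
    return {label: index for index, label in enumerate(labels)}


def tags_to_ids(line, vocabulary):
    """Split a line of space-separated tags and look up each id.

    Raises KeyError for a tag that is missing from the vocabulary.
    """
    return [vocabulary[tag] for tag in line.split()]


# In practice the lines would come from the labels vocabulary file.
vocabulary = load_label_vocabulary(["O\n", "B-PER\n", "E-PER\n"])
ids = tags_to_ids("B-PER E-PER O", vocabulary)
```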

opennmt.models.sequence_tagger.flag_bioes_tags(gold, predicted, sequence_length=None)

Flags chunk matches for the BIOES tagging scheme.

This function will produce the gold flags and the predicted flags. For each aligned gold flag g and predicted flag p:

  • when g == p == True, the chunk has been correctly identified (true positive).
  • when g == False and p == True, the chunk has been incorrectly identified (false positive).
  • when g == True and p == False, the chunk has been missed (false negative).
  • when g == p == False, the chunk has been correctly ignored (true negative).
Parameters:
  • gold – The gold tags as a NumPy 2D string array.
  • predicted – The predicted tags as a NumPy 2D string array.
  • sequence_length – The length of each sequence as a NumPy array.
Returns:

A tuple (gold_flags, predicted_flags).
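
The flag semantics described above translate directly into chunk-level precision, recall and F1. The sketch below assumes the two flag sequences are aligned, equal-length sequences of booleans; the function name precision_recall_f1 is hypothetical and not part of opennmt.

```python
def precision_recall_f1(gold_flags, predicted_flags):
    """Compute chunk-level metrics from aligned gold and predicted flags."""
    tp = fp = fn = 0
    for g, p in zip(gold_flags, predicted_flags):
        if g and p:
            tp += 1  # chunk correctly identified
        elif p:
            fp += 1  # chunk incorrectly identified
        elif g:
            fn += 1  # chunk missed
        # g == p == False is a true negative and affects none of the metrics.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


gold_flags = [True, False, True, False, True]
predicted_flags = [True, True, False, False, True]
precision, recall, f1 = precision_recall_f1(gold_flags, predicted_flags)
```

Here there are two true positives, one false positive and one false negative, so precision, recall and F1 all equal 2/3.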