See the GPT-2 example in the Transformers guide.
The Generator class exposes a score_batch method that returns the log-likelihood of each token in the input sequences. These per-token scores can be averaged and exponentiated to compute the perplexity of full sequences.
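As a minimal sketch, the per-token log-likelihoods returned for one sequence can be turned into a perplexity like this. The example log-probability values are made up for illustration; in practice they would come from the scoring output (e.g. the log_probs attribute of a scoring result).

```python
import math

def perplexity(token_log_probs):
    """Compute perplexity from per-token log-likelihoods (natural log)."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Hypothetical per-token log-likelihoods for one scored sequence.
log_probs = [-2.1, -0.8, -1.5, -0.3]
print(perplexity(log_probs))
```

The perplexity is the exponential of the average negative log-likelihood, so lower values indicate the model found the sequence more predictable.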
See the WMT19 language model example in the Fairseq guide.
Special tokens such as the decoder start token
<s> must be explicitly included in the input if the model requires them: the generator methods do not add any special tokens.
This differs from the translator methods, which usually add these special tokens implicitly.
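The point above can be sketched as follows. The token strings and the model directory are hypothetical, and the actual scoring call is shown only as a comment since it requires a loaded model; the sketch just illustrates prepending the start token yourself before calling a generator method.

```python
# Hypothetical pre-tokenized input; a real pipeline would use the
# model's own tokenizer.
tokens = ["Hello", "world"]

# Prepend the decoder start token explicitly: generator methods
# do not add special tokens for you.
START_TOKEN = "<s>"
batch = [[START_TOKEN] + tokens]

# With a loaded model, this batch could then be scored, e.g.:
# generator = Generator("model_dir")  # hypothetical model path
# results = generator.score_batch(batch)
print(batch)
```

Forgetting the start token is a common source of unexpectedly poor scores, since the model then conditions the first real token on nothing.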