For instance, consider a batch of instances in which A marks padding:

AXXXAA
AXXAAA
AXXXXX


MaskedSoftmax ensures that no probability is given to the A's.

For this example, beamSize is 3 and the true lengths (sourceSizes) are {3, 2, 5}.

• sourceSizes - the true length of each sequence, excluding the left padding.
• sourceLength - the maximum length in the batch of beamSize sequences.
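
The masking idea can be illustrated with a minimal Python sketch. It is not the module's actual implementation: the function name masked_softmax is hypothetical, and the sketch assumes strict left padding, so the real tokens of row i occupy the last source_sizes[i] positions of a row padded to the batch maximum.

```python
import math

def masked_softmax(scores, source_sizes):
    """Row-wise softmax that assigns zero probability to left padding.

    scores: list of equally long rows of raw scores.
    source_sizes: true length of each row (assumed >= 1); the real tokens
    are assumed to be the last source_sizes[i] entries of row i.
    """
    result = []
    for row, size in zip(scores, source_sizes):
        pad = len(row) - size          # number of leading padding positions
        real = row[pad:]               # scores of the real tokens only
        peak = max(real)               # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in real]
        total = sum(exps)
        # Padding positions get exactly zero; the rest sum to one.
        result.append([0.0] * pad + [e / total for e in exps])
    return result

# Three rows with true lengths 3, 2 and 5, left-padded to length 5.
scores = [[0.0] * 5 for _ in range(3)]
probs = masked_softmax(scores, [3, 2, 5])
```

Each row of probs sums to one over its real positions, and the padded positions at the start of the first two rows receive no probability.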