shuffle_dataset

opennmt.data.shuffle_dataset(buffer_size, shuffle_shards=True, dataset_size=None)

Returns a transformation that shuffles a dataset, adapting the shuffling strategy to the dataset size relative to buffer_size.

Example

>>> dataset = tf.data.Dataset.range(6)
>>> dataset = dataset.apply(opennmt.data.shuffle_dataset(3))
>>> list(dataset.as_numpy_iterator())  # the order is random; one possible result
[2, 3, 1, 0, 4, 5]

Parameters
  • buffer_size – The number of elements from which to sample.

  • shuffle_shards – If True and buffer_size is smaller than the dataset size, the dataset is first split into shards that are read in a random order, which adds another level of shuffling.

  • dataset_size – If the dataset size is already known, it can be passed here to avoid a slower generic computation of the dataset size later.

Returns

A tf.data.Dataset transformation.
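
The interaction between buffer_size and shuffle_shards can be illustrated with a small pure-Python sketch. It is not the library's implementation: the names buffered_shuffle and sketch_shuffle are hypothetical, and the choice of shards holding buffer_size elements each is an assumption made for illustration. The sketch shows why a small buffer alone only shuffles locally, and how reading shards in a random order compensates for that.

```python
import random


def buffered_shuffle(items, buffer_size, rng):
    """Yield items in shuffled order, sampling from a fixed-size buffer."""
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Emit one random buffered element and put the new item in its slot.
        index = rng.randrange(buffer_size)
        yield buffer[index]
        buffer[index] = item
    # Input exhausted: drain the remaining elements in random order.
    rng.shuffle(buffer)
    yield from buffer


def sketch_shuffle(items, buffer_size, shuffle_shards=True, seed=None):
    """Hypothetical stand-in for the two shuffling levels described above."""
    rng = random.Random(seed)
    items = list(items)  # the size must be known to cut the data into shards
    if shuffle_shards and buffer_size < len(items):
        # Assumption: each shard holds buffer_size consecutive elements.
        shards = [
            items[start:start + buffer_size]
            for start in range(0, len(items), buffer_size)
        ]
        rng.shuffle(shards)  # first level: read the shards in random order
        items = [element for shard in shards for element in shard]
    # Second level: sample from a sliding buffer of buffer_size elements.
    return list(buffered_shuffle(items, buffer_size, rng))


shuffled = sketch_shuffle(range(10), buffer_size=3, seed=0)
local_only = sketch_shuffle(range(10), buffer_size=3, shuffle_shards=False, seed=0)
```

In this sketch, with shuffle_shards=False the element at input position i can never be emitted before output position i - buffer_size + 1, so late elements stay near the end. Shuffling the shard order first removes that restriction, at the cost of needing the dataset size up front, which is the value that dataset_size lets the caller supply.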