Data Loaders
Data Iterator
- class onmt.inputters.DynamicDatasetIter(corpora, corpora_info, transforms, vocabs, task, batch_type, batch_size, batch_size_multiple, data_type='text', bucket_size=2048, bucket_size_init=-1, bucket_size_increment=0, copy=False, skip_empty_level='warning', stride=1, offset=0)
Bases: torch.utils.data.dataset.IterableDataset
Yield batches from one or more plain-text corpora.
- Parameters
corpora (dict[str, ParallelCorpus]) – collection of corpora to iterate over;
corpora_info (dict[str, dict]) – corpora info corresponding to corpora;
transforms (dict[str, Transform]) – transforms that may be used by corpora;
vocabs (dict[str, Vocab]) – vocab dict used to convert corpora into Tensors;
task (str) – CorpusTask.TRAIN/VALID/INFER;
batch_type (str) – unit in which the batch size is counted, choices=[tokens, sents];
batch_size (int) – number of examples in a batch;
batch_size_multiple (int) – make the batch size a multiple of this;
data_type (str) – input data type, currently only text;
bucket_size (int) – accumulate this number of examples in a dynamic dataset;
bucket_size_init (int) – initialize the bucket with this amount of examples;
bucket_size_increment (int) – increment the bucket size with this amount of examples;
copy (bool) – if True, will add specific items for copy_attn;
skip_empty_level (str) – level of strictness when an empty line is encountered;
stride (int) – iterate data files with this stride;
offset (int) – iterate data files with this offset.
- Variables
batch_size_fn (function) – function used to calculate batch_size;
sort_key (function) – function defining how to sort examples;
mixer (MixingStrategy) – the strategy used to iterate over corpora.
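The interaction of bucket_size, bucket_size_init and bucket_size_increment can be pictured with a minimal sketch. It assumes that the bucket starts at bucket_size_init when that value is positive, grows by bucket_size_increment after each bucket is emitted, and is capped at bucket_size. The helper name iter_buckets is hypothetical and not part of onmt.inputters; the library's actual behavior may differ.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_buckets(
    examples: Iterable[T],
    bucket_size: int = 2048,
    bucket_size_init: int = -1,
    bucket_size_increment: int = 0,
) -> Iterator[List[T]]:
    """Group examples into buckets whose size may grow over time.

    Hypothetical helper, written only to illustrate the three parameters.
    """
    # Assumption: a non-positive bucket_size_init means "start at full size".
    current = bucket_size_init if bucket_size_init > 0 else bucket_size
    bucket: List[T] = []
    for example in examples:
        bucket.append(example)
        if len(bucket) >= current:
            yield bucket
            bucket = []
            # Grow toward bucket_size, but never past it.
            current = min(bucket_size, current + bucket_size_increment)
    if bucket:
        # Emit the final, possibly smaller, bucket.
        yield bucket


sizes = [
    len(b)
    for b in iter_buckets(
        range(20), bucket_size=8, bucket_size_init=2, bucket_size_increment=3
    )
]
print(sizes)
```

Under these assumptions, twenty examples with bucket_size=8, bucket_size_init=2 and bucket_size_increment=3 are grouped into buckets of 2, 5, 8 and 5 examples. Starting small lets the first batches appear quickly, while later buckets give the sorter more examples to work with.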
- class onmt.inputters.MixingStrategy(iterables, weights)
Bases: object
Mixing strategy that should be used in Data Iterator.
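As an illustration of what a mixing strategy built from iterables and weights might do, the sketch below draws weights[name] items from each corpus in turn and repeats until every corpus is exhausted. The function weighted_mix is hypothetical, and treating weights as positive integer counts per round is an assumption, not a description of any concrete MixingStrategy subclass.

```python
from typing import Dict, Iterable, Iterator, TypeVar

T = TypeVar("T")


def weighted_mix(
    iterables: Dict[str, Iterable[T]], weights: Dict[str, int]
) -> Iterator[T]:
    """Interleave several iterables, taking weights[name] items per round.

    Hypothetical helper; weights are assumed to be positive integers.
    """
    if any(weights[name] < 1 for name in iterables):
        raise ValueError("every weight must be a positive integer")
    iterators = {name: iter(it) for name, it in iterables.items()}
    while iterators:
        # Iterate over a snapshot so exhausted corpora can be dropped safely.
        for name in list(iterators):
            for _ in range(weights[name]):
                try:
                    yield next(iterators[name])
                except StopIteration:
                    del iterators[name]
                    break


mixed = list(
    weighted_mix(
        {"a": ["a1", "a2", "a3", "a4"], "b": ["b1", "b2"]},
        {"a": 2, "b": 1},
    )
)
print(mixed)
```

With weights of 2 and 1, corpus "a" contributes two examples for every one from corpus "b", so a larger corpus can be sampled more often without concatenating the files.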
Dataset
- class onmt.inputters.ParallelCorpus(name, src, tgt, align=None, src_feats=None)
Bases: object
A parallel corpus file pair that can be loaded to iterate.
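The idea of a file pair that is iterated together can be sketched as reading a source stream and a target stream in lockstep. The function iter_pairs and the "src"/"tgt" dictionary keys are illustrative assumptions, not the ParallelCorpus API; in-memory streams stand in for the two files.

```python
import io
from typing import Dict, Iterator, TextIO


def iter_pairs(src: TextIO, tgt: TextIO) -> Iterator[Dict[str, str]]:
    """Yield aligned source/target lines from two text streams.

    Hypothetical helper; iteration stops at the end of the shorter stream.
    """
    for src_line, tgt_line in zip(src, tgt):
        yield {"src": src_line.rstrip("\n"), "tgt": tgt_line.rstrip("\n")}


# In-memory stand-ins for a source file and a target file.
src_stream = io.StringIO("hello world\ngood morning\n")
tgt_stream = io.StringIO("bonjour le monde\nbonjour\n")
pairs = list(iter_pairs(src_stream, tgt_stream))
print(pairs)
```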
- class onmt.inputters.ParallelCorpusIterator(corpus, transform, skip_empty_level='warning', stride=1, offset=0)
Bases: object
An iterator dedicated to ParallelCorpus.
- Parameters
corpus (ParallelCorpus) – corpus to iterate;
transform (TransformPipe) – transforms to be applied to corpus;
skip_empty_level (str) – level of strictness when an empty line is encountered;
stride (int) – iterate corpus with this line stride;
offset (int) – iterate corpus with this line offset.
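A common reading of stride and offset is that a reader keeps only the lines whose index is congruent to offset modulo stride, so that several readers can share one corpus without overlap. The sketch below follows that reading; the function strided is hypothetical, and the exact indexing rule used by ParallelCorpusIterator is an assumption here.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def strided(lines: Iterable[T], stride: int = 1, offset: int = 0) -> Iterator[T]:
    """Yield every stride-th item, starting at index offset.

    Hypothetical helper illustrating one interpretation of stride/offset.
    """
    if stride < 1 or not 0 <= offset < stride:
        raise ValueError("require stride >= 1 and 0 <= offset < stride")
    for index, line in enumerate(lines):
        if index % stride == offset:
            yield line


# Three readers with stride=3 and offsets 0, 1, 2 split ten lines among them.
shards = [list(strided(range(10), stride=3, offset=k)) for k in range(3)]
print(shards)
```

Under this interpretation the shards are disjoint and together cover every line exactly once, which is the property needed when several data-loading workers read the same files.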