Data Loaders
Data Iterator
- class onmt.inputters.DynamicDatasetIter(corpora, corpora_info, transforms, vocabs, task, batch_type, batch_size, batch_size_multiple, data_type='text', bucket_size=2048, bucket_size_init=-1, bucket_size_increment=0, copy=False, device=device(type='cpu'), skip_empty_level='warning', stride=1, offset=0)
Bases: IterableDataset
Yields batches from one or more plain-text corpora.
- Parameters:
corpora (dict[str, ParallelCorpus]) – collection of corpora to iterate over;
corpora_info (dict[str, dict]) – per-corpus information corresponding to corpora;
transforms (dict[str, Transform]) – transforms that may be used by the corpora;
vocabs (dict[str, Vocab]) – vocab dict used to convert corpora into Tensors;
task (str) – CorpusTask.TRAIN/VALID/INFER;
batch_type (str) – unit in which the batch size is counted, choices=[tokens, sents];
batch_size (int) – size of a batch, counted in the unit given by batch_type;
batch_size_multiple (int) – make the batch size a multiple of this;
data_type (str) – input data type, currently only text;
bucket_size (int) – accumulate this number of examples in a dynamic dataset;
bucket_size_init (int) – initialize the bucket size with this amount of examples;
bucket_size_increment (int) – increment the bucket size with this amount of examples;
copy (bool) – if True, will add specific items for copy_attn;
skip_empty_level (str) – severity level when encountering an empty line;
stride (int) – iterate data files with this stride;
offset (int) – iterate data files with this offset.
- Variables:
sort_key (function) – function that defines how to sort examples;
mixer (MixingStrategy) – the strategy to iterate corpora.
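The interaction of bucket_size, bucket_size_init and bucket_size_increment can be illustrated with a small standalone sketch. It is not the library's code: the helper name `bucketed` and the exact growth rule (start at bucket_size_init when it is positive, add bucket_size_increment after each bucket, never exceed bucket_size) are assumptions made for illustration only.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def bucketed(
    examples: Iterable[T],
    bucket_size: int = 2048,
    bucket_size_init: int = -1,
    bucket_size_increment: int = 0,
) -> Iterator[List[T]]:
    """Hypothetical helper: group examples into buckets of growing size."""
    # Assumption: a non-positive bucket_size_init means "use bucket_size".
    current = bucket_size_init if bucket_size_init > 0 else bucket_size
    bucket: List[T] = []
    for example in examples:
        bucket.append(example)
        if len(bucket) >= current:
            yield bucket
            bucket = []
            # Assumption: grow by the increment, capped at bucket_size.
            current = min(bucket_size, current + bucket_size_increment)
    if bucket:
        # Emit the final, possibly smaller, bucket.
        yield bucket


sizes = [len(b) for b in bucketed(range(20), bucket_size=8,
                                  bucket_size_init=2, bucket_size_increment=3)]
print(sizes)  # [2, 5, 8, 5]
```

Starting with small buckets lets the first batches be produced quickly, while later buckets are large enough for length-based sorting to be effective.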
- class onmt.inputters.MixingStrategy(iterables, weights)
Bases: object
Mixing strategy that should be used in Data Iterator.
- class onmt.inputters.SequentialMixer(iterables, weights)
Bases: MixingStrategy
Generates data sequentially from the iterables; the resulting stream is exhaustible.
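As a rough illustration of sequential mixing, the sketch below drains each named iterable in turn and then stops. The function name `sequential_mix` is hypothetical, and because this page does not say how SequentialMixer uses its weights, the sketch leaves them out entirely.

```python
from typing import Dict, Iterable, Iterator, TypeVar

T = TypeVar("T")


def sequential_mix(iterables: Dict[str, Iterable[T]]) -> Iterator[T]:
    """Hypothetical helper: yield every item of each iterable, one after another."""
    for name in iterables:
        # Each corpus is fully consumed before moving to the next one,
        # so the combined stream ends once the last corpus is exhausted.
        yield from iterables[name]


mixed = list(sequential_mix({"news": ["n1", "n2"], "wiki": ["w1"]}))
print(mixed)  # ['n1', 'n2', 'w1']
```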
- class onmt.inputters.WeightedMixer(iterables, weights)
Bases: MixingStrategy
A mixing strategy that mixes data according to weights and iterates infinitely.
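One plausible reading of weighted, infinite mixing is a round-robin that takes `weight` items from each corpus per cycle and restarts a corpus whenever it runs out. The sketch below follows that reading; the name `weighted_mix` and the restart behaviour are assumptions for illustration, not the library's implementation.

```python
from itertools import islice
from typing import Dict, Iterator, Sequence, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking an exhausted iterator


def weighted_mix(iterables: Dict[str, Sequence[T]],
                 weights: Dict[str, int]) -> Iterator[T]:
    """Hypothetical helper: endlessly interleave corpora in proportion to weights."""
    iterators = {name: iter(items) for name, items in iterables.items()}

    def next_or_restart(name: str) -> T:
        item = next(iterators[name], _DONE)
        if item is _DONE:
            # Corpus exhausted: start it again from the beginning.
            iterators[name] = iter(iterables[name])
            item = next(iterators[name], _DONE)
            if item is _DONE:
                raise ValueError(f"corpus {name!r} is empty")
        return item

    while True:
        for name, weight in weights.items():
            for _ in range(weight):
                yield next_or_restart(name)


stream = weighted_mix({"a": [1, 2, 3], "b": ["x"]}, {"a": 2, "b": 1})
print(list(islice(stream, 7)))  # [1, 2, 'x', 3, 1, 'x', 2]
```

Because the stream never ends, a consumer has to bound it itself, for example with `islice` as above or with a fixed number of training steps.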
Dataset
- class onmt.inputters.ParallelCorpus(name, src, tgt, align=None, n_src_feats=0, src_feats_defaults=None)
Bases: object
A pair of parallel corpus files that can be loaded and iterated over.
- class onmt.inputters.ParallelCorpusIterator(corpus, transform, skip_empty_level='warning', stride=1, offset=0)
Bases: object
An iterator dedicated to ParallelCorpus.
- Parameters:
corpus (ParallelCorpus) – corpus to iterate;
transform (TransformPipe) – transforms to be applied to corpus;
skip_empty_level (str) – severity level when encountering an empty line;
stride (int) – iterate corpus with this line stride;
offset (int) – iterate corpus with this line offset.
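The stride and offset parameters suggest a way to split one corpus across several workers without overlap. The sketch below uses the simplest such rule, keeping line i when i % stride == offset. This rule and the helper name `strided` are assumptions; the selection rule used by ParallelCorpusIterator may differ.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def strided(lines: Iterable[T], stride: int = 1, offset: int = 0) -> Iterator[T]:
    """Hypothetical helper: keep every `stride`-th line, starting at `offset`."""
    if stride < 1 or not 0 <= offset < stride:
        raise ValueError("require stride >= 1 and 0 <= offset < stride")
    for index, line in enumerate(lines):
        # Assumed rule: worker `offset` out of `stride` workers keeps this line.
        if index % stride == offset:
            yield line


corpus = [f"line {i}" for i in range(7)]
shards = [list(strided(corpus, stride=3, offset=k)) for k in range(3)]
print(shards[1])  # ['line 1', 'line 4']
```

Under this rule, the shards for offsets 0 to stride - 1 are disjoint and together cover every line exactly once.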