How does one actually construct a dataset in the current version? Previously, the path to the desired text file was enough. Now, however, onmt.inputters.Dataset requires the following:
Args:
    fields (dict[str, Field]): a dict with the structure
        returned by :func:`onmt.inputters.get_fields()`. Usually
        that means the dataset side, ``"src"`` or ``"tgt"``. Keys match
        the keys of items yielded by the ``readers``, while values
        are lists of (name, Field) pairs. An attribute with this
        name will be created for each :class:`torchtext.data.Example`
        object and its value will be the result of applying the Field
        to the data that matches the key. The advantage of having
        sequences of fields for each piece of raw input is that it allows
        the dataset to store multiple "views" of each input, which allows
        for easy implementation of token-level features, mixed word-
        and character-level models, and so on. (See also
        :class:`onmt.inputters.TextMultiField`.)
    readers (Iterable[onmt.inputters.DataReaderBase]): Reader objects
        for disk-to-dict. The yielded dicts are then processed
        according to ``fields``.
    data (Iterable[Tuple[str, Any]]): (name, ``data_arg``) pairs
        where ``data_arg`` is passed to the ``read()`` method of the
        reader in ``readers`` at that position. (See the reader object for
        details on the ``Any`` type.)
    dirs (Iterable[str or NoneType]): A list of directories where
        data is contained. See the reader object for more details.
    sort_key (Callable[[torchtext.data.Example], Any]): A function
        for determining the value on which data is sorted (i.e. length).
    filter_pred (Callable[[torchtext.data.Example], bool]): A function
        that accepts Example objects and returns a boolean value
        indicating whether to include that example in the dataset.
So how do I set this up if all I have is a text file? Do I have to build an iterator for the data argument first, and if so, how?
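
To make the question concrete, here is how I read the docstring, as a rough sketch only. It does not import onmt at all: ToyTextReader, read_lines and build_examples are hypothetical stand-ins I made up to mimic the contract described above (each reader's read() receives the data_arg at the same position and yields dicts keyed by the side name). The "indices" key is likewise my assumption. Whether the real reader wants a list of lines, a path, or something else is exactly the "Any" type the docstring defers to the reader object, so that part still needs checking against the reader source.

```python
import os
import tempfile
from typing import Any, Dict, Iterable, Iterator, List, Tuple


class ToyTextReader:
    """Hypothetical stand-in for a text reader; NOT the onmt class."""

    def read(self, sequences: Iterable[str], side: str) -> Iterator[Dict[str, Any]]:
        # Yield one dict per line, keyed by the side name ("src" or "tgt"),
        # mirroring the "disk-to-dict" role the docstring gives to readers.
        for i, seq in enumerate(sequences):
            yield {side: seq.rstrip("\n"), "indices": i}


def read_lines(path: str) -> List[str]:
    # A plain list of lines is already an iterable, so no custom
    # iterator class is needed to produce a data_arg in this sketch.
    with open(path, encoding="utf-8") as f:
        return f.readlines()


def build_examples(
    readers: List[ToyTextReader],
    data: List[Tuple[str, Any]],
) -> List[Dict[str, Any]]:
    # Pair each reader with the (name, data_arg) entry at the same
    # position, then merge the per-side dicts line by line.
    streams = [r.read(arg, name) for r, (name, arg) in zip(readers, data)]
    examples = []
    for parts in zip(*streams):
        merged: Dict[str, Any] = {}
        for part in parts:
            merged.update(part)
        examples.append(merged)
    return examples


# Demo with two tiny parallel text files written to a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "src.txt")
    tgt_path = os.path.join(tmp, "tgt.txt")
    with open(src_path, "w", encoding="utf-8") as f:
        f.write("hello world\ngood morning\n")
    with open(tgt_path, "w", encoding="utf-8") as f:
        f.write("hallo welt\nguten morgen\n")

    # The (name, data_arg) pairs, in the same order as the readers.
    data = [("src", read_lines(src_path)), ("tgt", read_lines(tgt_path))]
    examples = build_examples([ToyTextReader(), ToyTextReader()], data)

for ex in examples:
    print(ex)
```

If that reading is right, then for a plain text file the data argument would just be a list like [("src", lines)], with one reader in readers and one entry in dirs to match. Please correct me if the real readers expect something different.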