Hello,
My config file is similar to the example given on the website, and all the files have been tokenized.
Here is the traceback with the error:
```
Traceback (most recent call last):
  File "C:####Python\Python310\site-packages\torch\utils\data\_utils\worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "C:####Python\Python310\site-packages\torch\utils\data\_utils\fetch.py", line 43, in fetch
    data = next(self.dataset_iter)
  File "C:####Python\Python310\site-packages\onmt\inputters\dynamic_iterator.py", line 290, in __iter__
    for bucket in self._bucketing():
  File "C:####Python\Python310\site-packages\onmt\inputters\dynamic_iterator.py", line 231, in _bucketing
    for ex in self.mixer:
  File "C:####Python\Python310\site-packages\onmt\inputters\dynamic_iterator.py", line 83, in __iter__
    item = next(iterator)
  File "C:####Python\Python310\site-packages\onmt\inputters\text_corpus.py", line 209, in __iter__
    yield from indexed_corpus
  File "C:####Python\Python310\site-packages\onmt\inputters\text_corpus.py", line 186, in _add_index
    for i, item in enumerate(stream):
  File "C:####Python\Python310\site-packages\onmt\inputters\text_corpus.py", line 170, in _transform
    for example in stream:
  File "C:####Python\Python310\site-packages\onmt\inputters\text_corpus.py", line 153, in _tokenize
    for example in stream:
  File "C:####Python\Python310\site-packages\onmt\inputters\text_corpus.py", line 69, in load
    sline = sline.decode('utf-8')
MemoryError
```
Should I start reducing the training data? There are only 65k sentences in the training set.
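In case it helps narrow things down: since the MemoryError is raised while a single line is decoded as UTF-8 in text_corpus.py, here is a minimal check I can run on the raw corpus files to see whether any one line is abnormally large (for example because of a missing newline). This is just a sketch; the file paths are placeholders for my actual training files.

```python
# Minimal check (paths are placeholders for my actual corpus files):
# report the longest line in bytes for each file, since the traceback shows
# the MemoryError happening while one line is decoded as UTF-8.
files = ["data/src-train.txt", "data/tgt-train.txt"]  # hypothetical paths

for path in files:
    max_len = 0
    with open(path, "rb") as f:  # read raw bytes, line by line, like the loader does
        for line in f:
            max_len = max(max_len, len(line))
    print(f"{path}: longest line is {max_len} bytes")
```

If one of the files reports a line of hundreds of megabytes, that would explain the error better than the overall corpus size would.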