Thank you! The 1.3B NLLB model is now training with LoRA, but I had to reduce batch_size to 256, and there were still OOMs at some steps. You used a batch_size of 384. What is special about 384? It doesn't look like a random number.
[2023-05-18 08:51:11,186 INFO] Get prefix for cc-matrix-enzh: {'src': ' eng_Latn', 'tgt': 'gez_Ethi'}
[2023-05-18 08:51:11,186 INFO] Get prefix for src infer:
[2023-05-18 08:51:11,186 INFO] Get prefix for tgt infer:
[2023-05-18 08:51:11,186 INFO] Get suffix for cc-matrix-enzh: {'src': '', 'tgt': ''}
[2023-05-18 08:51:11,186 INFO] Get suffix for src infer:
[2023-05-18 08:51:11,186 INFO] Get suffix for tgt infer:
[2023-05-18 08:51:11,266 INFO] Get prefix for cc-matrix-enzh: {'src': ' eng_Latn', 'tgt': 'gez_Ethi'}
[2023-05-18 08:51:11,266 INFO] Get prefix for src infer:
[2023-05-18 08:51:11,266 INFO] Get prefix for tgt infer:
[2023-05-18 08:51:11,309 INFO] Starting training on GPU: [0]
[2023-05-18 08:51:11,309 INFO] Start training loop without validation…
[2023-05-18 08:51:11,309 INFO] Scoring with: TransformPipe()
[2023-05-18 08:52:43,343 INFO] Step 3, cuda OOM - batch removed
[2023-05-18 08:52:43,394 INFO] Step 3, cuda OOM - batch removed
[2023-05-18 08:52:43,436 INFO] Step 3, cuda OOM - batch removed
[2023-05-18 08:52:43,479 INFO] Step 3, cuda OOM - batch removed
[2023-05-18 08:52:43,522 INFO] Step 3, cuda OOM - batch removed
[2023-05-18 08:52:43,564 INFO] Step 3, cuda OOM - batch removed
[2023-05-18 08:52:43,603 INFO] Step 3, cuda OOM - batch removed
[2023-05-18 08:52:43,646 INFO] Step 3, cuda OOM - batch removed
[2023-05-18 08:52:43,690 INFO] Step 3, cuda OOM - batch removed
[2023-05-18 08:52:43,735 INFO] Step 3, cuda OOM - batch removed
[2023-05-18 08:52:43,777 INFO] Step 3, cuda OOM - batch removed
[2023-05-18 08:52:43,821 INFO] Step 3, cuda OOM - batch removed
[2023-05-18 08:52:43,863 INFO] Step 3, cuda OOM - batch removed
[2023-05-18 08:52:43,906 INFO] Step 3, cuda OOM - batch removed
[2023-05-18 08:52:43,947 INFO] Step 3, cuda OOM - batch removed
[2023-05-18 08:52:43,987 INFO] Step 3, cuda OOM - batch removed
[2023-05-18 08:52:44,027 INFO] Step 3, cuda OOM - batch removed
[2023-05-18 08:53:40,481 INFO] Step 10/20000; acc: 87.1; ppl: 41.1; xent: 3.7; lr: 0.01031; sents: 2059; bsz: 242/ 173/ 7; 491/350 tok/s; 149 sec;
[2023-05-18 08:55:01,678 INFO] Step 20/20000; acc: 88.6; ppl: 34.7; xent: 3.5; lr: 0.01969; sents: 2012; bsz: 228/ 171/ 6; 901/673 tok/s; 230 sec;
Hello, I ran into some problems with the "magic" script. Here are the errors:
python magic.py
Traceback (most recent call last):
  File "magic.py", line 5, in <module>
    import sentencepiece_model_pb2 as model
ModuleNotFoundError: No module named 'sentencepiece_model_pb2'
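A likely fix, assuming the script only needs the SentencePiece ModelProto definition: recent sentencepiece releases bundle the generated protobuf module inside the package, so the import can be redirected to it instead of generating sentencepiece_model_pb2.py with protoc. A minimal sketch, with the model path as a placeholder:

# Assumed workaround: import the protobuf module bundled with the
# sentencepiece package instead of a locally generated pb2 file.
from sentencepiece import sentencepiece_model_pb2 as model

m = model.ModelProto()
with open("sentencepiece.bpe.model", "rb") as f:  # placeholder path to the SPM model
    m.ParseFromString(f.read())
print(len(m.pieces))  # number of vocabulary pieces parsed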
Thanks for the extensive tutorial. I’m getting this error in Colab
"RuntimeError: The expanded size of the tensor (1024) must match the existing size (2048) at non-singleton dimension 0. Target sizes: [1024]. Tensor sizes: [2048]"
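For reference, 1024 vs 2048 at dimension 0 often means the checkpoint and the config describe different model sizes: NLLB 1.3B uses a hidden size of 1024 while 3.3B uses 2048. A minimal sketch to check which size the checkpoint actually carries, assuming a standard torch.save checkpoint with weights nested under a "model" key, as OpenNMT-py checkpoints typically are:

import torch

# Diagnostic sketch: print a few parameter shapes from the checkpoint to
# see whether the weights are 1024- or 2048-dimensional, then make the
# config's model dimensions match.
ckpt = torch.load("nllb-200-checkpoint.pt", map_location="cpu")  # placeholder path
state = ckpt.get("model", ckpt)  # OpenNMT-py usually nests weights under "model"
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape))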
Thanks for your reply. Could you share the complete config? I want to train English-to-Chinese and Tagalog-to-Chinese at the same time. I will convert the fine-tuned PyTorch model to CTranslate2 afterwards, so can I still use LoRA?
When I fine-tune the 3.3B or 1.3B model (notebook on a cloud GPU), it gives the error below:
File "/workspace/OpenNMT-py/onmt/train_single.py", line 165, in main
model = build_model(model_opt, opt, vocabs, checkpoint)
File "/workspace/OpenNMT-py/onmt/model_builder.py", line 412, in build_model
model.load_state_dict(
File "/workspace/OpenNMT-py/onmt/models/model.py", line 142, in load_state_dict
raise ValueError(
ValueError: Extra keys in model state_dict do not match the model config dict_keys
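One way to see which keys the loader considers "extra", assuming the same checkpoint layout as above, is to list the checkpoint's state_dict keys and compare them against the freshly built model's keys; LoRA adapter weights, for instance, can show up as extra keys if the LoRA options are missing from the config used for loading. A rough sketch:

import torch

# Rough diagnostic sketch (the checkpoint path and key layout are
# assumptions): print the keys stored in the checkpoint so they can be
# compared with model.state_dict().keys() from build_model(...).
ckpt = torch.load("finetuned_step_1000.pt", map_location="cpu")  # placeholder path
ckpt_keys = set(ckpt.get("model", ckpt).keys())
for key in sorted(ckpt_keys):
    print(key)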