@martin_bombin I found some memory adjustments / savings. Even though I am still working on the safetensors implementation (with sharded checkpoints), the current master (soon to be v3.2) will work smoothly with 13B training / loading.
I only needed 27 GB of RAM this time (instead of 62+), but I am getting the following error using the same config.yaml. I am using version 3.2.0:
[2023-06-08 08:10:15,647 INFO] bnb_NF4 compression of layer ['w_1', 'w_2', 'w_3'] [29/1969]
bin /home/m.barroso/anaconda3/envs/openNMT_llms/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117_nocublaslt.so
[2023-06-08 08:11:08,725 INFO] Adding LoRa layers for linear_values quant None
[2023-06-08 08:11:08,762 INFO] Adding LoRa layers for linear_query quant None
[2023-06-08 08:11:08,780 INFO] Adding LoRa layers for linear_keys quant None
[2023-06-08 08:11:08,853 INFO] Adding LoRa layers for final_linear quant None
[2023-06-08 08:11:10,082 INFO] Switching model to half() for FusedAdam legacy
[2023-06-08 08:11:10,082 INFO] Non quantized layer compute is fp16
Traceback (most recent call last):
File "/home/m.barroso/OpenNMT_nllb/llms/OpenNMT-py/onmt/bin/train.py", line 71, in <module>
main()
File "/home/m.barroso/OpenNMT_nllb/llms/OpenNMT-py/onmt/bin/train.py", line 67, in main
train(opt)
File "/home/m.barroso/OpenNMT_nllb/llms/OpenNMT-py/onmt/bin/train.py", line 52, in train
train_process(opt, device_id=0)
File "/home/m.barroso/OpenNMT_nllb/llms/OpenNMT-py/onmt/train_single.py", line 185, in main
model = build_model(model_opt, opt, vocabs, checkpoint)
File "/home/m.barroso/OpenNMT_nllb/llms/OpenNMT-py/onmt/model_builder.py", line 378, in build_model
model.load_state_dict(
File "/home/m.barroso/OpenNMT_nllb/llms/OpenNMT-py/onmt/models/model.py", line 93, in load_state_dict
raise ValueError(
ValueError: Extra keys in model state_dict do not match the model config dict_keys(['decoder.transformer_layers.0.feed_forward.w_1.bias', 'decoder.transformer_layers.0.feed_forward.w_2.bias', 'decoder.transformer_layers.0.feed_forward.w_3.bias', 'decoder.transformer_layers.1.feed_forward.w_1.bias', 'decoder.transformer_layers.1.feed_forward.w_2.bias', 'decoder.transformer_layers.1.feed_forward.w_3.bias', 'decoder.transformer_layers.2.feed_forward.w_1.bias', 'decoder.transformer_layers.2.feed_forward.w_2.bias', ...
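(For reference, a quick way to confirm which extra keys a checkpoint carries; this is a minimal sketch, with "model.pt" as a placeholder for the actual checkpoint path.)

import torch

# Load the checkpoint on CPU and list the unexpected feed-forward bias keys.
# "model.pt" is a placeholder, not the actual checkpoint name from this run.
checkpoint = torch.load("model.pt", map_location="cpu")
state_dict = checkpoint.get("model", checkpoint)

extra_bias_keys = [
    k for k in state_dict
    if k.endswith((".w_1.bias", ".w_2.bias", ".w_3.bias"))
]
print(len(extra_bias_keys), "feed-forward bias keys found")
for key in extra_bias_keys[:5]:
    print(key)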
Hi! I am trying to merge trained LoRA weights and got this error:
Traceback (most recent call last):
File "/python-tests/opennmt/OpenNMT-py/tools/lora_weights.py", line 47, in <module>
lora_checkpoint = load_checkpoint(opt.lora_weights)
File "/home/sersh/miniconda3/envs/main/lib/python3.10/site-packages/onmt/models/model_saver.py", line 54, in load_checkpoint
if "0.weight" in checkpoint["generator"]:
TypeError: argument of type 'NoneType' is not iterable
This line should not be line 54; in master it's line 55. Maybe something is wrong with your install.
What is the line above this one? Do you see this:
# fix v2 compatibility
if "generator" in checkpoint.keys():
    if "0.weight" in checkpoint["generator"]:
Yes, I see it. Right now I am not on master but on v3.2.0.
I made a workaround:
if "generator" in checkpoint.keys():
if not checkpoint["generator"]:
checkpoint["generator"] = {}
if "0.weight" in checkpoint["generator"]:
checkpoint["generator"]["weight"] = checkpoint["generator"].pop(
"0.weight"
)
if "0.bias" in checkpoint["generator"]:
checkpoint["generator"]["bias"] = checkpoint["generator"].pop("0.bias")
And the model merged, and I was able to export it to CTranslate2.
But now at inference I see this strange output: "and this page▁reflects that" (▁ between words). How can I avoid it? I just use spm.Decode() with the tokenizer.model from openllama.
I get the same results with onmt_translate. Some words are concatenated with ▁. I checked the results as tokens and they look good: 'The', '▁client', '▁is', '▁clearly', '▁enjoying', '▁the', '▁experience', but after sentencepiece Decode() I get: "The client is clearly▁enjoying the experience". Strange. Looks like a sentencepiece issue.
I don't understand why you would use spm.Decode; onmt_translate already performs the detokenization.
What is your version of sentencepiece? I think I remember it needs to be < 0.1.98.
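(A quick way to check the installed version, as a minimal sketch:)

import sentencepiece as spm

# Print the installed sentencepiece version to compare against the < 0.1.98 hint above.
print(spm.__version__)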
spm.Decode is used at inference with ctranslate2. With onmt_translate I don't use it, but I still get some words concatenated with ▁ in the output file. I guess openllama.vocab doesn't match the spm tokenizer.model, since if I try to encode the text "hands resting on" with spm, it encodes it to different tokens (▁hands, ▁rest, ing, ▁on).
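(For what it's worth, a minimal sketch of that check with the sentencepiece API; the tokenizer.model path and the sample text are just placeholders:)

import sentencepiece as spm

# Load the SentencePiece model ("tokenizer.model" is a placeholder path).
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

# Encode a sample phrase into pieces to compare with the vocab used for training.
pieces = sp.encode("hands resting on", out_type=str)
print(pieces)

# Decoding the pieces should restore the original text with normal spaces;
# if words come back glued together, the vocab and tokenizer.model likely do not match.
print(sp.decode(pieces))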