Finetuning Llama-7B/13B or MosaicML MPT-7B - Reproduce Vicuna / Alpaca

@martin_bombin I found some memory adjustments / savings. Even though I am still working on the safetensors implementation (with sharded checkpoints), the current master (soon to be v3.2) will work smoothly with 13B training / loading.

Let me know if you see the same.

I only needed 27 GB of RAM this time (instead of 62+), but I am getting the following error using the same config.yaml. I am using version 3.2.0:

[2023-06-08 08:10:15,647 INFO] bnb_NF4 compression of layer ['w_1', 'w_2', 'w_3']
bin /home/m.barroso/anaconda3/envs/openNMT_llms/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117_nocublaslt.so
[2023-06-08 08:11:08,725 INFO] Adding LoRa layers for linear_values quant None
[2023-06-08 08:11:08,762 INFO] Adding LoRa layers for linear_query quant None
[2023-06-08 08:11:08,780 INFO] Adding LoRa layers for linear_keys quant None
[2023-06-08 08:11:08,853 INFO] Adding LoRa layers for final_linear quant None
[2023-06-08 08:11:10,082 INFO] Switching model to half() for FusedAdam legacy
[2023-06-08 08:11:10,082 INFO] Non quantized layer compute is fp16
Traceback (most recent call last):                                                                                                                                                                         
  File "/home/m.barroso/OpenNMT_nllb/llms/OpenNMT-py/onmt/bin/train.py", line 71, in <module>                                                                                                              
    main()                                                                                                                                                                                                 
  File "/home/m.barroso/OpenNMT_nllb/llms/OpenNMT-py/onmt/bin/train.py", line 67, in main                                                                                                                  
    train(opt)                                                                                                                                                                                             
  File "/home/m.barroso/OpenNMT_nllb/llms/OpenNMT-py/onmt/bin/train.py", line 52, in train                                                                                                                 
    train_process(opt, device_id=0)                                                                                                                                                                        
  File "/home/m.barroso/OpenNMT_nllb/llms/OpenNMT-py/onmt/train_single.py", line 185, in main                                                                                                              
    model = build_model(model_opt, opt, vocabs, checkpoint)                                                                                                                                                
  File "/home/m.barroso/OpenNMT_nllb/llms/OpenNMT-py/onmt/model_builder.py", line 378, in build_model                                                                                                      
    model.load_state_dict(                                                                                                                                                                                 
  File "/home/m.barroso/OpenNMT_nllb/llms/OpenNMT-py/onmt/models/model.py", line 93, in load_state_dict                                                                                                    
    raise ValueError(                                                                                                                                                                                      
ValueError: Extra keys in model state_dict do not match the model config dict_keys(['decoder.transformer_layers.0.feed_forward.w_1.bias', 'decoder.transformer_layers.0.feed_forward.w_2.bias', 'decoder.transformer_layers.0.feed_forward.w_3.bias', 'decoder.transformer_layers.1.feed_forward.w_1.bias', 'decoder.transformer_layers.1.feed_forward.w_2.bias', 'decoder.transformer_layers.1.feed_forward.w_3.bias', 'decoder.transformer_layers.2.feed_forward.w_1.bias', 'decoder.transformer_layers.2.feed_forward.w_2.bias', ...

Yes, the reason is the new option add_ffnbias.
You have two choices:

  1. you reconvert the original model with the updated converter so that you generate a model without the bias in the checkpoint
  2. you add add_ffnbias: true in your yaml file to match the older config

But I recommend #1 (see the quick check below if you want to confirm which case you are in).
Thanks for reporting this.
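
If you want to double-check whether your converted checkpoint still carries the FFN bias tensors, something like this should work (untested sketch, assuming the usual "model" state dict inside the .pt checkpoint; the path is just an example):

import torch

# Load the converted checkpoint on CPU and look for leftover feed-forward bias tensors.
ckpt = torch.load("llama13B-onmt.pt", map_location="cpu")  # example path
bias_keys = [k for k in ckpt["model"] if ".feed_forward." in k and k.endswith(".bias")]
if bias_keys:
    print(f"{len(bias_keys)} FFN bias tensors found: reconvert, or set add_ffnbias: true")
else:
    print("No FFN bias tensors: checkpoint already matches the new default")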

Yes, option 1 solved it. Now it only takes 26 GB of RAM to load the 13B model, and it goes way faster, congrats.


Hi! I am trying to merge trained LoRA weights and got this error:

Traceback (most recent call last):
  File "/python-tests/opennmt/OpenNMT-py/tools/lora_weights.py", line 47, in <module>
    lora_checkpoint = load_checkpoint(opt.lora_weights)
  File "/home/sersh/miniconda3/envs/main/lib/python3.10/site-packages/onmt/models/model_saver.py", line 54, in load_checkpoint
    if "0.weight" in checkpoint["generator"]:
TypeError: argument of type 'NoneType' is not iterable

I trained an openllama3b model with 4-bit quantization:


#4/8bit
quant_layers: ['linear_values', 'linear_query', 'linear_keys', 'final_linear', 'w_1', 'w_2', 'w_3']
quant_type: "bnb_NF4"

#LoRa
lora_layers: ['linear_values', 'linear_query', 'linear_keys', 'final_linear']
lora_rank: 32
lora_dropout: 0.05
lora_alpha: 64
lora_embedding: false

Am I doing something wrong? I tried with v3.2.0 + pytorch 1.13.1 and with the new v3.3.0 + pytorch 2.0.1.

That line should not be line 54; in master it is line 55. Maybe something is wrong with your install.

What is the line above this one? Do you see this:

# fix v2 compatibility
if "generator" in checkpoint.keys():
    if "0.weight" in checkpoint["generator"]:

Yes, I see it. Right now I am not on master, but on v3.2.0.
I made a workaround:

 if "generator" in checkpoint.keys():
            if not checkpoint["generator"]:
                checkpoint["generator"] = {}
            if "0.weight" in checkpoint["generator"]:
                checkpoint["generator"]["weight"] = checkpoint["generator"].pop(
                    "0.weight"
                )
            if "0.bias" in checkpoint["generator"]:
                checkpoint["generator"]["bias"] = checkpoint["generator"].pop("0.bias")

The model merged, and I was able to export it with CTranslate2.
But now at inference I see strange output like "and this page▁reflects that" (▁ between words). How can I avoid it? I just use spm.Decode() with the tokenizer.model from openllama.
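
For reference, this is roughly what my decoding path looks like (simplified sketch; the model directory, prompt, and generation settings here are just placeholders, not my exact setup):

import ctranslate2
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
generator = ctranslate2.Generator("openllama_ct2", device="cuda")  # placeholder model dir

# Llama-style models usually expect a <s> start token in front of the prompt pieces.
prompt_tokens = ["<s>"] + sp.Encode("The client is", out_type=str)
results = generator.generate_batch([prompt_tokens], max_length=64, sampling_temperature=0.1)
output_tokens = results[0].sequences[0]
print(sp.Decode(output_tokens))  # this is where the stray ▁ show up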

Can you confirm you can generate without any issue using OpenNMT-py first?

I get the same results with onmt_translate: some words are concatenated with ▁. I checked the results as tokens and they look good: 'The', '▁client', '▁is', '▁clearly', '▁enjoying', '▁the', '▁experience', but after sentencepiece Decode() I get: The client is clearly▁enjoying the experience. Strange. Looks like a sentencepiece issue.

Show me the command line and the inference config file; it's impossible to help you if I don't have all the info.

transforms: [sentencepiece]

#### Subword
src_subword_model: "tokenizer.model"
tgt_subword_model: "tokenizer.model"
src_subword_nbest: 1
src_subword_alpha: 0.0
tgt_subword_nbest: 1
tgt_subword_alpha: 0.0
# Model info
model: "merged.pt"
# Inference
max_length: 1048
gpu: 0
batch_type: tokens
batch_size: 1048
#fp16:
precision: fp16
beam_size: 1
report_time: true

onmt_translate --config inference.yaml --src test.txt --out test.hyp

But it seems to be a sentencepiece tokenizer issue. Here is a small test file to reproduce the error:

import sentencepiece as spm

tokenizer = spm.SentencePieceProcessor(model_file='tokenizer.model')

test = tokenizer.Decode(['▁hands', '▁resting', '▁on'])
print(test)
# hands▁resting on

Of course a simple .replace('▁', ' ') will solve the issue, I am just wondering why it happens.
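
For completeness, the fallback I had in mind (illustrative only, it just post-processes the decoded string):

decoded = "The client is clearly▁enjoying the experience"
print(decoded.replace("▁", " "))
# The client is clearly enjoying the experience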

I don't understand why you would use spm.Decode; onmt_translate already performs the detokenization.
What is your version of sentencepiece? I think I remember it needs to be < 0.1.98.
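
A quick way to check the installed version (nothing OpenNMT-specific, just standard Python packaging metadata):

import importlib.metadata

print(importlib.metadata.version("sentencepiece"))  # e.g. 0.1.97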

spm.Decode is used at inference with CTranslate2. With onmt_translate I don't use it, but I still get some words concatenated with ▁ in the output file. I guess openllama.vocab doesn't match the spm tokenizer.model, since if I encode the text "hands resting on" with spm it produces different tokens (▁hands, ▁rest, ing, ▁on).
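
A simple way to verify the mismatch (hypothetical check: if tokenizer.model is the one the vocab was built from, encode followed by decode should round-trip cleanly):

import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
pieces = sp.Encode("hands resting on", out_type=str)
print(pieces)             # e.g. ['▁hands', '▁resting', '▁on'] if the model matches the vocab
print(sp.Decode(pieces))  # should round-trip to: hands resting on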

Somehow I downloaded the wrong tokenizer.model, I think. I just redownloaded it from https://opennmt-models.s3.amazonaws.com/openllama/tokenizer.model and it matches openllama.vocab. Sorry to disturb)