@martin_bombin I found some memory adjustments / savings. Even though I am still working on the safetensors implementation (with sharded checkpoints), the current master (soon to be v3.2) will work smoothly with 13B training / loading.
I only needed 27 GB of RAM this time (instead of 62+), but I am getting the following error using the same config.yaml. I am using version 3.2.0:
[2023-06-08 08:10:15,647 INFO] bnb_NF4 compression of layer ['w_1', 'w_2', 'w_3'] [29/1969]
bin /home/m.barroso/anaconda3/envs/openNMT_llms/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117_nocublaslt.so
[2023-06-08 08:11:08,725 INFO] Adding LoRa layers for linear_values quant None
[2023-06-08 08:11:08,762 INFO] Adding LoRa layers for linear_query quant None
[2023-06-08 08:11:08,780 INFO] Adding LoRa layers for linear_keys quant None
[2023-06-08 08:11:08,853 INFO] Adding LoRa layers for final_linear quant None
[2023-06-08 08:11:10,082 INFO] Switching model to half() for FusedAdam legacy
[2023-06-08 08:11:10,082 INFO] Non quantized layer compute is fp16
Traceback (most recent call last):
File "/home/m.barroso/OpenNMT_nllb/llms/OpenNMT-py/onmt/bin/train.py", line 71, in <module>
main()
File "/home/m.barroso/OpenNMT_nllb/llms/OpenNMT-py/onmt/bin/train.py", line 67, in main
train(opt)
File "/home/m.barroso/OpenNMT_nllb/llms/OpenNMT-py/onmt/bin/train.py", line 52, in train
train_process(opt, device_id=0)
File "/home/m.barroso/OpenNMT_nllb/llms/OpenNMT-py/onmt/train_single.py", line 185, in main
model = build_model(model_opt, opt, vocabs, checkpoint)
File "/home/m.barroso/OpenNMT_nllb/llms/OpenNMT-py/onmt/model_builder.py", line 378, in build_model
model.load_state_dict(
File "/home/m.barroso/OpenNMT_nllb/llms/OpenNMT-py/onmt/models/model.py", line 93, in load_state_dict
raise ValueError(
ValueError: Extra keys in model state_dict do not match the model config dict_keys(['decoder.transformer_layers.0.feed_forward.w_1.bias', 'decoder.transformer_layers.0.feed_forward.w_2.bias', 'decoder.transformer_layers.0.feed_forward.w_3.bias', 'decoder.transformer_layers.1.feed_forward.w_1.bias', 'decoder.transformer_layers.1.feed_forward.w_2.bias', 'decoder.transformer_layers.1.feed_forward.w_3.bias', 'decoder.transformer_layers.2.feed_forward.w_1.bias', 'decoder.transformer_layers.2.feed_forward.w_2.bias', ...
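(For reference, a quick way to confirm which extra keys a checkpoint carries; this is a minimal sketch, with "model.pt" as a placeholder for the actual checkpoint path.)

import torch

# Load the checkpoint on CPU and list the unexpected feed-forward bias keys.
# "model.pt" is a placeholder, not the actual checkpoint name from this run.
checkpoint = torch.load("model.pt", map_location="cpu")
state_dict = checkpoint.get("model", checkpoint)

extra_bias_keys = [
    k for k in state_dict
    if k.endswith((".w_1.bias", ".w_2.bias", ".w_3.bias"))
]
print(len(extra_bias_keys), "feed-forward bias keys found")
for key in extra_bias_keys[:5]:
    print(key)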
Hi! I am trying to merge trained LoRA weights and got this error:
Traceback (most recent call last):
File "/python-tests/opennmt/OpenNMT-py/tools/lora_weights.py", line 47, in <module>
lora_checkpoint = load_checkpoint(opt.lora_weights)
File "/home/sersh/miniconda3/envs/main/lib/python3.10/site-packages/onmt/models/model_saver.py", line 54, in load_checkpoint
if "0.weight" in checkpoint["generator"]:
TypeError: argument of type 'NoneType' is not iterable
This line should not be line 54; in master it's line 55. Maybe something is wrong with your install.
What is the line above this one? Do you see this:
# fix v2 compatibility
if "generator" in checkpoint.keys():
    if "0.weight" in checkpoint["generator"]:
Yes, I see it. Right now I am not on master but on v3.2.0.
I made a workaround:
if "generator" in checkpoint.keys():
if not checkpoint["generator"]:
checkpoint["generator"] = {}
if "0.weight" in checkpoint["generator"]:
checkpoint["generator"]["weight"] = checkpoint["generator"].pop(
"0.weight"
)
if "0.bias" in checkpoint["generator"]:
checkpoint["generator"]["bias"] = checkpoint["generator"].pop("0.bias")
And the model merged, and I was able to export it to CTranslate2.
But now at inference I see this strange output: "and this page▁reflects that" (▁ between words). How can I avoid it? I just use spm.Decode() with the tokenizer.model from openllama.
I get the same results with onmt_translate. Some words are concatenated with ▁. I checked the results as tokens and they look good: 'The', '▁client', '▁is', '▁clearly', '▁enjoying', '▁the', '▁experience', but after sentencepiece Decode() I get: "The client is clearly▁enjoying the experience". Strange. Looks like a sentencepiece issue.
I don't understand why you would use spm.Decode; onmt_translate already performs the detokenization.
What is your version of sentencepiece? I think I remember it needs to be < 0.1.98.
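(A quick way to check the installed version, as a minimal sketch:)

import sentencepiece as spm

# Print the installed sentencepiece version to compare against the < 0.1.98 hint above.
print(spm.__version__)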
spm.Decode is used at inference with ctranslate2. With onmt_translate I don't use it, but I still get some words concatenated with ▁ in the output file. I guess openllama.vocab doesn't match the spm tokenizer.model, since if I try to encode the text "hands resting on" with spm, it encodes it to different tokens (▁hands, ▁rest, ing, ▁on).
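(For what it's worth, a minimal sketch of that check with the sentencepiece API; the tokenizer.model path and the sample text are just placeholders:)

import sentencepiece as spm

# Load the SentencePiece model ("tokenizer.model" is a placeholder path).
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

# Encode a sample phrase into pieces to compare with the vocab used for training.
pieces = sp.encode("hands resting on", out_type=str)
print(pieces)

# Decoding the pieces should restore the original text with normal spaces;
# if words come back glued together, the vocab and tokenizer.model likely do not match.
print(sp.decode(pieces))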