Hello. Is it possible to fine-tune a Llama-2 model that has been quantized with CTranslate2 (CT2)?
I tried this code:
import os

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig
from trl import SFTTrainer

base_model_name = "/root/llama-2-7b-chat-ct2/"

bnb_config = BitsAndBytesConfig(
    # load_in_4bit=True,
    # bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

device_map = {"": 0}

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map=device_map,
    trust_remote_code=True,
    use_auth_token=True,
    from_tf=True,  # this kwarg sends loading down the TF checkpoint path (see traceback below)
)
base_model.config.use_cache = False

# More info: https://github.com/huggingface/transformers/pull/24906
base_model.config.pretraining_tp = 1

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)

tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

output_dir = "./results_3"
training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    logging_steps=10,
    max_steps=100,
)

max_seq_length = 512

trainer = SFTTrainer(
    model=base_model,
    train_dataset=dataset,  # `dataset` is prepared in an earlier cell (it has a "text" column)
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_args,
)

trainer.train()

output_dir = os.path.join(output_dir, "final_checkpoint")
trainer.model.save_pretrained(output_dir)
And I got this error:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[14], line 13
      5 bnb_config = BitsAndBytesConfig(
      6     # load_in_4bit=True,
      7     # bnb_4bit_quant_type="nf4",
      8     bnb_4bit_compute_dtype=torch.bfloat16,
      9 )
     11 device_map = {"": 0}
---> 13 base_model = AutoModelForCausalLM.from_pretrained(
     14     base_model_name,
     15     quantization_config=bnb_config,
     16     device_map=device_map,
     17     trust_remote_code=True,
     18     use_auth_token=True,
     19     from_tf=True
     20
     21 )
     22 base_model.config.use_cache = False
     24 # More info: https://github.com/huggingface/transformers/pull/24906

File /usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py:516, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    514 elif type(config) in cls._model_mapping.keys():
    515     model_class = _get_model_class(config, cls._model_mapping)
--> 516     return model_class.from_pretrained(
    517         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    518     )
    519 raise ValueError(
    520     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    521     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    522 )

File /usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:3057, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3054 try:
   3055     from .modeling_tf_pytorch_utils import load_tf2_checkpoint_in_pytorch_model
-> 3057     model, loading_info = load_tf2_checkpoint_in_pytorch_model(
   3058         model, resolved_archive_file, allow_missing_keys=True, output_loading_info=True
   3059     )
   3060 except ImportError:
   3061     logger.error(
   3062         "Loading a TensorFlow model in PyTorch, requires both PyTorch and TensorFlow to be installed."
   3063         " Please see https://pytorch.org/ and https://www.tensorflow.org/install/ for installation"
   3064         " instructions."
   3065     )

File /usr/local/lib/python3.10/dist-packages/transformers/modeling_tf_pytorch_utils.py:442, in load_tf2_checkpoint_in_pytorch_model(pt_model, tf_checkpoint_path, tf_inputs, allow_missing_keys, output_loading_info)
    440 # Instantiate and load the associated TF 2.0 model
    441 tf_model_class_name = "TF" + pt_model.__class__.__name__  # Add "TF" at the beginning
--> 442 tf_model_class = getattr(transformers, tf_model_class_name)
    443 tf_model = tf_model_class(pt_model.config)
    445 if tf_inputs is None:

File /usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py:1123, in _LazyModule.__getattr__(self, name)
   1121     value = getattr(module, name)
   1122 else:
-> 1123     raise AttributeError(f"module {self.__name__} has no attribute {name}")
   1125 setattr(self, name, value)
   1126 return value

AttributeError: module transformers has no attribute TFLlamaForCausalLM
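If I read the traceback correctly, `from_tf=True` makes Transformers look for a TensorFlow class named `TFLlamaForCausalLM`, which doesn't exist because Llama has no TensorFlow implementation in Transformers. And as far as I understand, a CTranslate2 conversion is CT2's own binary format, not a TensorFlow or Hugging Face PyTorch checkpoint, so neither `from_tf=True` nor a plain `from_pretrained` can read it.

For comparison, here is the usual QLoRA-style load from the original Hugging Face checkpoint, which should not hit this error. This is only a sketch; the repo id `meta-llama/Llama-2-7b-chat-hf` is my assumption for the non-CT2 source of this model:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumption: the original HF repo this model was converted to CT2 from.
hf_model_name = "meta-llama/Llama-2-7b-chat-hf"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantize with bitsandbytes at load time
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    hf_model_name,
    quantization_config=bnb_config,
    device_map={"": 0},
    use_auth_token=True,
    # no from_tf=True: this is a native PyTorch checkpoint
)

So my actual question is: can the CT2-quantized weights be fine-tuned (or converted back to a trainable format), or do I need to keep the original HF checkpoint for training and only convert to CT2 afterwards?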