Fine-Tuning Qwen 0.5B with LoRA

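The choice of fine-tuning method has largely settled by now: LoRA-style methods have overtaken the traditional Adapter-style and Prompt-Tuning-style methods and firmly hold the top spot.

LoRA family: LoRA, QLoRA, AdaLoRA, LoRA+, DoRA, and others.

Prompt-Tuning family: Prompt-Tuning, P-Tuning, P-Tuning v2, Prefix-Tuning, and others.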

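1 Introduction to LoRA

1.1 Principle

LoRA freezes the pretrained weight W and learns a low-rank update deltaW = BA, where the rank r is much smaller than the dimensions of W. B is usually initialized as a zero matrix, while A uses Kaiming initialization or uniform initialization.

In addition, deltaW is scaled by alpha / r. As a rule of thumb, alpha is set to twice r.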
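As a compact restatement of the two points above, the LoRA forward pass can be written as follows. The symbols x (input), h (output), and the dimensions d and k are notation introduced here for illustration, and the sizes in the last line are assumed example values rather than figures taken from the model.

```latex
h = W x + \Delta W\, x = W x + \frac{\alpha}{r}\, B A x,
\qquad W \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)

% At initialization B = 0, so \Delta W = 0 and h = W x:
% training starts exactly from the pretrained model.
% With \alpha = 2r the scaling factor is \alpha / r = 2.

% Trainable parameters per adapted matrix: r(d + k) instead of dk.
% Illustrative sizes d = k = 1024, r = 64:
r(d + k) = 64 \cdot 2048 = 131072,
\qquad dk = 1048576,
\qquad \frac{r(d + k)}{dk} = 12.5\%
```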

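2 Project overview

This project uses Firefly for fine-tuning.

Firefly is an open-source training project for large models. It supports pre-training, instruction fine-tuning (SFT), and DPO for mainstream large models, including but not limited to Yi-1.5, Llama3, Gemma, Qwen1.5, MiniCPM, Llama, InternLM, Baichuan, ChatGLM, Yi, Deepseek, Qwen, Orion, Ziya, Xverse, Mistral, Mixtral-8x7B, Zephyr, Vicuna, and Bloom. It supports full-parameter training as well as efficient training with LoRA and QLoRA.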

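2.1 Training data

Data construction: we prepare our data in the format of Firefly/data/dummy_data.jsonl from the Firefly project.

Data format:

Firefly project format: one JSON object per line, with a conversation_id and a conversation list whose items each hold a human turn and an assistant turn.

Alpaca-Instruct format: one JSON object per line, with instruction, input, and output fields.

Data preparation:

This time we use the Zhen Huan dataset (in Alpaca-Instruct format), which can be downloaded from here. The following code converts the Alpaca-Instruct fine-tuning format into the Firefly project format.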

import json

# Convert Alpaca-Instruct records (instruction / input / output) into
# Firefly's single-turn conversation format, one JSON object per line.
with open("huanhuan.jsonl", "r", encoding="utf-8") as src, \
        open("merged_huanhuan.jsonl", "w", encoding="utf-8") as dst:
    for idx, line in enumerate(src):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        sft_data = {
            "conversation_id": idx,
            "conversation": [
                {
                    # instruction and input are joined without a separator
                    "human": record["instruction"] + record.get("input", ""),
                    "assistant": record["output"],
                }
            ],
        }
        # ensure_ascii=False keeps the Chinese text readable in the output file
        dst.write(json.dumps(sft_data, ensure_ascii=False) + "\n")
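Before training, it can help to check that every line of the converted file really has the expected shape. The following is a minimal sketch, assuming the field names produced by the conversion script above (conversation_id, conversation, human, assistant); the two helper functions are hypothetical and not part of Firefly.

```python
import json


def validate_firefly_record(record):
    """Return a list of problems found in one Firefly-style record (empty if none)."""
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    problems = []
    if "conversation_id" not in record:
        problems.append("missing conversation_id")
    conversation = record.get("conversation")
    if not isinstance(conversation, list) or not conversation:
        problems.append("conversation must be a non-empty list")
        return problems
    for turn_index, turn in enumerate(conversation):
        for key in ("human", "assistant"):
            value = turn.get(key) if isinstance(turn, dict) else None
            if not isinstance(value, str) or not value.strip():
                problems.append(f"turn {turn_index}: '{key}' is missing or empty")
    return problems


def validate_firefly_file(path):
    """Yield (line_number, problem) pairs for every issue in a JSONL file."""
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # blank lines are ignored
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                yield line_number, f"invalid JSON: {exc}"
                continue
            for problem in validate_firefly_record(record):
                yield line_number, problem
```

Calling list(validate_firefly_file("merged_huanhuan.jsonl")) returns an empty list when the file is clean, and otherwise a list of line numbers with a short description of what is wrong on each.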

Qwen1.5 prompt template (the Firefly project applies it for us automatically):

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Who are you?<|im_end|>
<|im_start|>assistant
I am a helpful assistant.<|im_end|>
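To make the layout of the template concrete, here is a small illustrative sketch that renders a message list in the same ChatML style. The function build_chatml_prompt is a hypothetical helper written for this post; in practice Firefly's own template code or the tokenizer's apply_chat_template does this work, and their exact whitespace handling may differ from this sketch.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the ChatML layout."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

With add_generation_prompt=True the string ends with an open assistant turn, which is the form used at inference time; during training the assistant's answer and its closing <|im_end|> follow instead.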

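2.2 Configuration file

Open Firefly/train_args/sft/lora/qwen1.5-0.5b-sft-lora.json and edit the following fields:

output_dir: where the trained LoRA weights are saved
model_name_or_path: path to the Qwen model weights
train_file: path to the fine-tuning data file (the merged_huanhuan.jsonl produced in the previous subsection)
template_name: use the qwen template shown in the previous subsection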

{
    "output_dir": "output/firefly-qwen1.5-0.5b-sft-huanhuan-lora",
    "model_name_or_path": "",
    "train_file": "./data/merged_huanhuan.jsonl",
    "template_name": "qwen",
    "train_mode": "lora",
    "num_train_epochs": 1,
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 1,
    "learning_rate": 2e-4,
    "max_seq_length": 1024,
    "logging_steps": 100,
    "save_steps": 100,
    "save_total_limit": 1,
    "lr_scheduler_type": "constant_with_warmup",
    "warmup_steps": 100,
    "lora_rank": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,

    "gradient_checkpointing": true,
    "disable_tqdm": false,
    "optim": "paged_adamw_32bit",
    "seed": 42,
    "fp16": true,
    "report_to": "tensorboard",
    "dataloader_num_workers": 0,
    "save_strategy": "steps",
    "weight_decay": 0,
    "max_grad_norm": 0.3,
    "remove_unused_columns": false
}
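A few quantities implied by this configuration are easy to derive and worth checking before launching a run. The sketch below is only an illustration: summarize_train_args is a hypothetical helper, not a Firefly function, and the embedded JSON repeats just the fields it needs from the configuration above.

```python
import json


def summarize_train_args(args, num_gpus=1):
    """Derive a few quantities from a Firefly-style LoRA training config dict."""
    missing = [key for key in ("output_dir", "model_name_or_path", "train_file")
               if not args.get(key)]
    return {
        # Fields that are absent or left as an empty string.
        "missing_fields": missing,
        # Number of samples contributing to each optimizer step.
        "effective_batch_size": (args["per_device_train_batch_size"]
                                 * args["gradient_accumulation_steps"]
                                 * num_gpus),
        # Factor applied to the low-rank update: alpha / r.
        "lora_scaling": args["lora_alpha"] / args["lora_rank"],
    }


# Values copied from the configuration above.
train_args = json.loads("""
{
    "output_dir": "output/firefly-qwen1.5-0.5b-sft-huanhuan-lora",
    "model_name_or_path": "",
    "train_file": "./data/merged_huanhuan.jsonl",
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 1,
    "lora_rank": 64,
    "lora_alpha": 128
}
""")
summary = summarize_train_args(train_args)
print(summary)
```

The summary flags model_name_or_path as still empty, which is a reminder to fill in the location of the Qwen weights, and confirms that lora_alpha = 128 with lora_rank = 64 gives the scaling factor of 2 described in section 1.1.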

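2.3 Training

Fine-tuning here runs on a single A100 40G, with the batch size set to 16 and the gradient accumulation steps set to 1. GPU memory usage fluctuates between 14 and 19 GB, and training takes about 3 minutes.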

CUDA_VISIBLE_DEVICES=0 python train.py --train_args_file train_args/sft/lora/qwen1.5-0.5b-sft-lora.json
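The batch size and accumulation settings determine how many optimizer updates one epoch contains, which in turn tells you how much of the run is spent in the 100 warm-up steps. The following is a rough estimate only: optimizer_steps is a hypothetical helper, the trainer's exact count may differ slightly, and the dataset size of 4000 is a made-up number used purely for illustration.

```python
import math


def optimizer_steps(num_examples, per_device_batch_size, grad_accum_steps,
                    num_epochs=1, num_gpus=1):
    """Estimate how many optimizer updates a training run performs."""
    # Batches per epoch; the last partial batch is kept.
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_gpus))
    # One update every grad_accum_steps batches, at least one per epoch.
    updates_per_epoch = max(1, batches_per_epoch // grad_accum_steps)
    return updates_per_epoch * num_epochs


# Hypothetical dataset size; replace it with the line count of your training file.
total_steps = optimizer_steps(num_examples=4000, per_device_batch_size=16,
                              grad_accum_steps=1)
warmup_fraction = 100 / total_steps  # warmup_steps = 100 in the config
print(total_steps, round(warmup_fraction, 2))
```

If the warm-up fraction turns out to be large for your dataset, lowering warmup_steps or training for more epochs are the obvious knobs to try.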

2.4 Inference

Option 1: custom inference script

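from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from peft import PeftModel

# Base model path: this must be the same base model that was used for training
# (model_name_or_path in the config). The path below is an assumed example.
model_path = './qwen/Qwen1.5-0.5B-Chat/'
# Directory containing the trained LoRA weights (placeholder)
lora_path = 'lora_path'

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Load the base model
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype=torch.bfloat16)

# Load the LoRA weights on top of the base model
model = PeftModel.from_pretrained(model, model_id=lora_path)

prompt = "你是谁？"  # "Who are you?"
messages = [
    # System prompt: "You are now playing Zhen Huan, the woman at the emperor's side"
    {"role": "system", "content": "现在你要扮演皇帝身边的女人--甄嬛"},
    {"role": "user", "content": prompt}
]

# Render the messages with the model's chat template
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Move the inputs to the device the model was placed on
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Passing the whole encoding also supplies the attention mask
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Keep only the newly generated tokens, dropping the prompt
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)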

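Option 2: use the scripts bundled with Firefly

1 Merge the LoRA weights into the pretrained model
2 Run chat

Merging the LoRA weights into the pretrained model:

Run Firefly/script/merge_lora.py to obtain the merged model.

Then run Firefly/script/chat/chat.py to load the merged model for inference.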
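Conceptually, merging folds the scaled low-rank update into the base weight, so that the merged model needs no adapter at inference time. The sketch below shows only that arithmetic on tiny made-up matrices in plain Python; it is not Firefly's merge_lora.py, and the helper names matmul and merge_lora are hypothetical. With the PEFT library, the same operation on a real model is typically done by calling merge_and_unload() on the loaded PeftModel.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w0, lora_b, lora_a, alpha, rank):
    """Return W0 + (alpha / rank) * B @ A as a new matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]


# Made-up 2x2 base weight with a rank-1 adapter (r = 1, alpha = 2).
w0 = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # shape d x r
lora_a = [[0.5, -0.5]]    # shape r x k
x = [[3.0], [1.0]]        # input as a column vector

merged = merge_lora(w0, lora_b, lora_a, alpha=2, rank=1)

# Adapter kept separate: W0 x + (alpha / r) * B (A x)
base_out = matmul(w0, x)
adapter_out = matmul(lora_b, matmul(lora_a, x))
separate = [[b[0] + 2.0 * a[0]] for b, a in zip(base_out, adapter_out)]

print(merged)
print(matmul(merged, x) == separate)
```

Both paths give the same output for x, which is why the merged checkpoint can be loaded by chat.py like an ordinary model.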

By crabboss
