LLaMA-Factory
===================================

.. attention:: 
    To be updated for Qwen3.

Here we provide a script for supervised fine-tuning (SFT) of Qwen2.5 with
`LLaMA-Factory <https://github.com/hiyouga/LLaMA-Factory>`__. The SFT
script has the following features:

-  Supports single-GPU and multi-GPU training;

-  Supports full-parameter tuning, LoRA, Q-LoRA, and DoRA.

In the following, we introduce more details about the usage of the
script.

Installation
------------

Before you start, make sure you have installed the following packages:

1. Follow the instructions of
   `LLaMA-Factory <https://github.com/hiyouga/LLaMA-Factory>`__ to build
   the environment (a minimal sketch is given after this list).
2. Optionally, install these packages:

::

   pip install deepspeed
   pip install flash-attn --no-build-isolation

3. If you want to use
   `FlashAttention-2 <https://github.com/Dao-AILab/flash-attention>`__,
   make sure your CUDA version is 11.6 or above.
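
For reference, building the environment usually amounts to cloning the
repository and installing it in editable mode. The extras below
(``[torch,metrics]``) are an assumption and may change between
LLaMA-Factory versions, so check its README for the current instructions:

.. code:: bash

   # Clone LLaMA-Factory and install it in editable mode
   git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
   cd LLaMA-Factory
   pip install -e ".[torch,metrics]"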

Data Preparation
----------------

LLaMA-Factory provides several training datasets in the ``data`` folder,
which you can use directly. If you are using a custom dataset, please
prepare your dataset as follows.

1. Organize your data in a **json** file and put it in the ``data``
   folder. LLaMA-Factory supports datasets in the ``alpaca`` and
   ``sharegpt`` formats.

-  A dataset in the ``alpaca`` format should follow the format below:

.. code:: json

   [
     {
       "instruction": "user instruction (required)",
       "input": "user input (optional)",
       "output": "model response (required)",
       "system": "system prompt (optional)",
       "history": [
         ["user instruction in the first round (optional)", "model response in the first round (optional)"],
         ["user instruction in the second round (optional)", "model response in the second round (optional)"]
       ]
     }
   ]
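
For illustration only, a single record in the ``alpaca`` format might
look like the following (the content here is made up):

.. code:: json

   [
     {
       "instruction": "Summarize the following paragraph in one sentence.",
       "input": "LLaMA-Factory is a unified framework that supports fine-tuning a wide range of large language models with methods such as LoRA and Q-LoRA.",
       "output": "LLaMA-Factory is a unified framework for fine-tuning many large language models with methods like LoRA and Q-LoRA.",
       "system": "You are a helpful assistant."
     }
   ]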

-  A dataset in the ``sharegpt`` format should follow the format below:

.. code:: json

   [
     {
       "conversations": [
         {
           "from": "human",
           "value": "user instruction"
         },
         {
           "from": "gpt",
           "value": "model response"
         }
       ],
       "system": "system prompt (optional)",
       "tools": "tool description (optional)"
     }
   ]

2. Provide your dataset definition in ``data/dataset_info.json`` in the
   following format.

-  For a dataset in the ``alpaca`` format, the entry in
   ``dataset_info.json`` should be:

.. code:: json

   "dataset_name": {
     "file_name": "dataset_name.json",
     "columns": {
       "prompt": "instruction",
       "query": "input",
       "response": "output",
       "system": "system",
       "history": "history"
     }
   }

-  For a dataset in the ``sharegpt`` format, the entry in
   ``dataset_info.json`` should be:

.. code:: json

   "dataset_name": {
       "file_name": "dataset_name.json",
       "formatting": "sharegpt",
       "columns": {
         "messages": "conversations",
         "system": "system",
         "tools": "tools"
       },
       "tags": {
         "role_tag": "from",
         "content_tag": "value",
         "user_tag": "user",
         "assistant_tag": "assistant"
       }
     }
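
Before launching training, it can be worth confirming that the dataset
file is valid JSON. A quick check (assuming your file is
``data/your_dataset.json``) is:

.. code:: bash

   # Parse the file with Python's built-in JSON tool; fails loudly on invalid JSON
   python -m json.tool data/your_dataset.json > /dev/null && echo "valid JSON"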

Training
--------

Execute the following training command:

.. code:: bash

   DISTRIBUTED_ARGS="
       --nproc_per_node $NPROC_PER_NODE \
       --nnodes $NNODES \
       --node_rank $NODE_RANK \
       --master_addr $MASTER_ADDR \
       --master_port $MASTER_PORT
     "

   torchrun $DISTRIBUTED_ARGS src/train.py \
       --deepspeed $DS_CONFIG_PATH \
       --stage sft \
       --do_train \
       --use_fast_tokenizer \
       --flash_attn \
       --model_name_or_path $MODEL_PATH \
       --dataset your_dataset \
       --template qwen \
       --finetuning_type lora \
       --lora_target q_proj,v_proj \
       --output_dir $OUTPUT_PATH \
       --overwrite_cache \
       --overwrite_output_dir \
       --warmup_steps 100 \
       --weight_decay 0.1 \
       --per_device_train_batch_size 4 \
       --gradient_accumulation_steps 4 \
       --ddp_timeout 9000 \
       --learning_rate 5e-6 \
       --lr_scheduler_type cosine \
       --logging_steps 1 \
       --cutoff_len 4096 \
       --save_steps 1000 \
       --plot_loss \
       --num_train_epochs 3 \
       --bf16 

and enjoy the training process. To adjust your training, you can modify
the arguments in the training command to change the hyperparameters. One
argument to note is ``cutoff_len``, which is the maximum token length of
a training sample; lower it if you run into out-of-memory (OOM) errors.
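
The command above assumes the distributed environment variables have
already been set. As a minimal sketch for a single machine (the values
below are assumptions for an 8-GPU node; set ``NPROC_PER_NODE=1`` for
single-GPU training and adjust the rest to your setup):

.. code:: bash

   # Single-node settings consumed by DISTRIBUTED_ARGS above
   NPROC_PER_NODE=8        # number of GPUs on this node (1 for single-GPU)
   NNODES=1                # total number of nodes
   NODE_RANK=0             # rank of this node
   MASTER_ADDR=127.0.0.1   # rendezvous address
   MASTER_PORT=29500       # any free port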

Merge LoRA
----------

If you train your model with LoRA, you probably need to merge the
adapter parameters into the base model. Run the following command to
merge the LoRA adapters.

.. code:: bash

   CUDA_VISIBLE_DEVICES=0 llamafactory-cli export \
       --model_name_or_path path_to_base_model \
       --adapter_name_or_path path_to_adapter \
       --template qwen \
       --finetuning_type lora \
       --export_dir path_to_export \
       --export_size 2 \
       --export_legacy_format False
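
After exporting, the directory at ``path_to_export`` can be used like a
regular Hugging Face checkpoint. As a quick sanity check, something like
the following should let you chat with the merged model (the
``llamafactory-cli chat`` subcommand and its flags may differ slightly
across versions):

.. code:: bash

   CUDA_VISIBLE_DEVICES=0 llamafactory-cli chat \
       --model_name_or_path path_to_export \
       --template qwen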

Conclusion
----------

The above covers the simplest way to use LLaMA-Factory to train Qwen.
Feel free to dive into the details by checking the official repo!