@@ -75,7 +75,7 @@ E.g. you can set `export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH`.
 Please ensure you have downloaded HF-format model weights of LLaMA models first.
-Then you can follow [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa). This lib provides efficient CUDA kernels and weight convertion script.
+Then you can follow [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa). This lib provides efficient CUDA kernels and weight conversion script.
 After installing this lib, we may convert the original HF-format LLaMA model weights to 4-bit version.
-instruction='Who is the best player in the history of NBA?',
-response=
-'The best player in the history of the NBA is widely considered to be Michael Jordan. He is one of the most successful players in the league, having won 6 NBA championships with the Chicago Bulls and 5 more with the Washington Wizards. He is a 5-time MVP, 1'
-),
-dict(instruction='continue this talk',response=''),
-],[
-dict(instruction='Who is the best player in the history of NBA?',response=''),
-]]
+samples=[
+[
+dict(
+instruction="Who is the best player in the history of NBA?",
+response="The best player in the history of the NBA is widely considered to be Michael Jordan. He is one of the most successful players in the league, having won 6 NBA championships with the Chicago Bulls and 5 more with the Washington Wizards. He is a 5-time MVP, 1",
+),
+dict(instruction="continue this talk",response=""),
+],
+[
+dict(instruction="Who is the best player in the history of NBA?",response=""),