The effect of the applied chat template can be observed by **setting the log level to `INFO`**.
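For instance, the prompt assembled by the chat template can be printed to the logs like this (a minimal sketch; it assumes the `log_level` argument of `pipeline`, so check your LMDeploy version if it differs):

```python
from lmdeploy import pipeline

# `log_level='INFO'` is assumed to be accepted by `pipeline`; the prompt
# assembled by the chat template will then appear in the logs.
# 'internlm/internlm2-chat-7b' is only a placeholder model path.
pipe = pipeline('internlm/internlm2-chat-7b', log_level='INFO')
print(pipe('who are you?'))
```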
LMDeploy supports two methods of adding chat templates:
- One approach is to utilize an existing conversation template by directly configuring a JSON file like the following.
```json
{
    "model_name": "your awesome chat template name",
    "system": "<|im_start|>system\n",
    "meta_instruction": "You are a robot developed by LMDeploy.",
    "eosys": "<|im_end|>\n",
    "user": "<|im_start|>user\n",
    "eoh": "<|im_end|>\n",
    "assistant": "<|im_start|>assistant\n",
    "eoa": "<|im_end|>",
    "separator": "\n",
    "capability": "chat",
    "stop_words": ["<|im_end|>"]
}
```
`model_name` is a required field and can be either the name of an LMDeploy built-in chat template (which can be viewed through `lmdeploy list`), or a new name. Other fields are optional.
1. When `model_name` is the name of a built-in chat template, the non-null fields in the JSON file will override the corresponding attributes of the original chat template.
2. When `model_name` is a new name, a `BaseChatTemplate` configured with these fields will be registered directly as a new chat template, as sketched below. For the specific definition, refer to [BaseChatTemplate](https://github.com/InternLM/lmdeploy/blob/24bd4b9ab6a15b3952e62bcfc72eaba03bce9dcb/lmdeploy/model.py#L113-L188).
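In the pipeline API, the same fields can be supplied through `ChatTemplateConfig` instead of a JSON file. Below is a minimal sketch; it assumes the `ChatTemplateConfig` fields mirror the JSON keys above, and the model path is only a placeholder:

```python
from lmdeploy import ChatTemplateConfig, pipeline

# the fields are assumed to mirror the JSON keys shown above
chat_template_config = ChatTemplateConfig(
    model_name='your awesome chat template name',
    system='<|im_start|>system\n',
    meta_instruction='You are a robot developed by LMDeploy.',
    eosys='<|im_end|>\n',
    user='<|im_start|>user\n',
    eoh='<|im_end|>\n',
    assistant='<|im_start|>assistant\n',
    eoa='<|im_end|>',
    separator='\n',
    stop_words=['<|im_end|>'])

# 'internlm/internlm2-chat-7b' is only a placeholder model path
pipe = pipeline('internlm/internlm2-chat-7b',
                chat_template_config=chat_template_config)
print(pipe('who are you?'))
```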
- Another approach is to customize a Python chat template class, in the same way as the existing LMDeploy chat templates. Once successfully registered, it can be used directly. This approach offers a high degree of customization and strong control. Below is an example of registering an LMDeploy chat template.
```python
from lmdeploy import ChatTemplateConfig, pipeline
from lmdeploy.model import MODELS, BaseChatTemplate


@MODELS.register_module(name='customized_model')
class CustomizedModel(BaseChatTemplate):
    """A customized chat template."""

    def __init__(self,
                 system='<|im_start|>system\n',
                 meta_instruction='You are a robot developed by LMDeploy.',
                 user='<|im_start|>user\n',
                 assistant='<|im_start|>assistant\n',
                 eosys='<|im_end|>\n',
                 eoh='<|im_end|>\n',
                 eoa='<|im_end|>',
                 separator='\n',
                 stop_words=['<|im_end|>', '<|action_end|>']):
        super().__init__(system=system,
                         meta_instruction=meta_instruction,
                         eosys=eosys,
                         user=user,
                         eoh=eoh,
                         assistant=assistant,
                         eoa=eoa,
                         separator=separator,
                         stop_words=stop_words)


messages = [{'role': 'user', 'content': 'who are you?'}]

# 'internlm/internlm2-chat-7b' is an example model path; replace it with your own
pipe = pipeline('internlm/internlm2-chat-7b',
                chat_template_config=ChatTemplateConfig('customized_model'))

for response in pipe.stream_infer(messages):
    print(response.text, end='')
```
In this example, we register an LMDeploy chat template whose `meta_instruction` states that the model was developed by LMDeploy, so when the user asks who the model is, it will answer that it was created by LMDeploy.
Based on the information above, we can see that the version of `glibc` is `2.17-325.el7_9.x86_64` and the version of `python3` is `3.6.8-21.el7_9.x86_64`.
### Download and install the `debuginfo` libraries
Download `glibc-debuginfo-common-2.17-325.el7.x86_64.rpm`, `glibc-debuginfo-2.17-325.el7.x86_64.rpm`, and `python3-debuginfo-3.6.8-21.el7.x86_64.rpm` from http://debuginfo.centos.org/7/x86_64.
Long text extrapolation refers to the ability of LLMs to handle, at inference time, data longer than the texts they were trained on. The TurboMind engine now supports [LlamaDynamicNTKScalingRotaryEmbedding](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L178), and the implementation is consistent with HuggingFace's.
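To give a sense of what dynamic NTK scaling does, the sketch below reproduces the base-frequency adjustment used in HuggingFace's `LlamaDynamicNTKScalingRotaryEmbedding`: once the sequence grows past the trained context window, the RoPE base is enlarged so the rotary frequencies stretch to cover the longer range. The function name and example numbers here are illustrative, not part of the LMDeploy API:

```python
def dynamic_ntk_base(base: float, dim: int, seq_len: int,
                     max_position_embeddings: int,
                     scaling_factor: float) -> float:
    """Adjusted RoPE base, following HuggingFace's dynamic NTK scaling.

    Within the trained window the base is unchanged; beyond it, the base
    grows so the rotary frequencies cover the longer sequence.
    """
    if seq_len <= max_position_embeddings:
        return base
    ratio = (scaling_factor * seq_len / max_position_embeddings
             - (scaling_factor - 1))
    return base * ratio ** (dim / (dim - 2))


# e.g. Llama-style settings: base 10000, head dim 128, 4096 trained positions
print(dynamic_ntk_base(10000.0, 128, 16384, 4096, 2.0))
```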
## Usage
You can enable the context length extrapolation ability by modifying the `TurbomindEngineConfig`: set `session_len` to the expected length and `rope_scaling_factor` to a number no less than 1.0.
```python
from lmdeploy import GenerationConfig, TurbomindEngineConfig, pipeline

# enlarge `session_len` and set `rope_scaling_factor` to enable extrapolation;
# the model path is an example, replace it with your own
backend_config = TurbomindEngineConfig(rope_scaling_factor=2.0,
                                       session_len=160000)
pipe = pipeline('internlm/internlm2-chat-7b', backend_config=backend_config)
prompt = 'Use a long prompt to replace this sentence'
gen_config = GenerationConfig(top_p=0.8,
                              top_k=40,
                              temperature=0.8,
                              max_new_tokens=1024)
response = pipe(prompt, gen_config=gen_config)
print(response)
```
## Evaluation
We use several methods to evaluate the long-context inference ability of LMDeploy, including [passkey retrieval](#passkey-retrieval), [needle in a haystack](#needle-in-a-haystack) and computing [perplexity](#perplexity).
### Passkey Retrieval
You can try the following code to test how many times LMDeploy can retrieve the special key.
```python
import numpy as np
from lmdeploy import TurbomindEngineConfig, pipeline

session_len = 160000
backend_config = TurbomindEngineConfig(rope_scaling_factor=2.0,
                                       session_len=session_len)
pipe = pipeline('internlm/internlm2-chat-7b', backend_config=backend_config)


def passkey_retrival(session_len, n_round=5):
    # create a long context that hides the pass key in irrelevant text
    tok = pipe.tokenizer
    task_description = 'There is an important info hidden inside a lot of irrelevant text. Find it and memorize them. I will quiz you about the important information there.'  # noqa: E501
    garbage = 'The grass is green. The sky is blue. The sun is yellow. Here we go. There and back again.'  # noqa: E501
    for _ in range(n_round):
        # insert the pass key at a random position inside the garbage text
        n_times = (session_len - 1000) // len(tok.encode(garbage))
        n_garbage_prefix = np.random.randint(0, n_times)
        garbage_prefix = ' '.join([garbage] * n_garbage_prefix)
        garbage_suffix = ' '.join([garbage] * (n_times - n_garbage_prefix))
        pass_key = np.random.randint(1, 50000)
        information_line = f'The pass key is {pass_key}. Remember it. {pass_key} is the pass key.'  # noqa: E501
        final_question = 'What is the pass key? The pass key is'
        lines = [
            task_description,
            garbage_prefix,
            information_line,
            garbage_suffix,
            final_question,
        ]
        # inference
        prompt = ' '.join(lines)
        response = pipe([prompt])
        print(pass_key, response)


passkey_retrival(session_len, 5)
```
### Needle In A Haystack
[OpenCompass](https://github.com/open-compass/opencompass) offers very useful tools to perform needle-in-a-haystack evaluation. For specific instructions, please refer to the [guide](https://github.com/open-compass/opencompass/blob/main/docs/en/advanced_guides/needleinahaystack_eval.md).
### Perplexity
The following code demonstrates how to use LMDeploy to calculate perplexity.
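A minimal sketch is shown below. It assumes the `get_ppl` method of the pipeline object; the model path and engine settings are placeholders, so adapt them to your setup:

```python
from transformers import AutoTokenizer

from lmdeploy import TurbomindEngineConfig, pipeline

# placeholder model path and engine settings; adapt them to your setup
model_path = 'internlm/internlm2-chat-7b'
backend_config = TurbomindEngineConfig(rope_scaling_factor=2.0,
                                       session_len=160000)
pipe = pipeline(model_path, backend_config=backend_config)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# tokenize a long text and ask the engine for its perplexity
text = 'Use a long prompt to replace this sentence'
input_ids = tokenizer.encode(text)
ppl = pipe.get_ppl(input_ids)[0]
print(ppl)
```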