Commit fe851fbc authored by zhouxiang

Add the new files for the 0.2.6 release

parent e2d98ddc
# autotest case
We provide an autotest case set for regression testing.
## Preparation before testing
To speed up test execution, the Hugging Face model files are downloaded to a fixed location in advance so that the test cases can use them directly. This location is defined by the `model_path` parameter in the `autotest/config.yaml` file.
Because the test cases convert the HF models with `lmdeploy convert`, the storage path for converted models is defined by the `dst_path` parameter in `autotest/config.yaml`.
`autotest/config.yaml` also defines the supported models and their categories (the `turbomind_model` and `pytorch_model` lists), as well as the log path `log_path` used during test case execution.
If you want to create a test environment, prepare the items above and modify `config.yaml` as needed; a minimal sketch of the file is shown below.
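The snippet below is an illustrative sketch of `autotest/config.yaml` only: the paths are placeholders for your own environment and the model lists are abbreviated (the complete lists are in the `config.yaml` included in this commit).
```yaml
model_path: /path/to/hf_models        # where the pre-downloaded HF models live
dst_path: /path/to/converted_models   # output directory for `lmdeploy convert`
log_path: /path/to/test_logs          # logs written while the test cases run
tp_config:                            # models that need more than one GPU (tp > 1)
  internlm2-chat-20b: 2
turbomind_model:                      # models covered by the turbomind backend
  - internlm/internlm2-chat-20b
pytorch_model:                        # models covered by the pytorch backend
  - internlm/internlm2-chat-20b
```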
## How to run testcases
Install required dependencies using the following command line:
```bash
python3 -m pip install -r requirements/test.txt
```
Run pytest with case filtering via the `-m` flag or by folder, e.g. `-m convert` to select the cases related to convert, or `autotest/tools/convert` to run the cases in that folder. The corresponding results will be stored in the `allure-results` directory.
```bash
pytest autotest -m convert --clean-alluredir --alluredir=allure-results
pytest autotest/tools/convert --clean-alluredir --alluredir=allure-results
```
If you need to generate and browse reports, install allure according to the [allure installation documentation](https://allurereport.org/docs/gettingstarted-installation/#install-via-the-system-package-manager-for-linux). You can also install it directly using the following commands:
```bash
wget https://github.com/allure-framework/allure2/releases/download/2.24.1/allure_2.24.1-1_all.deb
sudo apt-get install -y openjdk-8-jre-headless
sudo dpkg -i ./allure_2.24.1-1_all.deb
```
Then generate the test report and open the corresponding HTML page using the following commands. The generated report will be stored in `allure-reports`.
```bash
allure generate -c -o allure-reports
allure open ./allure-reports
```
## Test case functionality coverage
The test cases cover the following modules:
- tools module: basic cases that correspond to the tutorials
- interface module: interface function cases for the pipeline, restful api and triton server api
The relationship between functionalities and test cases is as follows:
| case module | Function | Test Case File |
| :--------: | :------------------------------: | :--------------------------------------------------: |
| tools | quantization - w4a16 | tools/quantization/test_quantization_w4a16.py |
| tools | quantization - w8a8 | tools/quantization/test_quantization_w8a8.py |
| tools | quantization - kv int8 | tools/quantization/test_quantization_kvint8.py |
| tools | quantization - kv int8 and w4a16 | tools/quantization/test_quantization_kvint8_w4a16.py |
| tools | convert | tools/convert/test_convert.py |
| tools | pipeline chat - turbomind | tools/pipeline/test_pipeline_chat_turbomind.py |
| tools | pipeline chat - pytorch | tools/pipeline/test_pipeline_chat_pytorch.py |
| tools | restful_api chat - turbomind | tools/pipeline/test_restful_chat_turbomind.py |
| tools | restful_api chat - pytorch | tools/pipeline/test_restful_chat_pytorch.py |
| tools | command chat - workspace | tools/chat/test_command_chat_workspace.py |
| tools | command chat - hf turbomind | tools/chat/test_command_chat_hf_turbomind.py |
| tools | command chat - hf pytorch | tools/chat/test_command_chat_hf_pytorch.py |
| interface | command chat - hf pytorch | tools/chat/test_command_chat_hf_pytorch.py |
The modules and models currently covered by the turbomind and pytorch backends are listed in `autotest/config.yaml` under the `turbomind_model` and `pytorch_model` keys.
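The helpers imported by the test files (for example `get_turbomind_model_list` and `get_torch_model_list` from `utils.config_utils`) are not included in the files shown here, so the following is only an assumed sketch of how the `turbomind_model` list and `tp_config` could be combined to select the models for a given tensor-parallel size; the real helper may differ.
```python
# Assumed sketch: selecting models from autotest/config.yaml for a given tp size.
# The actual implementation lives in utils/config_utils.py and is not shown above.
import yaml


def get_turbomind_model_list(tp_num=1, config_path='autotest/config.yaml'):
    with open(config_path) as f:
        config = yaml.safe_load(f)
    tp_config = config.get('tp_config', {})
    # Models without a tp_config entry are assumed to run on a single GPU.
    return [
        model for model in config.get('turbomind_model', [])
        if tp_config.get(model.split('/')[-1], 1) == tp_num
    ]


if __name__ == '__main__':
    # e.g. the models exercised by the gpu_num_2 test variants
    print(get_turbomind_model_list(tp_num=2))
```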
## How to add a testcase
If you want to add a new model to the tools test cases, prepare the model on your machine as described in [Preparation before testing](#preparation-before-testing), then add it to `autotest/config.yaml`, as sketched below.
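The entry below is purely illustrative: `my-org/my-chat-7b` is a hypothetical model id, its files are expected under `<model_path>/my-org/my-chat-7b`, and the `tp_config` entry is only needed if the model requires more than one GPU.
```yaml
turbomind_model:
  - internlm/internlm2-chat-20b   # existing entry
  - my-org/my-chat-7b             # hypothetical new model
tp_config:
  internlm2-chat-20b: 2
  my-chat-7b: 2                   # add only if the new model needs tp=2
```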
## How to add a chat case template
We provide some basic dialogue cases in YAML files.
`chat_prompt_case.yaml` is used for CLI command testing, while `prompt_case.yaml` is used for pipeline chat, restful api and gradio testing.
If you want to add a dialogue case, modify the corresponding YAML file.
The structure and logic of the YAML files are as follows:
```yaml
# case name: give the case a descriptive, function-oriented name. For example, this
# case tests whether the model remembers information from previous rounds in a
# multi-round dialogue.
memory_test:
  - please introduce some attractions in Chengdu: # Round 1 prompt
      # output assert rule list; all rules must be satisfied for the case to pass
      - contain: # the output needs to contain at least one of the following items
          - chengdu
      - contain:
          - 熊猫
          - panda
      # For a specific model that requires different assert logic, the key is the
      # model type and the value is a list of assert rules. This is an example for
      # the llama2 model; when it matches, the other assert rules are ignored.
      - llama2:
          - len_g:
              10
  - please introduce some delicious foods: # Round 2 prompt
      # output assert rule list
      - contain:
          - chengdu
      - len_g: # the output's length should be larger than 10
          10
  - XXX: # Round 3 prompt
```
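The assertion helpers that consume these rules live under `autotest/utils` and are not included in the files shown here. Assuming the semantics described in the comments above (`contain` passes when any listed item appears, `not_contain` when none appears, `len_g` when the response is longer than the given number, and a matching model-specific key replaces the general rules), a minimal evaluator could look like the sketch below; the real helpers may differ.
```python
# Illustrative sketch only: evaluating the YAML assert rules described above.
# The project's real assertion helpers (under autotest/utils) are not shown in
# this commit and may support more rules or different matching logic.
def _rule_passes(name, value, response):
    text = response.lower()
    if name == 'contain':      # pass if ANY listed item appears in the response
        return any(str(item).lower() in text for item in value)
    if name == 'not_contain':  # pass if NONE of the listed items appear
        return all(str(item).lower() not in text for item in value)
    if name == 'len_g':        # pass if the response is longer than the threshold
        return len(response) > int(value)
    return True                # other keys (e.g. model names) are ignored here


def check_response(rules, response, model_name):
    """Check one prompt's rule list against a model's response."""
    rules = rules or []
    # A model-specific key overrides the general rules for that model.
    for rule in rules:
        name, value = next(iter(rule.items()))
        if name.lower() in model_name.lower():
            rules = value
            break
    return all(
        _rule_passes(name, value, response)
        for rule in rules for name, value in rule.items())
```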
common_case:
  - 你好,你叫什么名字#hi, what's your name:
  - 介绍成都的景点#please introduce attractions in Chengdu:
      - contain:
          - chengdu
          - 成都
      - codellama:
          - contain:
              - chengdu
              - 成都
              - llama
      - internlm2-1_8b:
          - contain:
              - chengdu
              - 成都
              - 你好
      - end:
  - 介绍相应美食#please introduce some delicious foods:
      - not_contain:
          - 成都
          - chengdu
memory_test:
  - 介绍成都的景点#please introduce attractions in Chengdu:
      - contain:
          - chengdu
          - 成都
      - contain:
          - 熊猫
          - panda
          - 宽窄巷子
          - jinli
          - leshan
          - 历史悠久
      - falcon:
          - contain:
              - chengdu
              - 成都
      - internlm2-1_8b:
          - contain:
              - chengdu
              - 成都
      - internlm2-20b:
          - contain:
              - chengdu
              - 成都
  - 介绍相应美食#please introduce some delicious foods:
      - contain:
          - 成都
          - chengdu
      - contain:
          - 火锅
          - hotpot
          - hot pot
          - 四川
      - falcon:
          - len_g:
              10
      - internlm2-1_8b:
          - contain:
              - chengdu
              - 成都
      - internlm2-20b:
          - contain:
              - chengdu
              - 成都
model_path: /mnt/bigdisk/qa_test_models
dst_path: /nvme/qa_test_models/autotest_model
log_path: /nvme/qa_test_models/autotest_model/log
dataset_path: /nvme/qa_test_models/...dataset
tp_config:
  internlm-chat-20b: 2
  internlm2-chat-20b: 2
  Baichuan2-13B-Chat: 2
  Mixtral-8x7B-Instruct-v0.1: 2
  internlm2-20b: 2
turbomind_model:
- meta-llama/Llama-2-7b-chat
- internlm/internlm2-chat-1_8b
- internlm/internlm-chat-7b
- internlm/internlm-chat-20b
- internlm/internlm2-chat-7b
- internlm/internlm2-chat-20b
- internlm/internlm2-chat-7b-4bits
- internlm/internlm2-chat-20b-4bits
- Qwen/Qwen-7B-Chat
- Qwen/Qwen-14B-Chat
- lmdeploy/llama2-chat-7b-w4
- baichuan-inc/Baichuan2-7B-Chat
- 01-ai/Yi-6B-Chat
- internlm/internlm2-1_8b
- internlm/internlm2-20b
- codellama/CodeLlama-7b-Instruct-hf
pytorch_model:
- meta-llama/Llama-2-7b-chat
- internlm/internlm-chat-7b
- internlm/internlm-chat-20b
- internlm/internlm2-chat-7b
- internlm/internlm2-chat-20b
- baichuan-inc/Baichuan2-7B-Chat
- baichuan-inc/Baichuan2-13B-Chat
- THUDM/chatglm2-6b
- tiiuae/falcon-7b
- 01-ai/Yi-6B-Chat
- internlm/internlm2-1_8b
- internlm/internlm2-20b
- Qwen/Qwen1.5-7B-Chat
- mistralai/Mistral-7B-Instruct-v0.1
- mistralai/Mixtral-8x7B-Instruct-v0.1
- google/gemma-7b-it
- deepseek-ai/deepseek-moe-16b-chat
quatization_case_config:
  w4a16:
    - meta-llama/Llama-2-7b-chat
    - internlm/internlm-chat-20b
    - Qwen/Qwen-7B-Chat
    - Qwen/Qwen-14B-Chat
    - internlm/internlm2-chat-20b
    - baichuan-inc/Baichuan2-7B-Chat
    - internlm/internlm2-20b
  kvint8: # more models support kvint8 quantization, but their chat responses were poor, so they were removed
    - meta-llama/Llama-2-7b-chat
    - internlm/internlm-chat-20b
    - internlm/internlm2-chat-20b
  kvint8_w4a16:
    - meta-llama/Llama-2-7b-chat
    - internlm/internlm-chat-20b
    - internlm/internlm2-chat-20b
    - internlm/internlm2-20b
    - Qwen/Qwen-7B-Chat
    - Qwen/Qwen-14B-Chat
    - baichuan-inc/Baichuan2-7B-Chat
  w8a8:
    - meta-llama/Llama-2-7b-chat
    - internlm/internlm-chat-20b
    - internlm/internlm2-chat-20b
    - internlm/internlm2-chat-7b
    - 01-ai/Yi-6B-Chat
    - internlm/internlm2-20b
import os
import pytest
import yaml
cli_prompt_case_file = 'autotest/chat_prompt_case.yaml'
common_prompt_case_file = 'autotest/prompt_case.yaml'
config_file = 'autotest/config.yaml'
@pytest.fixture(scope='session')
def config():
config_path = os.path.join(config_file)
with open(config_path) as f:
env_config = yaml.load(f.read(), Loader=yaml.SafeLoader)
return env_config
@pytest.fixture(scope='session')
def cli_case_config():
case_path = os.path.join(cli_prompt_case_file)
with open(case_path) as f:
case_config = yaml.load(f.read(), Loader=yaml.SafeLoader)
return case_config
@pytest.fixture(scope='class', autouse=True)
def common_case_config():
case_path = os.path.join(common_prompt_case_file)
with open(case_path) as f:
case_config = yaml.load(f.read(), Loader=yaml.SafeLoader)
return case_config
def _init_cli_case_list():
case_path = os.path.join(cli_prompt_case_file)
with open(case_path) as f:
case_config = yaml.load(f.read(), Loader=yaml.SafeLoader)
global global_cli_case_List
global_cli_case_List = list(case_config.keys())
def _init_common_case_list():
case_path = os.path.join(common_prompt_case_file)
with open(case_path) as f:
case_config = yaml.load(f.read(), Loader=yaml.SafeLoader)
global global_common_case_List
global_common_case_List = list(case_config.keys())
import pytest
from pytest import assume
from lmdeploy import GenerationConfig, TurbomindEngineConfig, pipeline
@pytest.mark.order(8)
@pytest.mark.pipeline_turbomind_func
@pytest.mark.timeout(240)
@pytest.mark.flaky(reruns=0)
class TestPipelineTurbomindFuncRegression:
@pytest.mark.parametrize('model', ['internlm/internlm2-chat-20b'])
def test_backend_config_tp(self, config, model):
with pytest.raises(AssertionError, match='tp should be 2\\^n'):
model_path = '/'.join([config.get('model_path'), model])
backend_config = TurbomindEngineConfig(tp=100)
pipe = pipeline(model_path, backend_config=backend_config)
del pipe
@pytest.mark.parametrize('model', ['internlm/internlm2-chat-20b'])
def test_backend_config_session_len(self, config, model):
model_path = '/'.join([config.get('model_path'), model])
backend_config = TurbomindEngineConfig(session_len=10)
pipe = pipeline(model_path, backend_config=backend_config)
response = pipe(['Hi, pls intro yourself', 'Shanghai is'])
del pipe
for i in range(2):
assert response[i].finish_reason == 'length', str(response[i])
assert response[i].generate_token_len == 0, str(response[i])
@pytest.mark.parametrize('model', ['internlm/internlm2-chat-20b'])
def test_gen_config_test(self, config, model):
model_path = '/'.join([config.get('model_path'), model])
pipe = pipeline(model_path)
# test min_new_tokens
gen_config = GenerationConfig(min_new_tokens=200, ignore_eos=True)
response = pipe(['Hi, pls intro yourself', 'Shanghai is'],
gen_config=gen_config)
for i in range(2):
with assume:
assert response[i].finish_reason == 'length', str(response[i])
with assume:
assert response[i].session_id == i
# test stop_words
gen_config = GenerationConfig(stop_words=[' and', '浦', ' to'],
random_seed=1,
temperature=0.01)
response = pipe(['Hi, pls intro yourself', 'Shanghai is'],
gen_config=gen_config)
with assume:
assert '浦' not in response[0].text and response[
0].finish_reason == 'stop' and response[
0].generate_token_len < 20, str(response[0])
with assume:
assert ' and' not in response[1].text and ' to ' not in response[
1].text and response[1].finish_reason == 'stop' and response[
1].generate_token_len < 20, str(response[1])
# test bad_words
gen_config = GenerationConfig(bad_words=[' and', '浦', ' to'],
temperature=0.01,
random_seed=1)
response = pipe(['Hi, pls intro yourself', 'Shanghai is'],
gen_config=gen_config)
with assume:
assert '浦' not in response[0].text and '蒲' in response[
0].text, str(response[0])
with assume:
assert ' and' not in response[1].text and ' to ' not in response[
1].text, str(response[1])
# test special_words
gen_config = GenerationConfig(skip_special_tokens=False)
response = pipe(['Hi, pls intro yourself', 'Shanghai is'],
gen_config=gen_config)
for i in range(2):
with assume:
assert response[i].finish_reason == 'length' or response[
i].finish_reason == 'stop', str(response[i])
# test max_new_tokens
gen_config = GenerationConfig(max_new_tokens=5)
response = pipe(['Hi, pls intro yourself', 'Shanghai is'],
gen_config=gen_config)
for i in range(2):
with assume:
assert response[i].finish_reason == 'length', str(response[i])
with assume:
assert response[i].generate_token_len == 6, str(response[i])
# test max_new_tokens with ignore_eos
gen_config = GenerationConfig(ignore_eos=True, max_new_tokens=1024)
response = pipe(['Hi, pls intro yourself', 'Shanghai is'],
gen_config=gen_config)
for i in range(2):
with assume:
assert response[i].finish_reason == 'length', str(response[i])
with assume:
assert response[i].generate_token_len == 1025, str(response[i])
# test repetition_penalty
gen_config = GenerationConfig(repetition_penalty=0.1, random_seed=1)
response = pipe('Shanghai is', gen_config=gen_config)
with assume:
assert response.finish_reason == 'length', str(response)
with assume:
assert 'a 上海 is a 上海, ' * 10 in response.text, str(response)
del pipe
@pytest.mark.parametrize('model', ['internlm/internlm2-chat-20b'])
def future_test_backend_config_cache_max_entry_count(self, config, model):
model_path = '/'.join([config.get('model_path'), model])
backend_config = TurbomindEngineConfig(cache_max_entry_count=-1)
pipe = pipeline(model_path, backend_config=backend_config)
response = pipe(['Hi, pls intro yourself', 'Shanghai is'])
del pipe
for i in range(2):
with assume:
assert response[i].finish_reason == 'length', str(response[i])
@pytest.mark.parametrize('model', ['internlm/internlm2-chat-20b'])
def test_backend_config_max_batch_size2(self, config, model):
model_path = '/'.join([config.get('model_path'), model])
backend_config = TurbomindEngineConfig(max_batch_size=-1)
pipe = pipeline(model_path, backend_config=backend_config)
response = pipe(['Hi, pls intro yourself', 'Shanghai is'])
del pipe
for i in range(2):
with assume:
assert response[i].finish_reason is None, str(response[i])
with assume:
assert response[i].input_token_len == 0, str(response[i])
with assume:
assert response[i].generate_token_len == 0, str(response[i])
with assume:
assert response[i].text == '', str(response[i])
@pytest.mark.parametrize('model', ['internlm/internlm2-chat-20b'])
def test_pipeline_batch_infer(self, config, model):
model_path = '/'.join([config.get('model_path'), model])
pipe = pipeline(model_path)
response = pipe.batch_infer(['Hi, pls intro yourself'] * 10)
del pipe
assert len(response) == 10
for i in range(10):
with assume:
assert response[i].finish_reason is not None, str(response[i])
with assume:
assert response[i].input_token_len > 0, str(response[i])
with assume:
assert response[i].generate_token_len > 0, str(response[i])
with assume:
assert len(response[i].text) > 0, str(response[i])
with assume:
assert response[i].session_id == i
@pytest.mark.parametrize('model', ['internlm/internlm2-chat-20b'])
def test_pipeline_stream_infer(self, config, model):
model_path = '/'.join([config.get('model_path'), model])
pipe = pipeline(model_path)
for outputs in pipe.stream_infer(['Hi, pls intro yourself'] * 3):
with assume:
assert outputs.generate_token_len > 0, str(outputs)
with assume:
assert outputs.input_token_len > 50, str(outputs)
with assume:
assert outputs.session_id in (0, 1, 2), str(outputs)
with assume:
assert outputs.finish_reason in (None, 'stop',
'length'), str(outputs)
continue
with assume:
assert outputs.generate_token_len > 0, str(outputs)
with assume:
assert outputs.finish_reason in ('stop', 'length'), str(outputs)
i = 0
outputs_list = []
for outputs in pipe.stream_infer('Hi, pls intro yourself'):
i += 1
if outputs.finish_reason is None:
with assume:
assert outputs.generate_token_len == i, str(outputs)
else:
with assume:
assert outputs.generate_token_len == i - 1, str(outputs)
with assume:
assert outputs.input_token_len > 50, str(outputs)
with assume:
assert outputs.session_id == 0, str(outputs)
with assume:
assert outputs.finish_reason in (None, 'stop',
'length'), str(outputs)
outputs_list.append(outputs)
continue
for output in outputs_list[0:-1]:
with assume:
assert output.finish_reason is None, str(output)
with assume:
assert outputs_list[-1].finish_reason is not None, str(output)
@pytest.mark.parametrize('model', ['internlm/internlm2-chat-20b'])
def test_pipeline_stream_infer2(self, config, model):
model_path = '/'.join([config.get('model_path'), model])
pipe = pipeline(model_path)
prompts = [{
'role': 'user',
'content': '介绍成都的景点'
}, {
'role': 'user',
'content': '美食呢?'
}]
for outputs in pipe.stream_infer([prompts]):
with assume:
assert outputs.generate_token_len > 0, str(outputs)
with assume:
assert outputs.input_token_len > 50, str(outputs)
with assume:
assert outputs.session_id in (0, 1, 2), str(outputs)
with assume:
assert outputs.finish_reason in (None, 'stop',
'length'), str(outputs)
continue
with assume:
assert outputs.generate_token_len > 0, str(outputs)
with assume:
assert outputs.finish_reason in ('stop', 'length'), str(outputs)
i = 0
outputs_list = []
final_response = ''
for outputs in pipe.stream_infer([prompts]):
i += 1
final_response += outputs.text
if outputs.finish_reason is None:
with assume:
assert outputs.generate_token_len == i, str(outputs)
else:
with assume:
assert outputs.generate_token_len == i - 1, str(outputs)
with assume:
assert outputs.input_token_len > 50, str(outputs)
with assume:
assert outputs.session_id == 0, str(outputs)
with assume:
assert outputs.finish_reason in (None, 'stop',
'length'), str(outputs)
outputs_list.append(outputs)
continue
print(final_response)
for output in outputs_list[0:-1]:
with assume:
assert output.finish_reason is None, str(output)
with assume:
assert outputs_list[-1].finish_reason is not None, str(output)
with assume:
assert '成都' in final_response.lower(), str(output)
del pipe
import pytest
from utils.get_run_config import get_tp_num
from lmdeploy import TurbomindEngineConfig, pipeline
@pytest.mark.order(8)
@pytest.mark.pipeline_func
@pytest.mark.timeout(600)
class TestPipelineLongtextFunc:
def test_long_test_chat_7b(self, config):
model = 'internlm/internlm2-chat-7b'
tp_config = get_tp_num(config, model)
model_path = '/'.join([config.get('model_path'), model])
backend_config = TurbomindEngineConfig(rope_scaling_factor=2.0,
session_len=210000,
tp=tp_config)
pipe = pipeline(model_path, backend_config=backend_config)
prompt = '今 天 心 ' * int(200000 / 6)
# batch infer
pipe(prompt)
# stream infer
for outputs in pipe.stream_infer(prompt):
continue
prompts = ['今 天 心 ' * int(200000 / 6)] * 2
# batch infer
pipe(prompts)
# stream infer
for outputs in pipe.stream_infer(prompts):
continue
def test_long_test_chat_20b(self, config):
model = 'internlm/internlm2-chat-20b'
tp_config = get_tp_num(config, model)
model_path = '/'.join([config.get('model_path'), model])
backend_config = TurbomindEngineConfig(rope_scaling_factor=2.0,
session_len=210000,
tp=tp_config)
pipe = pipeline(model_path, backend_config=backend_config)
prompt = '今 天 心 ' * int(200000 / 6)
# batch infer
pipe(prompt)
# stream infer
for outputs in pipe.stream_infer(prompt):
continue
prompts = ['今 天 心 ' * int(200000 / 6)] * 2
# batch infer
pipe(prompts)
# stream infer
for outputs in pipe.stream_infer(prompts):
continue
def test_long_test_20b(self, config):
model = 'internlm/internlm2-20b'
tp_config = get_tp_num(config, model)
model_path = '/'.join([config.get('model_path'), model])
backend_config = TurbomindEngineConfig(rope_scaling_factor=2.0,
session_len=210000,
tp=tp_config)
pipe = pipeline(model_path, backend_config=backend_config)
prompt = '今 天 心 ' * int(200000 / 6)
# batch infer
pipe(prompt)
# stream infer
for outputs in pipe.stream_infer(prompt):
continue
prompts = ['今 天 心 ' * int(200000 / 6)] * 2
# batch infer
pipe(prompts)
# stream infer
for outputs in pipe.stream_infer(prompts):
continue
import pytest
from utils.restful_return_check import (assert_chat_completions_batch_return,
assert_chat_completions_stream_return,
assert_chat_interactive_batch_return,
assert_chat_interactive_stream_return)
from lmdeploy.serve.openai.api_client import APIClient
BASE_HTTP_URL = 'http://localhost'
DEFAULT_PORT = 23333
MODEL = 'internlm/internlm2-chat-20b'
MODEL_NAME = 'internlm2-chat-20b'
BASE_URL = ':'.join([BASE_HTTP_URL, str(DEFAULT_PORT)])
@pytest.mark.order(8)
@pytest.mark.pytorch
@pytest.mark.flaky(reruns=2)
class TestRestfulInterfaceChatCompletions:
def test_chat_completions_ignore_eos_batch(self):
api_client = APIClient(BASE_URL)
for output in api_client.chat_completions_v1(
model=MODEL_NAME,
messages='Hi, what is your name?',
ignore_eos=True,
max_tokens=100,
temperature=0.01):
continue
assert_chat_completions_batch_return(output, MODEL_NAME)
assert output.get('usage').get(
'completion_tokens') == 101 or output.get('usage').get(
'completion_tokens') == 100
assert output.get('choices')[0].get('finish_reason') == 'length'
def test_chat_completions_ignore_eos_stream(self):
api_client = APIClient(BASE_URL)
outputList = []
for output in api_client.chat_completions_v1(
model=MODEL_NAME,
messages='Hi, what is your name?',
ignore_eos=True,
stream=True,
max_tokens=100,
temperature=0.01):
outputList.append(output)
assert_chat_completions_stream_return(outputList[0], MODEL_NAME, True,
False)
assert_chat_completions_stream_return(outputList[-1], MODEL_NAME,
False, True)
for index in range(1, len(outputList) - 1):
assert_chat_completions_stream_return(outputList[index],
MODEL_NAME)
assert outputList[-1].get('choices')[0].get(
'finish_reason') == 'length'
assert len(outputList) == 102
def test_chat_completions_max_tokens_batch(self):
api_client = APIClient(BASE_URL)
for output in api_client.chat_completions_v1(
model=MODEL_NAME,
messages='Hi, pls intro yourself',
max_tokens=5,
temperature=0.01):
continue
assert_chat_completions_batch_return(output, MODEL_NAME)
assert output.get('choices')[0].get('finish_reason') == 'length'
assert output.get('usage').get('completion_tokens') == 6 or output.get(
'usage').get('completion_tokens') == 5
def test_chat_completions_max_tokens_stream(self):
api_client = APIClient(BASE_URL)
outputList = []
for output in api_client.chat_completions_v1(
model=MODEL_NAME,
messages='Hi, pls intro yourself',
stream=True,
max_tokens=5,
temperature=0.01):
outputList.append(output)
assert_chat_completions_stream_return(outputList[0], MODEL_NAME, True,
False)
assert_chat_completions_stream_return(outputList[-1], MODEL_NAME,
False, True)
for index in range(1, len(outputList) - 1):
assert_chat_completions_stream_return(outputList[index],
MODEL_NAME)
assert outputList[-1].get('choices')[0].get(
'finish_reason') == 'length'
assert len(outputList) == 7
def test_chat_completions_repetition_penalty_stream(self):
api_client = APIClient(BASE_URL)
outputList = []
response = ''
for output in api_client.chat_completions_v1(
model=MODEL_NAME,
messages='Hi, pls intro yourself',
stream=True,
repetition_penalty=0.1,
temperature=0.01,
max_tokens=200):
outputList.append(output)
assert_chat_completions_stream_return(outputList[0], MODEL_NAME, True,
False)
assert_chat_completions_stream_return(outputList[-1], MODEL_NAME,
False, True)
for index in range(1, len(outputList) - 1):
assert_chat_completions_stream_return(outputList[index],
MODEL_NAME)
response += outputList[index].get('choices')[0].get('delta').get(
'content')
assert 'pls pls ' * 5 in response or \
'Hi, pls intro yourself\n' * 5 in response
def test_chat_completions_topp_min_batch(self):
api_client = APIClient(BASE_URL)
outputList = []
for i in range(3):
for output in api_client.chat_completions_v1(
model=MODEL_NAME,
messages='Shanghai is',
top_p=0.1,
temperature=0.01):
outputList.append(output)
assert_chat_completions_batch_return(output, MODEL_NAME)
print(output)
assert outputList[0].get('choices')[0].get('message').get(
'content') == outputList[1].get('choices')[0].get('message').get(
'content')
assert outputList[1].get('choices')[0].get('message').get(
'content') == outputList[2].get('choices')[0].get('message').get(
'content')
def test_chat_completions_topp_min_stream(self):
api_client = APIClient(BASE_URL)
responseList = []
for i in range(3):
outputList = []
response = ''
for output in api_client.chat_completions_v1(
model=MODEL_NAME,
messages='Hi, pls intro yourself',
stream=True,
top_p=0.1,
temperature=0.01):
outputList.append(output)
assert_chat_completions_stream_return(outputList[0], MODEL_NAME,
True, False)
assert_chat_completions_stream_return(outputList[-1], MODEL_NAME,
False, True)
for index in range(1, len(outputList) - 1):
assert_chat_completions_stream_return(outputList[index],
MODEL_NAME)
response += outputList[index].get('choices')[0].get(
'delta').get('content')
responseList.append(response)
assert responseList[0] == responseList[1]
assert responseList[1] == responseList[2]
def test_chat_completions_longinput_stream(self):
api_client = APIClient(BASE_URL)
outputList = []
for output in api_client.chat_completions_v1(
model=MODEL_NAME,
messages='Hi, pls intro yourself' * 10000,
stream=True,
temperature=0.01):
outputList.append(output)
assert_chat_completions_stream_return(outputList[0], MODEL_NAME, True,
False)
assert_chat_completions_stream_return(outputList[0], MODEL_NAME, True,
False)
for index in range(1, len(outputList) - 1):
assert_chat_completions_stream_return(outputList[index],
MODEL_NAME)
assert outputList[1].get('choices')[0].get('finish_reason') == 'length'
assert outputList[1].get('choices')[0].get('delta').get(
'content') == ''
assert len(outputList) == 2
@pytest.mark.order(8)
@pytest.mark.pytorch
@pytest.mark.flaky(reruns=2)
class TestRestfulInterfaceChatInteractive:
def test_chat_interactive_ignore_eos_batch(self):
api_client = APIClient(BASE_URL)
for output in api_client.chat_interactive_v1(
prompt='Hi, what is your name?',
ignore_eos=True,
request_output_len=100,
temperature=0.01):
continue
assert_chat_interactive_batch_return(output)
assert output.get('tokens') == 100
assert output.get('finish_reason') == 'length'
def test_chat_interactive_ignore_eos_stream(self):
api_client = APIClient(BASE_URL)
outputList = []
for output in api_client.chat_interactive_v1(
prompt='Hi, what is your name?',
ignore_eos=True,
stream=True,
request_output_len=100,
temperature=0.01):
outputList.append(output)
print(output)
assert_chat_interactive_stream_return(outputList[-1],
True,
index=len(outputList) - 2)
for index in range(0, len(outputList) - 1):
assert_chat_interactive_stream_return(outputList[index],
index=index)
assert output.get('finish_reason') == 'length'
assert len(outputList) == 101
def test_chat_interactive_max_tokens_batch(self):
api_client = APIClient(BASE_URL)
for output in api_client.chat_interactive_v1(
prompt='Hi, pls intro yourself',
request_output_len=5,
temperature=0.01):
continue
assert_chat_interactive_batch_return(output)
assert output.get('finish_reason') == 'length'
assert output.get('tokens') == 5
def test_chat_interactive_max_tokens_stream(self):
api_client = APIClient(BASE_URL)
outputList = []
for output in api_client.chat_interactive_v1(
prompt='Hi, pls intro yourself',
stream=True,
request_output_len=5,
temperature=0.01):
outputList.append(output)
assert_chat_interactive_stream_return(outputList[-1],
True,
index=len(outputList) - 2)
for index in range(0, len(outputList) - 1):
assert_chat_interactive_stream_return(outputList[index],
index=index)
assert output.get('finish_reason') == 'length'
assert len(outputList) == 6
def test_chat_interactive_topp_min_batch(self):
api_client = APIClient(BASE_URL)
outputList = []
for i in range(3):
for output in api_client.chat_interactive_v1(prompt='Shanghai is',
top_p=0.01,
temperature=0.01):
continue
assert_chat_interactive_batch_return(output)
outputList.append(output)
print(output)
assert outputList[0] == outputList[1]
assert outputList[1] == outputList[2]
def test_chat_interactive_topp_min_stream(self):
api_client = APIClient(BASE_URL)
responseList = []
for i in range(3):
outputList = []
response = ''
for output in api_client.chat_interactive_v1(
model=MODEL_NAME,
prompt='Hi, pls intro yourself',
stream=True,
top_p=0.01,
temperature=0.01):
outputList.append(output)
assert_chat_interactive_stream_return(outputList[-1],
True,
index=len(outputList) - 2)
for index in range(0, len(outputList) - 1):
assert_chat_interactive_stream_return(outputList[index],
index=index)
response += outputList[index].get('text')
responseList.append(response)
assert responseList[0] == responseList[1]
assert responseList[1] == responseList[2]
import pytest
from utils.restful_return_check import (assert_chat_completions_batch_return,
assert_chat_completions_stream_return,
assert_chat_interactive_batch_return,
assert_chat_interactive_stream_return)
from lmdeploy.serve.openai.api_client import APIClient
BASE_HTTP_URL = 'http://localhost'
DEFAULT_PORT = 23333
MODEL = 'internlm/internlm2-chat-20b'
MODEL_NAME = 'internlm2-chat-20b'
BASE_URL = ':'.join([BASE_HTTP_URL, str(DEFAULT_PORT)])
@pytest.mark.order(8)
@pytest.mark.turbomind
@pytest.mark.flaky(reruns=2)
class TestRestfulInterfaceChatCompletions:
def test_chat_completions_ignore_eos_batch(self):
api_client = APIClient(BASE_URL)
for output in api_client.chat_completions_v1(
model=MODEL_NAME,
messages='Hi, what is your name?',
ignore_eos=True,
max_tokens=100,
temperature=0.01):
continue
assert_chat_completions_batch_return(output, MODEL_NAME)
assert output.get('usage').get('completion_tokens') == 101
assert output.get('choices')[0].get('finish_reason') == 'length'
def test_chat_completions_ignore_eos_stream(self):
api_client = APIClient(BASE_URL)
outputList = []
for output in api_client.chat_completions_v1(
model=MODEL_NAME,
messages='Hi, what is your name?',
ignore_eos=True,
stream=True,
max_tokens=100,
temperature=0.01):
outputList.append(output)
assert_chat_completions_stream_return(outputList[0], MODEL_NAME, True,
False)
assert_chat_completions_stream_return(outputList[-1], MODEL_NAME,
False, True)
for index in range(1, len(outputList) - 1):
assert_chat_completions_stream_return(outputList[index],
MODEL_NAME)
assert outputList[-1].get('choices')[0].get(
'finish_reason') == 'length'
assert len(outputList) == 103
def test_chat_completions_max_tokens_batch(self):
api_client = APIClient(BASE_URL)
for output in api_client.chat_completions_v1(
model=MODEL_NAME,
messages='Hi, pls intro yourself',
max_tokens=5,
temperature=0.01):
continue
assert_chat_completions_batch_return(output, MODEL_NAME)
assert output.get('choices')[0].get('finish_reason') == 'length'
assert output.get('usage').get('completion_tokens') == 6
def test_chat_completions_max_tokens_stream(self):
api_client = APIClient(BASE_URL)
outputList = []
for output in api_client.chat_completions_v1(
model=MODEL_NAME,
messages='Hi, pls intro yourself',
stream=True,
max_tokens=5,
temperature=0.01):
outputList.append(output)
assert_chat_completions_stream_return(outputList[0], MODEL_NAME, True,
False)
assert_chat_completions_stream_return(outputList[-1], MODEL_NAME,
False, True)
for index in range(1, len(outputList) - 1):
assert_chat_completions_stream_return(outputList[index],
MODEL_NAME)
assert outputList[-1].get('choices')[0].get(
'finish_reason') == 'length'
assert len(outputList) == 8
def test_chat_completions_repetition_penalty_stream(self):
api_client = APIClient(BASE_URL)
outputList = []
response = ''
for output in api_client.chat_completions_v1(
model=MODEL_NAME,
messages='Hi, pls intro yourself',
stream=True,
repetition_penalty=0.1,
temperature=0.01,
max_tokens=200):
outputList.append(output)
assert_chat_completions_stream_return(outputList[0], MODEL_NAME, True,
False)
assert_chat_completions_stream_return(outputList[-1], MODEL_NAME,
False, True)
for index in range(1, len(outputList) - 1):
assert_chat_completions_stream_return(outputList[index],
MODEL_NAME)
response += outputList[index].get('choices')[0].get('delta').get(
'content')
assert 'pls pls ' * 5 in response or \
'Hi, pls intro yourself\n' * 5 in response
def test_chat_completions_topp_min_batch(self):
api_client = APIClient(BASE_URL)
outputList = []
for i in range(3):
for output in api_client.chat_completions_v1(
model=MODEL_NAME, messages='Shanghai is', top_p=0.1):
outputList.append(output)
assert_chat_completions_batch_return(output, MODEL_NAME)
assert outputList[0].get('choices')[0].get('message').get(
'content') == outputList[1].get('choices')[0].get('message').get(
'content')
assert outputList[1].get('choices')[0].get('message').get(
'content') == outputList[2].get('choices')[0].get('message').get(
'content')
def test_chat_completions_topp_min_stream(self):
api_client = APIClient(BASE_URL)
responseList = []
for i in range(3):
outputList = []
response = ''
for output in api_client.chat_completions_v1(
model=MODEL_NAME,
messages='Hi, pls intro yourself',
stream=True,
top_p=0.1):
outputList.append(output)
assert_chat_completions_stream_return(outputList[0], MODEL_NAME,
True, False)
assert_chat_completions_stream_return(outputList[-1], MODEL_NAME,
False, True)
for index in range(1, len(outputList) - 1):
assert_chat_completions_stream_return(outputList[index],
MODEL_NAME)
response += outputList[index].get('choices')[0].get(
'delta').get('content')
responseList.append(response)
assert responseList[0] == responseList[1]
assert responseList[1] == responseList[2]
def test_chat_completions_longinput_stream(self):
api_client = APIClient(BASE_URL)
outputList = []
for output in api_client.chat_completions_v1(
model=MODEL_NAME,
messages='Hi, pls intro yourself' * 10000,
stream=True,
temperature=0.01):
outputList.append(output)
assert_chat_completions_stream_return(outputList[0], MODEL_NAME, True,
False)
assert outputList[1].get('choices')[0].get('finish_reason') == 'length'
assert outputList[1].get('choices')[0].get('delta').get(
'content') == ''
assert len(outputList) == 2
@pytest.mark.order(8)
@pytest.mark.turbomind
@pytest.mark.flaky(reruns=2)
class TestRestfulInterfaceChatInteractive:
def test_chat_interactive_ignore_eos_batch(self):
api_client = APIClient(BASE_URL)
for output in api_client.chat_interactive_v1(
prompt='Hi, what is your name?',
ignore_eos=True,
request_output_len=100,
temperature=0.01):
continue
assert_chat_interactive_batch_return(output)
assert output.get('tokens') == 101
assert output.get('finish_reason') == 'length'
def test_chat_interactive_ignore_eos_stream(self):
api_client = APIClient(BASE_URL)
outputList = []
for output in api_client.chat_interactive_v1(
prompt='Hi, what is your name?',
ignore_eos=True,
stream=True,
request_output_len=100,
temperature=0.01):
outputList.append(output)
assert_chat_interactive_stream_return(outputList[-1],
True,
index=len(outputList) - 2)
for index in range(0, len(outputList) - 1):
assert_chat_interactive_stream_return(outputList[index],
index=index)
assert output.get('finish_reason') == 'length'
assert len(outputList) == 102
def test_chat_interactive_max_tokens_batch(self):
api_client = APIClient(BASE_URL)
for output in api_client.chat_interactive_v1(
prompt='Hi, pls intro yourself',
request_output_len=5,
temperature=0.01):
continue
assert_chat_interactive_batch_return(output)
assert output.get('finish_reason') == 'length'
assert output.get('tokens') == 6
def test_chat_interactive_max_tokens_stream(self):
api_client = APIClient(BASE_URL)
outputList = []
for output in api_client.chat_interactive_v1(
prompt='Hi, pls intro yourself',
stream=True,
request_output_len=5,
temperature=0.01):
outputList.append(output)
assert_chat_interactive_stream_return(outputList[-1],
True,
index=len(outputList) - 2)
for index in range(0, len(outputList) - 1):
assert_chat_interactive_stream_return(outputList[index],
index=index)
assert output.get('finish_reason') == 'length'
assert len(outputList) == 7
def test_chat_interactive_topp_min_batch(self):
api_client = APIClient(BASE_URL)
outputList = []
for i in range(3):
for output in api_client.chat_interactive_v1(prompt='Shanghai is',
top_p=0.01):
continue
assert_chat_interactive_batch_return(output)
outputList.append(output)
assert outputList[0] == outputList[1]
assert outputList[1] == outputList[2]
def test_chat_interactive_topp_min_stream(self):
api_client = APIClient(BASE_URL)
responseList = []
for i in range(3):
outputList = []
response = ''
for output in api_client.chat_interactive_v1(
model=MODEL_NAME,
prompt='Hi, pls intro yourself',
stream=True,
top_p=0.01):
outputList.append(output)
assert_chat_interactive_stream_return(outputList[-1],
True,
index=len(outputList) - 2)
for index in range(0, len(outputList) - 1):
assert_chat_interactive_stream_return(outputList[index],
index=index)
response += outputList[index].get('text')
responseList.append(response)
assert responseList[0] == responseList[1]
assert responseList[1] == responseList[2]
common_case:
  - 你好,你叫什么名字#hi, what's your name:
  - 介绍相应美食#please introduce some delicious foods:
      - not_contain:
          - 成都
          - chengdu
      - internlm2-1_8b:
          - len_g:
              10
memory_test:
  - 介绍成都的景点#please introduce attractions in Chengdu:
      - contain:
          - chengdu
          - 成都
      - contain:
          - 熊猫
          - panda
          - 宽窄巷子
          - jinli
          - leshan
          - 历史悠久
      - falcon:
          - contain:
              - chengdu
              - 成都
      - internlm2-1_8b:
          - contain:
              - chengdu
              - 成都
      - internlm2-20b:
          - contain:
              - chengdu
              - 成都
  - 介绍相应美食#please introduce some delicious foods:
      - contain:
          - 成都
          - chengdu
          - 四川
      - contain:
          - 火锅
          - hotpot
          - hot pot
          - 夫妻肺片
      - falcon:
          - len_g:
              10
      - internlm2-1_8b:
          - contain:
              - chengdu
              - 成都
      - internlm2-20b:
          - contain:
              - chengdu
              - 成都
chinese_poem_case:
  - 给我一首中文打油诗,需要添加标点符号。和,请用中文回答Give me a Chinese poem in Chinese:
      - contain:
          - ","
          - "。"
      - len_g:
          5
      - llama-2:
          - contain:
              - poem
              - poetry
          - len_g:
              5
      - codellama:
          - contain:
              - poem
              - poetry
          - len_g:
              5
      - internlm2-1_8b:
          - len_g:
              5
      - internlm2-20b:
          - len_g:
              5
      - falcon:
          - len_g:
              5
english_poem_case:
  - write a romantic English poem:
      - contain:
          - " "
      - contain:
          - "."
          - ","
      - contain:
          - love
          - poem
      - len_g:
          100
      - internlm2-1_8b:
          - len_g:
              100
      - internlm2-20b:
          - len_g:
              100
      - falcon:
          - len_g:
              1
emoji_case:
  - 请输出👍赞的emoji#print output the emoji of good👍:
      - contain:
          - 👍
          - 😊
      - baichuan2-7b:
          - contain:
              - 👍
              - 😊
              - \u2714
              -
              - emoji
              - '!'
traditional_chinese_case:
  - 使用繁體介紹香港維多利亞港:
      - contain:
          - victoria
          - 維多利亞港
          - 维多利亚港
      - codellama:
          - contain:
              - victoria
              - 維多利亞港
              - 维多利亚港
              - hong kong
      - internlm2-20b:
          - contain:
              - victoria
              - 維多利亞港
              - 维多利亚港
              - hong kong
              - 香港
      - llama-2:
          - contain:
              - victoria
              - 維多利亞港
              - 维多利亚港
              - apologize
      - falcon:
          - len_g:
              1
[pytest]
python_files = test*_*.py # test file
python_classes = Test* # test class
python_functions = test_* # test function
pytest_runtest_call.tryfirst = True
filterwarnings = ignore::UserWarning
reruns = 2
reruns_delay = 10
import allure
import conftest
import pytest
from utils.config_utils import (get_cuda_prefix_by_workerid,
get_torch_model_list)
from utils.run_client_chat import hf_command_line_test
conftest._init_cli_case_list()
case_list = conftest.global_cli_case_List
def getCaseList():
return case_list
@pytest.mark.order(10)
@pytest.mark.usefixtures('cli_case_config')
@pytest.mark.hf_pytorch_chat
@pytest.mark.gpu_num_1
@pytest.mark.parametrize('usercase', getCaseList())
@pytest.mark.parametrize('model', get_torch_model_list(tp_num=1))
def test_hf_pytorch_chat_tp1(config, model, cli_case_config, usercase,
worker_id):
result, chat_log, msg = hf_command_line_test(
config,
usercase,
cli_case_config.get(usercase),
model,
'torch',
cuda_prefix=get_cuda_prefix_by_workerid(worker_id))
if chat_log is not None:
allure.attach.file(chat_log,
attachment_type=allure.attachment_type.TEXT)
assert result, msg
@pytest.mark.order(10)
@pytest.mark.usefixtures('cli_case_config')
@pytest.mark.hf_pytorch_chat
@pytest.mark.gpu_num_2
@pytest.mark.parametrize('usercase', getCaseList())
@pytest.mark.parametrize('model', get_torch_model_list(tp_num=2))
def test_hf_pytorch_chat_tp2(config, model, cli_case_config, usercase,
worker_id):
result, chat_log, msg = hf_command_line_test(
config,
usercase,
cli_case_config.get(usercase),
model,
'torch',
cuda_prefix=get_cuda_prefix_by_workerid(worker_id, tp_num=2))
if chat_log is not None:
allure.attach.file(chat_log,
attachment_type=allure.attachment_type.TEXT)
assert result, msg
@pytest.mark.order(10)
@pytest.mark.usefixtures('cli_case_config')
@pytest.mark.hf_pytorch_chat
@pytest.mark.pr_test
@pytest.mark.xdist_group(name='pr_test')
@pytest.mark.parametrize('usercase', getCaseList())
@pytest.mark.parametrize('model', ['internlm/internlm2-chat-20b'])
def test_hf_pytorch_chat_pr(config, model, cli_case_config, usercase):
result, chat_log, msg = hf_command_line_test(
config,
usercase,
cli_case_config.get(usercase),
model,
'torch',
cuda_prefix='CUDA_VISIBLE_DEVICES=5,6')
if chat_log is not None:
allure.attach.file(chat_log,
attachment_type=allure.attachment_type.TEXT)
assert result, msg
import allure
import conftest
import pytest
from utils.config_utils import (get_cuda_prefix_by_workerid,
get_turbomind_model_list)
from utils.run_client_chat import hf_command_line_test
conftest._init_cli_case_list()
case_list = conftest.global_cli_case_List
def getCaseList():
return case_list
@pytest.mark.order(10)
@pytest.mark.usefixtures('cli_case_config')
@pytest.mark.hf_turbomind_chat
@pytest.mark.gpu_num_1
@pytest.mark.parametrize('usercase', getCaseList())
@pytest.mark.parametrize('model', get_turbomind_model_list(tp_num=1))
def test_hf_turbomind_chat_tp1(config, model, cli_case_config, usercase,
worker_id):
result, chat_log, msg = hf_command_line_test(
config,
usercase,
cli_case_config.get(usercase),
model,
'turbomind',
cuda_prefix=get_cuda_prefix_by_workerid(worker_id))
if chat_log is not None:
allure.attach.file(chat_log,
attachment_type=allure.attachment_type.TEXT)
assert result, msg
@pytest.mark.order(10)
@pytest.mark.usefixtures('cli_case_config')
@pytest.mark.hf_turbomind_chat
@pytest.mark.gpu_num_2
@pytest.mark.parametrize('usercase', getCaseList())
@pytest.mark.parametrize('model', get_turbomind_model_list(tp_num=2))
def test_hf_turbomind_chat_tp2(config, model, cli_case_config, usercase,
worker_id):
result, chat_log, msg = hf_command_line_test(
config,
usercase,
cli_case_config.get(usercase),
model,
'turbomind',
cuda_prefix=get_cuda_prefix_by_workerid(worker_id, tp_num=2))
if chat_log is not None:
allure.attach.file(chat_log,
attachment_type=allure.attachment_type.TEXT)
assert result, msg
@pytest.mark.order(10)
@pytest.mark.usefixtures('cli_case_config')
@pytest.mark.hf_turbomind_chat
@pytest.mark.pr_test
@pytest.mark.xdist_group(name='pr_test')
@pytest.mark.parametrize('usercase', getCaseList())
@pytest.mark.parametrize(
'model',
['internlm/internlm2-chat-20b', 'internlm/internlm2-chat-20b-inner-w4a16'])
def test_hf_turbomind_chat_pr(config, model, cli_case_config, usercase):
result, chat_log, msg = hf_command_line_test(
config,
usercase,
cli_case_config.get(usercase),
model,
'turbomind',
cuda_prefix='CUDA_VISIBLE_DEVICES=5,6')
if chat_log is not None:
allure.attach.file(chat_log,
attachment_type=allure.attachment_type.TEXT)
assert result, msg
import allure
import conftest
import pytest
from utils.config_utils import (get_cuda_prefix_by_workerid,
get_turbomind_model_list)
from utils.run_client_chat import command_line_test
conftest._init_cli_case_list()
prompt_list = conftest.global_cli_case_List
def getPromptCaseList():
return prompt_list
def getModelList(tp_num):
return [
item for item in get_turbomind_model_list(tp_num)
if 'kvint8' not in item.lower()
]
@pytest.mark.order(10)
@pytest.mark.usefixtures('cli_case_config')
@pytest.mark.command_chat
@pytest.mark.gpu_num_1
@pytest.mark.parametrize('usercase', getPromptCaseList())
@pytest.mark.parametrize('model', getModelList(tp_num=1))
def test_workspace_chat_tp1(config, cli_case_config, usercase, model,
worker_id):
result, chat_log, msg = command_line_test(
config,
usercase,
cli_case_config.get(usercase),
model,
'turbomind',
cuda_prefix=get_cuda_prefix_by_workerid(worker_id))
if chat_log is not None:
allure.attach.file(chat_log,
attachment_type=allure.attachment_type.TEXT)
assert result, msg
@pytest.mark.order(10)
@pytest.mark.usefixtures('cli_case_config')
@pytest.mark.command_chat
@pytest.mark.gpu_num_2
@pytest.mark.parametrize('usercase', getPromptCaseList())
@pytest.mark.parametrize('model', getModelList(tp_num=2))
def test_workspace_chat_tp2(config, cli_case_config, usercase, model,
worker_id):
result, chat_log, msg = command_line_test(
config,
usercase,
cli_case_config.get(usercase),
model,
'turbomind',
cuda_prefix=get_cuda_prefix_by_workerid(worker_id, tp_num=2))
if chat_log is not None:
allure.attach.file(chat_log,
attachment_type=allure.attachment_type.TEXT)
assert result, msg
@pytest.mark.order(10)
@pytest.mark.usefixtures('cli_case_config')
@pytest.mark.command_chat
@pytest.mark.pr_test
@pytest.mark.parametrize('usercase', getPromptCaseList())
@pytest.mark.parametrize(
'model',
['internlm/internlm2-chat-20b', 'internlm/internlm2-chat-20b-inner-w4a16'])
def test_workspace_chat_pr(config, cli_case_config, usercase, model):
result, chat_log, msg = command_line_test(
config,
usercase,
cli_case_config.get(usercase),
model,
'turbomind',
None,
cuda_prefix='CUDA_VISIBLE_DEVICES=5,6')
if chat_log is not None:
allure.attach.file(chat_log,
attachment_type=allure.attachment_type.TEXT)
assert result, msg
import os
import subprocess
from subprocess import PIPE
import allure
import pytest
from utils.config_utils import (get_cuda_prefix_by_workerid,
get_turbomind_model_list)
from utils.get_run_config import get_command_with_extra, get_model_name
@pytest.mark.order(5)
@pytest.mark.convert
@pytest.mark.parametrize('model', get_turbomind_model_list())
def test_convert(config, model, worker_id):
convert(config, model, get_cuda_prefix_by_workerid(worker_id))
@pytest.mark.order(5)
@pytest.mark.convert
@pytest.mark.pr_test
@pytest.mark.xdist_group(name='pr_test')
@pytest.mark.parametrize(
'model',
['internlm/internlm2-chat-20b', 'internlm/internlm2-chat-20b-inner-w4a16'])
def test_convert_pr(config, model):
convert(config, model, 'CUDA_VISIBLE_DEVICES=5')
def convert(config, model_case, cuda_prefix):
origin_model_path = config.get('model_path') + '/' + model_case
dst_path = config.get('dst_path') + '/workspace_' + model_case
log_path = config.get('log_path')
model_name = get_model_name(model_case)
if 'w4' in model_case or '4bits' in model_case:
cmd = get_command_with_extra(' '.join([
'lmdeploy convert', model_name, origin_model_path, '--dst-path',
dst_path, '--model-format awq --group-size 128'
]),
config,
model_name,
True,
cuda_prefix=cuda_prefix)
else:
cmd = get_command_with_extra(' '.join([
'lmdeploy convert', model_name, origin_model_path, '--dst-path',
dst_path
]),
config,
model_name,
True,
cuda_prefix=cuda_prefix)
convert_log = os.path.join(log_path,
'convert_' + model_case.split('/')[1] + '.log')
print('reproduce command convert: ' + cmd + '\n')
with open(convert_log, 'w') as f:
# remove existing workspace
subprocess.run([' '.join(['rm -rf', dst_path])],
stdout=f,
stderr=f,
shell=True,
text=True,
encoding='utf-8')
f.writelines('reproduce command convert: ' + cmd + '\n')
# convert
convertRes = subprocess.run([cmd],
stdout=f,
stderr=PIPE,
shell=True,
text=True,
encoding='utf-8')
f.writelines(convertRes.stderr)
# check result
result = convertRes.returncode == 0
allure.attach.file(convert_log,
attachment_type=allure.attachment_type.TEXT)
assert result, convertRes.stderr
import os
import fire
import yaml
from lmdeploy import pipeline
from lmdeploy.messages import (GenerationConfig, PytorchEngineConfig,
TurbomindEngineConfig)
cli_prompt_case_file = 'autotest/chat_prompt_case.yaml'
common_prompt_case_file = 'autotest/prompt_case.yaml'
config_file = 'autotest/config.yaml'
def main(type: str, model, tp: int = 1):
config_path = os.path.join(config_file)
with open(config_path) as f:
env_config = yaml.load(f.read(), Loader=yaml.SafeLoader)
case_path = os.path.join(common_prompt_case_file)
with open(case_path) as f:
case_config = yaml.load(f.read(), Loader=yaml.SafeLoader)
run_pipeline_chat_test(env_config, case_config, model, tp, type)
def run_pipeline_chat_test(config, cases_info, model_case, tp, type):
model_path = config.get('model_path')
hf_path = model_path + '/' + model_case
if 'pytorch' == type:
backend_config = PytorchEngineConfig(tp=tp)
else:
if 'kvint8' in model_case and ('w4' in model_case
or '4bits' in model_case):
backend_config = TurbomindEngineConfig(tp=tp,
model_format='awq',
quant_policy=4)
elif 'kvint8' in model_case:
backend_config = TurbomindEngineConfig(tp=tp,
model_format='hf',
quant_policy=4)
elif 'w4' in model_case or '4bits' in model_case:
backend_config = TurbomindEngineConfig(tp=tp, model_format='awq')
else:
backend_config = TurbomindEngineConfig(tp=tp)
pipe = pipeline(hf_path, backend_config=backend_config)
# run testcases
gen_config = GenerationConfig(temperature=0.01)
for case in cases_info.keys():
if (case == 'memory_test'
or case == 'emoji_case') and 'chat' not in model_case.lower():
continue
case_info = cases_info.get(case)
print('case:' + case)
prompts = []
for prompt_detail in case_info:
prompt = list(prompt_detail.keys())[0]
if 'chat' not in model_case.lower(): # base model
prompts.append(prompt)
else: # chat model
prompts.append({'role': 'user', 'content': prompt})
print('prompt:' + prompt)
if 'chat' not in model_case.lower(): # base model
response = pipe(prompts, gen_config=gen_config)[-1].text
else: # chat model
response = pipe([prompts], gen_config=gen_config)[0].text
if 'chat' in model_case.lower():
prompts.append({'role': 'assistant', 'content': response})
print('output:' + response)
if __name__ == '__main__':
fire.Fire(main)
import os
from multiprocessing import Process
import pytest
from utils.config_utils import get_cuda_id_by_workerid, get_torch_model_list
from utils.pipeline_chat import (assert_pipeline_chat_log,
run_pipeline_chat_test)
def getModelList(tp_num):
return [
item for item in get_torch_model_list(tp_num)
if 'falcon' not in item.lower() and 'chatglm2' not in item.lower()
]
@pytest.mark.order(6)
@pytest.mark.usefixtures('common_case_config')
@pytest.mark.pipeline_chat_pytorch
@pytest.mark.gpu_num_1
@pytest.mark.flaky(reruns=0)
@pytest.mark.parametrize('model', getModelList(tp_num=1))
def test_pipeline_chat_pytorch_tp1(config, common_case_config, model,
worker_id):
if 'gw' in worker_id:
os.environ['CUDA_VISIBLE_DEVICES'] = get_cuda_id_by_workerid(worker_id)
p = Process(target=run_pipeline_chat_test,
args=(config, common_case_config, model, 'pytorch'))
p.start()
p.join()
# assert script
assert_pipeline_chat_log(config, common_case_config, model)
@pytest.mark.order(6)
@pytest.mark.usefixtures('common_case_config')
@pytest.mark.pipeline_chat_pytorch
@pytest.mark.gpu_num_2
@pytest.mark.flaky(reruns=0)
@pytest.mark.parametrize('model', getModelList(tp_num=2))
def test_pipeline_chat_pytorch_tp2(config, common_case_config, model,
worker_id):
if 'gw' in worker_id:
os.environ['CUDA_VISIBLE_DEVICES'] = get_cuda_id_by_workerid(worker_id,
tp_num=2)
p = Process(target=run_pipeline_chat_test,
args=(config, common_case_config, model, 'pytorch'))
p.start()
p.join()
# assert script
assert_pipeline_chat_log(config, common_case_config, model)
@pytest.mark.order(6)
@pytest.mark.usefixtures('common_case_config')
@pytest.mark.pipeline_chat_pytorch
@pytest.mark.flaky(reruns=0)
@pytest.mark.pr_test
@pytest.mark.parametrize('model', ['internlm/internlm2-chat-20b'])
def test_pipeline_chat_pytorch_pr(config, common_case_config, model):
p = Process(target=run_pipeline_chat_test,
args=(config, common_case_config, model, 'pytorch'))
p.start()
p.join()
# assert script
assert_pipeline_chat_log(config, common_case_config, model)
import os
from multiprocessing import Process
import pytest
from utils.config_utils import get_all_model_list, get_cuda_id_by_workerid
from utils.pipeline_chat import (assert_pipeline_chat_log,
run_pipeline_chat_test)
@pytest.mark.order(6)
@pytest.mark.usefixtures('common_case_config')
@pytest.mark.pipeline_chat
@pytest.mark.gpu_num_1
@pytest.mark.flaky(reruns=0)
@pytest.mark.parametrize('model', get_all_model_list(tp_num=1))
def test_pipeline_chat_tp1(config, common_case_config, model, worker_id):
if 'gw' in worker_id:
os.environ['CUDA_VISIBLE_DEVICES'] = get_cuda_id_by_workerid(worker_id)
p = Process(target=run_pipeline_chat_test,
args=(config, common_case_config, model, 'turbomind'))
p.start()
p.join()
assert_pipeline_chat_log(config, common_case_config, model)
@pytest.mark.order(6)
@pytest.mark.usefixtures('common_case_config')
@pytest.mark.pipeline_chat
@pytest.mark.gpu_num_2
@pytest.mark.flaky(reruns=0)
@pytest.mark.parametrize('model', get_all_model_list(tp_num=2))
def test_pipeline_chat_tp2(config, common_case_config, model, worker_id):
if 'gw' in worker_id:
os.environ['CUDA_VISIBLE_DEVICES'] = get_cuda_id_by_workerid(worker_id,
tp_num=2)
p = Process(target=run_pipeline_chat_test,
args=(config, common_case_config, model, 'turbomind'))
p.start()
p.join()
assert_pipeline_chat_log(config, common_case_config, model)
@pytest.mark.order(6)
@pytest.mark.usefixtures('common_case_config')
@pytest.mark.pipeline_chat
@pytest.mark.flaky(reruns=0)
@pytest.mark.pr_test
@pytest.mark.parametrize(
'model',
['internlm/internlm2-chat-20b', 'internlm/internlm2-chat-20b-inner-w4a16'])
def test_pipeline_chat_pr(config, common_case_config, model):
p = Process(target=run_pipeline_chat_test,
args=(config, common_case_config, model, 'turbomind'))
p.start()
p.join()
assert_pipeline_chat_log(config, common_case_config, model)
import os
import allure
import pytest
from utils.config_utils import get_cuda_prefix_by_workerid
from utils.quantization_utils import quantization
model_list = [
'meta-llama/Llama-2-7b-chat', 'internlm/internlm-chat-20b',
'internlm/internlm2-chat-20b', 'Qwen/Qwen-7B-Chat', 'Qwen/Qwen-14B-Chat',
'internlm/internlm2-20b', 'baichuan-inc/Baichuan2-7B-Chat'
]
@pytest.mark.order(1)
@pytest.mark.quantization_kvint8
@pytest.mark.timeout(900)
@pytest.mark.parametrize('model', model_list)
def test_quantization_kvint8(config, model, worker_id):
quantization_kvint8(config, model + '-inner-kvint8', model,
get_cuda_prefix_by_workerid(worker_id))
def quantization_kvint8(config, quantization_model_name, origin_model_name,
cuda_prefix):
quantization_type = 'kvint8'
result, msg = quantization(config, quantization_model_name,
origin_model_name, quantization_type,
cuda_prefix)
log_path = config.get('log_path')
quantization_log = os.path.join(
log_path, '_'.join([
'quantization', quantization_type,
quantization_model_name.split('/')[1]
]) + '.log')
allure.attach.file(quantization_log,
attachment_type=allure.attachment_type.TEXT)
assert result, msg
import os
import allure
import pytest
from utils.config_utils import get_cuda_prefix_by_workerid
from utils.quantization_utils import quantization
model_list = [
'meta-llama/Llama-2-7b-chat-inner-kvint8',
'internlm/internlm-chat-20b-inner-kvint8',
'internlm/internlm2-chat-20b-inner-kvint8',
'Qwen/Qwen-7B-Chat-inner-kvint8', 'Qwen/Qwen-14B-Chat-inner-kvint8',
'internlm/internlm2-20b-inner-kvint8',
'baichuan-inc/Baichuan2-7B-Chat-inner-kvint8'
]
@pytest.mark.order(4)
@pytest.mark.quantization_kvint8_w4a16
@pytest.mark.timeout(900)
@pytest.mark.parametrize('model', model_list)
def test_quantization_kvint8_w4a16(config, model, worker_id):
quantization_kvint8(config, model + '-w4a16', model,
get_cuda_prefix_by_workerid(worker_id))
def quantization_kvint8(config, quantization_model_name, origin_model_name,
cuda_prefix):
quantization_type = 'w4a16'
result, msg = quantization(config, quantization_model_name,
origin_model_name, quantization_type,
cuda_prefix)
log_path = config.get('log_path')
quantization_log = os.path.join(
log_path, '_'.join([
'quantization', quantization_type,
quantization_model_name.split('/')[1]
]) + '.log')
allure.attach.file(quantization_log,
attachment_type=allure.attachment_type.TEXT)
assert result, msg