Commit 53129bac authored by shihm

update readme

parent a9ee04b7
@@ -56,6 +56,42 @@ docker run -it \
## Inference
### transformers
#### Single-node inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import os
import torch
# Force fully offline loading from the local checkpoint
os.environ['TRANSFORMERS_OFFLINE'] = '1'
os.environ['MODELSCOPE_OFFLINE'] = '1'
model_path = "/path/to/Baichuan-M3-235B"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
messages = [{"role": "user", "content": "I've been having headaches lately, especially worse in the afternoon. What should I do?"}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    thinking_mode='on'
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    temperature=0.6
)
# Decode only the newly generated tokens, skipping the prompt portion
response = tokenizer.decode(generated_ids[0][len(model_inputs.input_ids[0]):], skip_special_tokens=True)
print(response)
```
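The slice at the end of the block above strips the prompt tokens from `generate()`'s output, which always returns prompt plus completion. A minimal sketch with toy tensors (no model needed; the token values are made up) illustrating the same indexing:

```python
import torch

# Stand-ins for real data: a 4-token prompt, and generate() output that
# is the prompt followed by 3 newly generated tokens.
prompt_ids = torch.tensor([[101, 102, 103, 104]])
generated = torch.tensor([[101, 102, 103, 104, 7, 8, 9]])

# Same indexing as the README: keep only the tokens after the prompt.
new_tokens = generated[0][len(prompt_ids[0]):]
print(new_tokens.tolist())  # [7, 8, 9]
```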
### vllm
#### Multi-node inference
@@ -145,43 +181,6 @@ curl http://localhost:8000/v1/chat/completions \
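The `curl` call referenced above targets vLLM's OpenAI-compatible chat completions endpoint; the same request can be built and sent from Python. A sketch of the request body only (the model name, port, and sampling parameters are illustrative assumptions, not taken from this README):

```python
import json

# Hypothetical request body mirroring the curl example; adjust the model
# name and parameters to match your deployment.
payload = {
    "model": "Baichuan-M3-235B",
    "messages": [
        {"role": "user", "content": "I've been having headaches lately, especially worse in the afternoon. What should I do?"}
    ],
    "temperature": 0.6,
    "max_tokens": 1024,
}
body = json.dumps(payload)
# To send it, POST to http://localhost:8000/v1/chat/completions with
# header Content-Type: application/json (e.g. via requests.post).
print(json.loads(body)["model"])  # Baichuan-M3-235B
```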
### Accuracy
`Accuracy on DCU is consistent with GPU; inference frameworks: vllm, transformers`