Unverified commit c3bfbd20, authored by zhongdaor-nv, committed by GitHub

chore: add top_p=1 recommendation for gpt-oss to avoid token repetition (#5550)


Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
parent 35323da2
@@ -509,6 +509,22 @@ flowchart TD
- Check Docker daemon is running with GPU support
- Ensure sufficient disk space for model weights and container images
5. **Token Repetition / Generation Won't Stop**
- When using `reasoning_effort: high`, the model may produce repeated tokens and fail to stop
- **Solution**: Set `top_p=1` in your request, in line with the [recommended sampling parameters from OpenAI](https://huggingface.co/openai/gpt-oss-120b/discussions/21)
- Example request with recommended parameters:
```bash
curl localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "openai/gpt-oss-120b",
  "messages": [{"role": "user", "content": "Hello"}],
  "chat_template_args": {
    "reasoning_effort": "high"
  },
  "top_p": 1,
  "max_tokens": 300
}'
```
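For clients that build requests programmatically, the curl request above can be sketched in Python so that `top_p` is always pinned to 1. This is a minimal illustration, not part of the project: the helper names `build_payload` and `send_chat` are hypothetical, and the `localhost:8000` endpoint is assumed from the curl example.

```python
import json
import urllib.request

# Assumption: the frontend listens on localhost:8000, as in the curl example.
BASE_URL = "http://localhost:8000"


def build_payload(prompt, reasoning_effort="high", max_tokens=300):
    """Build a chat-completions body that always sets top_p to 1 (hypothetical helper)."""
    return {
        "model": "openai/gpt-oss-120b",
        "messages": [{"role": "user", "content": prompt}],
        "chat_template_args": {"reasoning_effort": reasoning_effort},
        "top_p": 1,
        "max_tokens": max_tokens,
    }


def send_chat(payload, base_url=BASE_URL, timeout=60):
    """POST the payload to /v1/chat/completions and return the parsed JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a running deployment, `send_chat(build_payload("Hello"))` sends the same body as the curl command. Keeping `top_p` inside the payload builder means individual callers cannot omit it by accident.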
## Next Steps
- **Production Deployment**: For multi-node deployments, see the [Multi-node Guide](https://github.com/ai-dynamo/dynamo/tree/main/examples/basics/multinode/README.md)
......