Disable `decoder_input_details` on OpenAI-compatible chat streaming, pass temp and top-k from API (#1470) · 82f87ada
Jacob Keisling authored
    
This PR makes some minor tweaks to the new OpenAI-compatible chat
endpoint (#1427) in `GenerateParameters`:
- Disables `decoder_input_details` when streaming is enabled. This was
causing all streaming chat requests to fail, since
[`decoder_input_details == true` is not supported when streaming
tokens](https://github.com/huggingface/text-generation-inference/blob/98e5faff9daec6170cc2b0f963f2d73cf846b341/router/src/validation.rs#L406).
- Passes the `temperature` and `top_p` hyperparameters through from the
API request to `GenerateParameters`.
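
The intent of both changes can be sketched as below. This is a simplified illustration, not the actual router code: the structs are hypothetical stand-ins, and only the relevant fields are shown. The field names `stream`, `temperature`, and `top_p` follow the OpenAI API shape.

```rust
#[derive(Debug, Default)]
struct ChatRequest {
    stream: bool,
    temperature: Option<f32>,
    top_p: Option<f32>,
}

#[derive(Debug, Default)]
struct GenerateParameters {
    decoder_input_details: bool,
    temperature: Option<f32>,
    top_p: Option<f32>,
}

fn to_generate_parameters(req: &ChatRequest) -> GenerateParameters {
    GenerateParameters {
        // Validation rejects `decoder_input_details == true` when streaming,
        // so only request input details for non-streaming calls.
        decoder_input_details: !req.stream,
        // Pass the sampling hyperparameters straight through from the request.
        temperature: req.temperature,
        top_p: req.top_p,
    }
}

fn main() {
    let req = ChatRequest { stream: true, temperature: Some(0.7), top_p: Some(0.9) };
    let params = to_generate_parameters(&req);
    // Streaming request: decoder_input_details is disabled, sampling params kept.
    println!("{} {:?} {:?}", params.decoder_input_details, params.temperature, params.top_p);
}
```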
    
    ## Testing
    
    ```bash
    curl localhost:8080/v1/chat/completions \
        -X POST \
        -d '{
      "model": "",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "What is deep learning?"
        }
      ],
  "stream": true,
  "max_tokens": 20
}' \
        -H 'Content-Type: application/json'
    ```
    
Should work correctly. Currently, the most recent release from `main`
returns an error:
    ```
    data:{"error":"Input validation error: `decoder_input_details` == true is not supported when streaming tokens","error_type":"validation"}
    ```
    
It's my first time contributing to this project, so I could be missing
something. I would especially appreciate @drbh's eyes on this one.