Disable `decoder_input_details` on OpenAI-compatible chat streaming, pass temp and top-k from API (#1470) · 82f87ada
Jacob Keisling authored
    
This PR makes some minor tweaks to the new OpenAI-compatible chat
endpoint (#1427) in `GenerateParameters`:
- Disables `decoder_input_details` when streaming is enabled. This was
causing all streaming chat requests to fail, since
[`decoder_input_details == true` is not supported when streaming
tokens](https://github.com/huggingface/text-generation-inference/blob/98e5faff9daec6170cc2b0f963f2d73cf846b341/router/src/validation.rs#L406).
- Passes the `temperature` and `top_p` hyperparameters through from the
API request to `GenerateParameters`.
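
The intent of both changes can be sketched as below. This is a simplified illustration, not the actual router code: the structs are hypothetical stand-ins, and only the relevant fields are shown. The field names `stream`, `temperature`, and `top_p` follow the OpenAI API shape.

```rust
#[derive(Debug, Default)]
struct ChatRequest {
    stream: bool,
    temperature: Option<f32>,
    top_p: Option<f32>,
}

#[derive(Debug, Default)]
struct GenerateParameters {
    decoder_input_details: bool,
    temperature: Option<f32>,
    top_p: Option<f32>,
}

fn to_generate_parameters(req: &ChatRequest) -> GenerateParameters {
    GenerateParameters {
        // Validation rejects `decoder_input_details == true` when streaming,
        // so only request input details for non-streaming calls.
        decoder_input_details: !req.stream,
        // Pass the sampling hyperparameters straight through from the request.
        temperature: req.temperature,
        top_p: req.top_p,
    }
}

fn main() {
    let req = ChatRequest { stream: true, temperature: Some(0.7), top_p: Some(0.9) };
    let params = to_generate_parameters(&req);
    // Streaming request: decoder_input_details is disabled, sampling params kept.
    println!("{} {:?} {:?}", params.decoder_input_details, params.temperature, params.top_p);
}
```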
    
    ## Testing
    
    ```bash
    curl localhost:8080/v1/chat/completions \
        -X POST \
        -d '{
      "model": "",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "What is deep learning?"
        }
      ],
  "stream": true,
  "max_tokens": 20
}' \
        -H 'Content-Type: application/json'
    ```
    
Should work correctly. Currently, the most recent release from `main`
returns an error:
    ```
    data:{"error":"Input validation error: `decoder_input_details` == true is not supported when streaming tokens","error_type":"validation"}
    ```
    
It's my first time contributing to this project, so I could be missing
something. I would especially appreciate @drbh's eyes on this one.