  • Fix index in ChatCompletionChunk (#1648) · 23fba672
    Lucain authored
    Fix a small inconsistency compared to OpenAI's chat-completion behavior
    (introduced in
    https://github.com/huggingface/text-generation-inference/pull/1427 cc
    @drbh). When using `stream=True`, each chunk has an `index` value in
    `ChatCompletionChoice`. This index is not meant to be the index of the
    generated token but the index of the choice, which is always 0 (since
    TGI always returns a single choice).
    
    See https://platform.openai.com/docs/api-reference/chat/object:
    > index _integer_
    > The index of the choice in the list of choices.
    
    ---
    
    So instead of 
    
    ```js
    data:{"id":"","object":"text_completion","created":1710508199,"model":"HuggingFaceH4/zephyr-7b-beta","system_fingerprint":"1.4.3-sha-e6bb3ff8","choices":[{"index":1,"delta":{"role":"assistant","content":"I"},"logprobs":null,"finish_reason":null}]}
    data:{"id":"","object":"text_completion","created":1710508199,"model":"HuggingFaceH4/zephyr-7b-beta","system_fingerprint":"1.4.3-sha-e6bb3ff8","choices":[{"index":2,"delta":{"role":"assistant","content":"'"},"logprobs":null,"finish_reason":null}]}
    data:{"id":"","object":"text_completion","created":1710508199,"model":"HuggingFaceH4/zephyr-7b-beta","system_fingerprint":"1.4.3-sha-e6bb3ff8","choices":[{"index":3,"delta":{"role":"assistant","content":"m"},"logprobs":null,"finish_reason":"length"}]}
    ```
    
    it should return
    ```js
    data:{"id":"","object":"text_completion","created":1710508199,"model":"HuggingFaceH4/zephyr-7b-beta","system_fingerprint":"1.4.3-sha-e6bb3ff8","choices":[{"index":0,"delta":{"role":"assistant","content":"I"},"logprobs":null,"finish_reason":null}]}
    data:{"id":"","object":"text_completion","created":1710508199,"model":"HuggingFaceH4/zephyr-7b-beta","system_fingerprint":"1.4.3-sha-e6bb3ff8","choices":[{"index":0,"delta":{"role":"assistant","content":"'"},"logprobs":null,"finish_reason":null}]}
    data:{"id":"","object":"text_completion","created":1710508199,"model":"HuggingFaceH4/zephyr-7b-beta","system_fingerprint":"1.4.3-sha-e6bb3ff8","choices":[{"index":0,"delta":{"role":"assistant","content":"m"},"logprobs":null,"finish_reason":"length"}]}
    ```
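
A client-side check of the behavior described above could look like the sketch below. It is only an illustration, not part of the fix: the helper names `choice_indices` and `make_line` are hypothetical, and the chunks are trimmed to the fields that matter here. It parses SSE `data:` lines and collects the `index` of every choice, so the old stream yields the token positions while the fixed stream yields only zeros.

```python
import json


def choice_indices(sse_lines):
    """Return the `index` of every choice found in a list of SSE `data:` lines."""
    indices = []
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # ignore blank lines and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            continue  # OpenAI-style end-of-stream sentinel, if the server sends one
        chunk = json.loads(payload)
        indices.extend(choice["index"] for choice in chunk["choices"])
    return indices


def make_line(index, content, finish_reason=None):
    """Build a trimmed-down chunk line (hypothetical helper, illustration only)."""
    chunk = {
        "choices": [
            {
                "index": index,
                "delta": {"role": "assistant", "content": content},
                "logprobs": None,
                "finish_reason": finish_reason,
            }
        ]
    }
    return "data:" + json.dumps(chunk)


# Old behavior: `index` followed the position of the generated token.
before_fix = [make_line(1, "I"), make_line(2, "'"), make_line(3, "m", "length")]
# Fixed behavior: `index` is the index of the choice, which is always 0.
after_fix = [make_line(0, "I"), make_line(0, "'"), make_line(0, "m", "length")]

print(choice_indices(before_fix))
print(choice_indices(after_fix))
```

A client that groups chunks by `choices[i].index`, as the OpenAI schema suggests, would have treated each token of the old stream as a separate choice.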
    
    **EDIT:** I also edited ToolCall.index to be always `0` (instead of the
    generated token index) but for this one I'm actually unsure. It might be
    the index of the tool in the array of tools? OpenAI's documentation
    doesn't provide any information about it:
    > index _integer_
    
    ---
    
    I also noticed that in OpenAI's example, the last chunk doesn't have a
    delta and is the only one that returns a `finish_reason`. TGI is
    slightly different since the last chunk carries both the last delta
    (i.e. the last generated token) and the finish reason. I don't think
    this is worth fixing since it is not a requirement according to the
    docs/specs (at least not that I know of).
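
To illustrate why the difference in the last chunk is harmless for a tolerant client, here is a minimal sketch. It assumes chunks have already been decoded into dictionaries; `collect_stream` and `chunk` are hypothetical names, not part of TGI or of any OpenAI client. The reader handles `delta` and `finish_reason` independently, so it gives the same result whether the finish reason arrives with the last token or in a separate chunk with an empty delta.

```python
def collect_stream(chunks):
    """Concatenate delta contents and return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for item in chunks:
        choice = item["choices"][0]  # a single choice is assumed
        delta = choice.get("delta") or {}  # the delta may be missing or empty
        content = delta.get("content")
        if content:
            parts.append(content)
        if choice.get("finish_reason") is not None:
            finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


def chunk(content=None, finish_reason=None):
    """Build a trimmed-down decoded chunk (hypothetical helper, illustration only)."""
    delta = {} if content is None else {"role": "assistant", "content": content}
    return {"choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}]}


# Last chunk carries both the last token and the finish reason.
tgi_style = [chunk("I"), chunk("'"), chunk("m", "length")]
# Finish reason arrives in a separate final chunk with an empty delta.
openai_style = [chunk("I"), chunk("'"), chunk("m"), chunk(None, "length")]

print(collect_stream(tgi_style))
print(collect_stream(openai_style))
```

A client that instead assumes the final chunk never carries content would drop the last token of a TGI stream, so checking both fields on every chunk is the safer design.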