  1. 03 Jul, 2024 1 commit
  2. 01 Jul, 2024 1 commit
    • fix: prefer serde structs over custom functions (#2127) · 9eefb2f6
      drbh authored
      
      
      * fix: prefer enum for chat object
      
      * fix: adjust typo
      
      * fix: enum CompletionType not ObjectType
      
      * fix: adjust typo
      
      * feat: leverage serde for conditional deser
      
      * fix: adjust HubTokenizerConfig after rebase
      
      * fix: update create_post_processor logic for token type
      
      * fix: adjust unwrap syntax in template
      
      * Fixing the post processor.
      
      ---------
      Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
      9eefb2f6
  3. 27 Jun, 2024 2 commits
  4. 25 Jun, 2024 2 commits
    • Enable multiple LoRa adapters (#2010) · 04e1af94
      drbh authored
      
      
      * feat: first draft load multiple lora
      
      * feat: load weights within layer and refactor lora pass
      
      * fix: refactor and reduce lora math
      
      * feat: baseline impl single request multi lora support
      
      * feat: prefer lorax implementation and port loading logic
      
      * fix: prefer adapter_data and refactors
      
      * feat: prefer lorax's custom punica kernels and add mlp loras
      
      * fix: adjust batch for bgmv
      
      * fix: adjust adapter_segments logic when in batch
      
      * fix: refactor and move changes to v3 proto
      
      * fix: pass model_id for all flash causal lms
      
      * fix: pass model_id for all causal and seq2seq lms
      
      * fix: add model_id to model test
      
      * feat: add lora support to mistral and refactors
      
      * feat: prefer model id in request
      
      * fix: include rust code for adapter id
      
      * feat: bump launcher and add new lora docs
      
      * feat: support base model generation and refactors
      
      * fix: rename doc to retry ci build
      
      * feat: support if vlm models
      
      * fix: add adapter_data param and avoid missing layers
      
      * fix: add adapter_data param to phi and neox
      
      * fix: update all models forwards to include adapter_data
      
      * fix: add model_id to IdeficsCausalLM
      
      * Update lora.md
      
      Fixed a typo
      
      * Update lora.md
      
      Fixing spam image
      
      * fix: add lora kernel to dockerfile, support running without kernels and refactors
      
      * fix: avoid dockerfile conflict
      
      * fix: refactors and adjust flash llama lora logic
      
      * fix: skip llama test due to CI issue (temp)
      
      * fix: skip llama test CI (temp) 2
      
      * fix: revert skips and prefer updated ci token for tests
      
      * fix: refactors and helpful comments
      
      * fix: add noop in TensorParallelAdapterRowLinear too
      
      * fix: refactor and move shard_lora_weights logic
      
      * fix: exit early if no adapter_data
      
      ---------
      Co-authored-by: Derek <datavistics@gmail.com>
      04e1af94
    • fix ChatCompletion and ChatCompletionChunk object string not compatible with... · b69f0780
      sunxichen authored
      
      fix ChatCompletion and ChatCompletionChunk object string not compatible with standard openai api (#2089)
      Co-authored-by: sunxichen <sun.xc@digitalcnzz.com>
      b69f0780
  5. 13 Jun, 2024 1 commit
    • implement Open Inference Protocol endpoints (#1942) · f433f1f7
      drbh authored
      * feat: add kserve feature and basic routes
      
      * feat: implement infer endpoint wrapper around generate
      
      * fix: refactor and improve types
      
      * fix: improve infer and simplify
      
      * fix: cleanup and improve api docs
      
      * fix: refactor and encapsulate kserve feat in file
      
      * fix: remove typos after rebase
      f433f1f7
  6. 11 Jun, 2024 1 commit
  7. 04 Jun, 2024 1 commit
    • feat: add SchedulerV3 (#1996) · 757223b3
      OlivierDehaene authored
      - Refactor code to allow supporting multiple versions of the
      generate.proto at the same time
      - Add v3/generate.proto (identical to generate.proto for now, but allows
      for future changes without impacting v2 backends)
      - Add Schedule trait to abstract queuing and batching mechanisms that
      will be different in the future
      - Add SchedulerV2/V3 impl
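
The idea of hiding queuing and batching behind one interface can be pictured with a minimal Python sketch. The real trait is written in Rust; every name below (`Scheduler`, `FifoScheduler`, `schedule`, `next_batch`, `Request`) is hypothetical and chosen only for illustration, not taken from the TGI codebase.

```python
from abc import ABC, abstractmethod
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical stand-in for a validated generate request."""
    request_id: int
    inputs: str


class Scheduler(ABC):
    """Hypothetical interface that hides how requests are queued and batched."""

    @abstractmethod
    def schedule(self, request):
        """Accept a request for later batching."""

    @abstractmethod
    def next_batch(self, max_size):
        """Return up to max_size queued requests, removing them from the queue."""


class FifoScheduler(Scheduler):
    """Simplest possible implementation: first in, first out."""

    def __init__(self):
        self._queue = deque()

    def schedule(self, request):
        self._queue.append(request)

    def next_batch(self, max_size):
        batch = []
        while self._queue and len(batch) < max_size:
            batch.append(self._queue.popleft())
        return batch
```

Callers that depend only on `Scheduler` would not need to change if a second implementation with a different batching policy were added later, which is the kind of decoupling the commit message describes.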
      757223b3
  8. 27 May, 2024 1 commit
    • Processor config chat template (#1954) · 0732b9d2
      drbh authored
      This PR loads the `processor_config` similar to the `tokenizer_config`
      and uses the processor_config's chat_template if the tokenizer_config
      does not include one. These changes enable chat with idefics2
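
The fallback order described above could look roughly like this in Python. This is a sketch under assumptions: the function name `resolve_chat_template` is hypothetical, the configs are modelled as plain dictionaries, and the actual router code is Rust.

```python
from typing import Optional


def resolve_chat_template(tokenizer_config: dict,
                          processor_config: Optional[dict]) -> Optional[str]:
    """Prefer the tokenizer's chat template; fall back to the processor's."""
    template = tokenizer_config.get("chat_template")
    if template is not None:
        return template
    if processor_config is not None:
        return processor_config.get("chat_template")
    return None


# A model whose tokenizer config carries no template but whose
# processor config does (illustrative values, not real config files).
tokenizer_config = {"bos_token": "<s>"}
processor_config = {"chat_template": "{{ messages }}"}
print(resolve_chat_template(tokenizer_config, processor_config))
```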
      0732b9d2
  9. 23 May, 2024 1 commit
  10. 17 May, 2024 1 commit
    • Removing some unused code. (#1915) · a60fa840
      Nicolas Patry authored
      a60fa840
  11. 16 May, 2024 3 commits
    • Types. (#1909) · b3dd3902
      Nicolas Patry authored
      b3dd3902
    • Fixing types. (#1906) · f5d43414
      Nicolas Patry authored
      f5d43414
    • OpenAI function calling compatible support (#1888) · d8402eaf
      phangiabao98 authored
      # What does this PR do?
      
      Fixes https://github.com/huggingface/text-generation-inference/issues/1887

      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [x] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [x] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [x] Did you make sure to update the documentation with your
      changes? Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [x] Did you write any new necessary tests?


      ## Who can review?
      @Narsil
      
      ---------
      Co-authored-by: Bao Phan <baopg@inter-k.com>
      d8402eaf
  12. 03 May, 2024 1 commit
  13. 30 Apr, 2024 1 commit
    • Handle images in chat api (#1828) · c99ecd77
      drbh authored
      This PR allows for messages to be formatted as simple strings, or as an
      array of objects including image urls. This is done by formatting
      content arrays into a simple string.
      
      Example using `llava-hf/llava-v1.6-mistral-7b-hf` 
      
      ```bash
      curl localhost:3000/v1/chat/completions \
      -X POST \
      -H 'Content-Type: application/json' \
      -d '{
          "model": "tgi",
          "messages": [
              {
                  "role": "user",
                  "content": [
                      {
                          "type": "text",
                          "text": "Whats in this image?"
                      },
                      {
                          "type": "image_url",
                          "image_url": {
                              "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"
                          }
                      }
                  ]
              }
          ],
          "stream": false,
          "max_tokens": 20,
          "seed": 42
      }'
      ```
      
      is equivalent to this simpler request
      
      ```bash
      curl localhost:3000/v1/chat/completions \
      -X POST \
      -H 'Content-Type: application/json' \
      -d '{
          "model": "tgi",
          "messages": [
              {
                  "role": "user",
                  "content": "Whats in this image?\n![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png)\n\n"
              }
          ],
          "stream": false,
          "max_tokens": 20,
          "seed": 42
      }'
      ```
      
      output
      ```
      {"id":"","object":"text_completion","created":1714406985,"model":"llava-hf/llava-v1.6-mistral-7b-hf","system_fingerprint":"2.0.1-native","choices":[{"index":0,"message":{"role":"assistant","content":" This is an illustration of an anthropomorphic rabbit in a spacesuit, standing on what"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":2945,"completion_tokens":20,"total_tokens":2965}}
      ```
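
The conversion from a content array to a single string can be sketched in Python as follows. This is only an illustration of the idea: the helper name `flatten_content` is hypothetical, the newline separator between chunks is an assumption based on the equivalent request shown above, and the example URL is a placeholder.

```python
from typing import Union


def flatten_content(content: Union[str, list], sep: str = "\n") -> str:
    """Collapse an OpenAI-style content array into a single prompt string.

    Text chunks are kept as-is and image URLs become Markdown image tags.
    The separator between chunks is an assumption made for this sketch.
    """
    if isinstance(content, str):
        return content
    parts = []
    for chunk in content:
        if chunk["type"] == "text":
            parts.append(chunk["text"])
        elif chunk["type"] == "image_url":
            parts.append(f"![]({chunk['image_url']['url']})")
        else:
            raise ValueError(f"unsupported content type: {chunk['type']}")
    return sep.join(parts)


message = [
    {"type": "text", "text": "Whats in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/rabbit.png"}},
]
print(flatten_content(message))
```

Plain string content passes through unchanged, so both request shapes end up as the same kind of prompt text.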
      
      ---------
      Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
      c99ecd77
  14. 25 Apr, 2024 1 commit
    • Use the generation config. (#1808) · ee47973a
      Nicolas Patry authored
      ee47973a
  15. 23 Apr, 2024 3 commits
  16. 18 Apr, 2024 1 commit
  17. 17 Apr, 2024 1 commit
  18. 16 Apr, 2024 1 commit
  19. 10 Apr, 2024 1 commit
  20. 09 Apr, 2024 1 commit
    • Adding Llava-Next (Llava 1.6) with full support. (#1709) · 4634b00c
      Nicolas Patry authored
      # What does this PR do?
      
      - Changed all models to extract `embed_tokens` in order to enable llava
      to separately call the embeddings and the core model layers.
      - Added VlmCausalLM to inherit from FlashMistral in order to be
      maximally supported. The only added logic sits on top and parses images
      into pixel values, preallocates input_ids space for the image
      embeddings, and passes them to the model.
      - Added Clip for the vision tower.
      - Didn't add flash for the vision tower since there's no padding anyway.
      - Added heuristic (potentially incomplete) to calculate number of
      features *before* calculating the clip patches (allows for easier logic
      reuse of the LLM under the hood).
      
      
      Still needs to be done:
      
      - [x] Implement the image parsing on the controller side, to avoid
      downloading n times per TP shard, to refuse requests that are too large
      early, and to avoid issues where the truncation actually truncates the
      image.
      - [ ] Make sure it works with quantization properly.
      - [x] Make sure it works with TP>1
      
      
      
      4634b00c
  21. 28 Mar, 2024 1 commit
    • fix: adjust logprob response logic (#1682) · 818aee37
      drbh authored
      This PR fixes a bug with `ChatCompletionLogprobs` where if
      `top_tokens.len() == 0` empty results were returned.
      
      ```bash
       curl http://localhost:3000/v1/chat/completions \
          -X POST \
          -H 'Content-Type: application/json' \
          -d '{
        "model": "tgi",
        "logprobs": true,
        "messages": [
          {
            "role": "user",
            "content": "What is deep learning?"
          }
        ],
        "stream": false,
        "max_tokens": 20
      }'
      ```
      
      response
      
      
      ```json
      {"id":"","object":"text_completion","created":1711588522,"model":"google/gemma-2b-it","system_fingerprint":"1.4.4-native","choices":[{"index":0,"message":{"role":"assistant","content":"**Deep learning** is a subset of machine learning (ML) that emphasizes the creation of **artificial"},"logprobs":{"content":[{"token":"**","logprob":-0.22558594,"top_logprobs":[]},{"token":"Deep","logprob":-0.0014877319,"top_logprobs":[]},{"token":" learning","logprob":-0.12695312,"top_logprobs":[]},{"token":"**","logprob":-0.055664062,"top_logprobs":[]},{"token":" is","logprob":-0.00090026855,"top_logprobs":[]},{"token":" a","logprob":-0.006072998,"top_logprobs":[]},{"token":" subset","logprob":-2.25,"top_logprobs":[]},{"token":" of","logprob":-0.00031089783,"top_logprobs":[]},{"token":" machine","logprob":-0.091308594,"top_logprobs":[]},{"token":" learning","logprob":-0.00002348423,"top_logprobs":[]},{"token":" (","logprob":-1.671875,"top_logprobs":[]},{"token":"ML","logprob":-0.00040626526,"top_logprobs":[]},{"token":")","logprob":-0.00016212463,"top_logprobs":[]},{"token":" that","logprob":-0.13769531,"top_logprobs":[]},{"token":" emphasizes","logprob":-4.03125,"top_logprobs":[]},{"token":" the","logprob":-0.2890625,"top_logprobs":[]},{"token":" creation","logprob":-3.109375,"top_logprobs":[]},{"token":" of","logprob":-0.00024032593,"top_logprobs":[]},{"token":" **","logprob":-1.2265625,"top_logprobs":[]},{"token":"artificial","logprob":-0.10546875,"top_logprobs":[]}]},"finish_reason":"length"}],"usage":{"prompt_tokens":15,"completion_tokens":20,"total_tokens":35}}
      ```
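
The shape of the response above, where every generated token keeps its own log probability even though `top_logprobs` is empty, can be reproduced with a small Python sketch. The helper `build_logprobs` is hypothetical and is not the Rust code changed by the PR; it only illustrates that an empty list of alternatives should not empty the whole result.

```python
def build_logprobs(tokens, top_tokens):
    """Build a ChatCompletionLogprobs-like payload (hypothetical helper).

    tokens: list of (text, logprob) pairs, one per generated token.
    top_tokens: per-position lists of (text, logprob) alternatives; may be empty.
    """
    content = []
    for i, (text, logprob) in enumerate(tokens):
        # Alternatives are optional: fall back to an empty list instead of
        # dropping the entry when none were requested.
        alternatives = top_tokens[i] if i < len(top_tokens) else []
        content.append({
            "token": text,
            "logprob": logprob,
            "top_logprobs": [
                {"token": t, "logprob": lp} for t, lp in alternatives
            ],
        })
    return {"content": content}


result = build_logprobs([("**", -0.22558594), ("Deep", -0.0014877319)], [])
print(len(result["content"]))
```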
      818aee37
  22. 21 Mar, 2024 1 commit
    • fix: improve tool type, bump pydantic and outlines (#1650) · de6cb15f
      drbh authored
      This PR resolves a couple of issues:
      
      - [X] adjusts the tool response to align with openai's tools response
      type
      - [X] bumps pydantic to `2.6.4` in all apps (resolves dependency issue
      when running tests)
      - [X] bump `outlines` version and fix import for new name
      de6cb15f
  23. 16 Mar, 2024 1 commit
    • Fix index in ChatCompletionChunk (#1648) · 23fba672
      Lucain authored
      Fix a small inconsistency compared to OpenAI's chat-completion behavior
      (introduced in
      https://github.com/huggingface/text-generation-inference/pull/1427 cc
      @drbh). When using `stream=True`, each chunk has an `index` value in
      `ChatCompletionChoice`. This index is not meant to be the index of the
      generated token but the index of the choice, which is always 0 (since
      TGI always returns a single choice).
      
      See https://platform.openai.com/docs/api-reference/chat/object:
      > index _integer_
      > The index of the choice in the list of choices.
      
      ---
      
      So instead of 
      
      ```js
      data:{"id":"","object":"text_completion","created":1710508199,"model":"HuggingFaceH4/zephyr-7b-beta","system_fingerprint":"1.4.3-sha-e6bb3ff8","choices":[{"index":1,"delta":{"role":"assistant","content":"I"},"logprobs":null,"finish_reason":null}]}
      data:{"id":"","object":"text_completion","created":1710508199,"model":"HuggingFaceH4/zephyr-7b-beta","system_fingerprint":"1.4.3-sha-e6bb3ff8","choices":[{"index":2,"delta":{"role":"assistant","content":"'"},"logprobs":null,"finish_reason":null}]}
      data:{"id":"","object":"text_completion","created":1710508199,"model":"HuggingFaceH4/zephyr-7b-beta","system_fingerprint":"1.4.3-sha-e6bb3ff8","choices":[{"index":3,"delta":{"role":"assistant","content":"m"},"logprobs":null,"finish_reason":"length"}]}
      ```
      
      it should return
      ```js
      data:{"id":"","object":"text_completion","created":1710508199,"model":"HuggingFaceH4/zephyr-7b-beta","system_fingerprint":"1.4.3-sha-e6bb3ff8","choices":[{"index":0,"delta":{"role":"assistant","content":"I"},"logprobs":null,"finish_reason":null}]}
      data:{"id":"","object":"text_completion","created":1710508199,"model":"HuggingFaceH4/zephyr-7b-beta","system_fingerprint":"1.4.3-sha-e6bb3ff8","choices":[{"index":0,"delta":{"role":"assistant","content":"'"},"logprobs":null,"finish_reason":null}]}
      data:{"id":"","object":"text_completion","created":1710508199,"model":"HuggingFaceH4/zephyr-7b-beta","system_fingerprint":"1.4.3-sha-e6bb3ff8","choices":[{"index":0,"delta":{"role":"assistant","content":"m"},"logprobs":null,"finish_reason":"length"}]}
      ```
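
From the client side, the corrected behavior means a consumer can treat `index` as the choice index and simply concatenate the deltas. A minimal Python sketch follows; the helper names `collect_stream` and `make_chunk` are hypothetical, and the chunks are trimmed to the fields the sketch needs.

```python
import json


def collect_stream(lines):
    """Concatenate delta contents from `data:` lines, checking the choice index."""
    text = []
    for line in lines:
        if not line.startswith("data:"):
            continue
        chunk = json.loads(line[len("data:"):])
        for choice in chunk["choices"]:
            # A single choice is returned, so its index is expected to be 0.
            if choice["index"] != 0:
                raise ValueError(f"unexpected choice index {choice['index']}")
            text.append(choice["delta"].get("content", ""))
    return "".join(text)


def make_chunk(content, finish_reason=None):
    """Build a trimmed-down streaming chunk for the example."""
    payload = {
        "choices": [{
            "index": 0,
            "delta": {"role": "assistant", "content": content},
            "finish_reason": finish_reason,
        }]
    }
    return "data:" + json.dumps(payload)


stream = [make_chunk("I"), make_chunk("'"), make_chunk("m", "length")]
print(collect_stream(stream))
```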
      
      **EDIT:** I also edited ToolCall.index to be always `0` (instead of the
      generated token index) but for this one I'm actually unsure. It might be
      the index of the tool in the array of tools? OpenAI's documentation
      doesn't provide any information about it:
      > index _integer_
      
      ---
      
      I also noticed that in OpenAI's example, the last chunk doesn't have a
      delta and is the only one that returns a `finish_reason`. TGI is
      slightly different since the last chunk has both the last delta (i.e.
      the last generated token) + the finish reason. I don't think this is
      worth fixing since it is not a requirement according to the docs/specs
      (at least not that I know of).
      23fba672
  24. 01 Mar, 2024 1 commit
  25. 29 Feb, 2024 1 commit
  26. 28 Feb, 2024 1 commit
    • Support tools (#1587) · 9b6db5f7
      drbh authored
      This work-in-progress PR begins to add support for tools. Tools rely
      on grammar support and still have some unsolved challenges. Opening the
      PR for visibility and feedback.
      9b6db5f7
  27. 22 Feb, 2024 1 commit
  28. 21 Feb, 2024 2 commits
  29. 20 Feb, 2024 1 commit
  30. 15 Feb, 2024 2 commits
    • Added `name` field to OpenAI compatible API Messages (#1563) · c55abac3
      Aaron Mihalik authored
      # What does this PR do?
      
      Literally just adds the name field to the Message class.
      
      I verified this change by building a new docker container (using the
      `Dockerfile` in the repo) and trialing with a `chat_template` that uses
      the `name` field.
      
      Here's the previous behavior:
      
      Input messages:
      ```
      {
      "messages": [
       {"role": "system", "content": "You are a succinct but helpful AI Assistant listening to a chat server.  Address everyone by @<username>"},
       {"role": "user", "name": "Aaron", "content": "Hello There!"},
       {"role": "assistant", "content": "  Hello @aaron! How can I assist you today?"},
       {"role": "user", "name": "Sally", "content": "Hiya everyone.  Is @aaron is this room?"}
      ],
        "model": "meta-llama/Llama-2-7b-chat-hf"
      }
      ```
      
      Response before the modification:
      ```
      Hello @aaron! Yes, you are in the chat room. How can I assist you today? 😊
      
      Hiya everyone! *waves* It's great to see you all here. Is there something on your mind that you'd like to talk about or ask? I'm here to listen and help in any way I can. 🤖
      ```
      
      Response after my modification:
      ```
      Hello @Sally! Yes, @aaron is currently in the chat room. How may I assist you today?
      ```
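
Why the extra field changes the model's answer can be seen in a small Python sketch: once `name` survives parsing, a chat template can address each speaker. The `Message` dataclass and the `render` function below are hypothetical stand-ins, not the router's Rust types or a real `chat_template`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    role: str
    content: str
    name: Optional[str] = None  # optional speaker name, as in the OpenAI chat API


def render(messages):
    """Hypothetical chat template that labels named turns with @<name>."""
    lines = []
    for m in messages:
        speaker = f"@{m.name}" if m.name else m.role
        lines.append(f"{speaker}: {m.content}")
    return "\n".join(lines)


chat = [
    Message("user", "Hello There!", name="Aaron"),
    Message("assistant", "Hello @aaron! How can I assist you today?"),
    Message("user", "Hiya everyone.", name="Sally"),
]
print(render(chat))
```

If `name` were dropped during deserialization, both user turns would render identically, which matches the confused reply shown in the "before" response.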
      
      Fixes #1558 
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [x] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      @Narsil
      
      ---------
      Co-authored-by: Aaron Mihalik <aaron.mihalik@parsons.us>
      Co-authored-by: drbh <david.richard.holtz@gmail.com>
      c55abac3
    • Outlines guided generation (#1539) · cef0553d
      drbh authored
      This WIP PR starts to add grammar support via outlines. Currently it
      supports only very simple regex grammars and does not optimize for
      precompiling or caching grammar FSMs.
      
      todo:
      - [X] add simple outlines guidance to `NextTokenChooser`
      - [X] update protos for grammar
      - [X] update generation params API
      - [X] constrain simple grammar
      - [ ] support parsing more complex grammar into fsm
      - [ ] support all grammar types supported by outlines
      - [ ] explore optimizations to avoid recompiling grammars
      
      guided request
      ```bash
      curl -s 'http://localhost:3000/generate' \
      --header 'Content-Type: application/json' \
      --data-raw '{
          "inputs": "make an email for david: \n",
          "parameters": {
              "max_new_tokens": 6,
              "grammar": "[\\w-]+@([\\w-]+\\.)+[\\w-]+"
          }
      }' | jq
      ```
      response
      ```json
      {
        "generated_text": "david@example.com"
      }
      ```
      
      unguided request
      ```bash
      curl -s 'http://localhost:3000/generate' \
      --header 'Content-Type: application/json' \
      --data '{
          "inputs": "make an email for david: \n",
          "parameters": {
              "max_new_tokens": 6
          }
      }' | jq
      ```
      response
      ```json
      {
        "generated_text": "    email = 'david"
      }
      ```
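
The difference between the two responses can be checked offline with Python's `re` module, using the same regular expression as the `grammar` parameter (unescaped from its JSON form). This only validates finished strings; it does not reproduce how outlines constrains tokens during generation. The helper name `satisfies_grammar` is hypothetical.

```python
import re

# Same pattern as the "grammar" value in the guided request, JSON-unescaped.
EMAIL_GRAMMAR = r"[\w-]+@([\w-]+\.)+[\w-]+"


def satisfies_grammar(text, pattern=EMAIL_GRAMMAR):
    """Return True when the whole text matches the regex grammar."""
    return re.fullmatch(pattern, text) is not None


print(satisfies_grammar("david@example.com"))   # guided response: True
print(satisfies_grammar("    email = 'david"))  # unguided response: False
```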
      cef0553d
  31. 13 Feb, 2024 1 commit
  32. 09 Feb, 2024 1 commit