1. 28 Feb, 2024 1 commit
    • Support tools (#1587) · 9b6db5f7
      drbh authored
      This work-in-progress PR begins to add support for tools. Tool support relies
      on grammar support and still has some unsolved challenges. Opening the
      PR for visibility and feedback.
  2. 22 Feb, 2024 1 commit
  3. 21 Feb, 2024 2 commits
  4. 20 Feb, 2024 1 commit
  5. 15 Feb, 2024 2 commits
    • Added `name` field to OpenAI compatible API Messages (#1563) · c55abac3
      Aaron Mihalik authored
      # What does this PR do?
      
      Literally just adds the name field to the Message class.
      
      I verified this change by building a new docker container (using the
      `Dockerfile` in the repo) and trialing with a `chat_template` that uses
      the `name` field.
      
      Here's the previous behavior:
      
      Input messages:
      ```json
      {
        "messages": [
          {"role": "system", "content": "You are a succinct but helpful AI Assistant listening to a chat server.  Address everyone by @<username>"},
          {"role": "user", "name": "Aaron", "content": "Hello There!"},
          {"role": "assistant", "content": "  Hello @aaron! How can I assist you today?"},
          {"role": "user", "name": "Sally", "content": "Hiya everyone.  Is @aaron is this room?"}
        ],
        "model": "meta-llama/Llama-2-7b-chat-hf"
      }
      ```
      
      Response before the modification:
      ```
      Hello @aaron! Yes, you are in the chat room. How can I assist you today? 😊
      
      Hiya everyone! *waves* It's great to see you all here. Is there something on your mind that you'd like to talk about or ask? I'm here to listen and help in any way I can. 🤖
      ```
      
      Response after my modification:
      ```
      Hello @Sally! Yes, @aaron is currently in the chat room. How may I assist you today?
      ```
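The change matters because a `chat_template` can only use `name` if the field survives request parsing. The sketch below is a Python model of that idea, not TGI's Rust code: `Message`, `parse_message`, and the toy `render` template are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    role: str
    content: str
    # New optional field; previously an incoming "name" key was dropped.
    name: Optional[str] = None


def parse_message(raw: dict) -> Message:
    """Build a Message from a request dict, keeping `name` when present."""
    return Message(role=raw["role"], content=raw["content"], name=raw.get("name"))


def render(messages: List[Message]) -> str:
    """Toy stand-in for a chat_template that labels turns with the speaker's name."""
    lines = []
    for m in messages:
        speaker = f"{m.role} ({m.name})" if m.name else m.role
        lines.append(f"{speaker}: {m.content}")
    return "\n".join(lines)


raw_messages = [
    {"role": "user", "name": "Aaron", "content": "Hello There!"},
    {"role": "assistant", "content": "Hello @aaron!"},
    {"role": "user", "name": "Sally", "content": "Hiya everyone."},
]
print(render([parse_message(m) for m in raw_messages]))
```

With `name` preserved, the rendered prompt distinguishes Aaron from Sally, which is what lets the model answer "@Sally" in the response above.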
      
      Fixes #1558 
      
      
      ## Who can review?
      @Narsil
      
      ---------
      Co-authored-by: Aaron Mihalik <aaron.mihalik@parsons.us>
      Co-authored-by: drbh <david.richard.holtz@gmail.com>
    • Outlines guided generation (#1539) · cef0553d
      drbh authored
      This WIP PR starts to add grammar support via Outlines. Currently, it
      supports only very simple regex grammars and does not optimize for
      precompiling or caching grammar FSMs.
      
      todo:
      - [X] add simple outlines guidance to `NextTokenChooser`
      - [X] update protos for grammar
      - [X] update generation params API
      - [X] constrain simple grammar
      - [ ] support parsing more complex grammars into an FSM
      - [ ] support all grammar types that Outlines supports
      - [ ] explore optimizations to avoid recompiling grammars
      
      guided request
      ```bash
      curl -s 'http://localhost:3000/generate' \
      --header 'Content-Type: application/json' \
      --data-raw '{
          "inputs": "make an email for david: \n",
          "parameters": {
              "max_new_tokens": 6,
              "grammar": "[\\w-]+@([\\w-]+\\.)+[\\w-]+"
          }
      }' | jq
      ```
      response
      ```json
      {
        "generated_text": "david@example.com"
      }
      ```
      
      unguided request
      ```bash
      curl -s 'http://localhost:3000/generate' \
      --header 'Content-Type: application/json' \
      --data '{
          "inputs": "make an email for david: \n",
          "parameters": {
              "max_new_tokens": 6
          }
      }' | jq
      ```
      response
      ```json
      {
        "generated_text": "    email = 'david"
      }
      ```
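The guided request works by constraining which tokens may be chosen at each step. Below is a toy illustration of FSM-guided greedy decoding: any token that would push the text off the regex's automaton is masked out before the highest-scoring token is picked. This is not Outlines' implementation. The hand-built DFA for the simplified regex `[a-z]+@[a-z]+\.[a-z]+`, the `<eos>` token, and the per-step fake scores standing in for model logits are all assumptions for illustration.

```python
import string

# Toy DFA for the simplified regex [a-z]+@[a-z]+\.[a-z]+ (an assumption; the
# real regex is compiled into an FSM by Outlines, not written by hand).
# States: 0=start, 1=local part, 2=after "@", 3=domain, 4=after ".", 5=TLD.
LETTERS = set(string.ascii_lowercase)
ACCEPTING = {5}
EOS = "<eos>"
LETTER_NEXT = {0: 1, 1: 1, 2: 3, 3: 3, 4: 5, 5: 5}


def step(state, ch):
    """Return the next DFA state, or None if `ch` is not allowed here."""
    if ch in LETTERS:
        return LETTER_NEXT[state]
    if ch == "@" and state == 1:
        return 2
    if ch == "." and state == 3:
        return 4
    return None


def advance(state, token):
    """Feed every character of a token through the DFA."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state


def is_allowed(state, token):
    """EOS is only allowed once the text fully matches the regex."""
    if token == EOS:
        return state in ACCEPTING
    return advance(state, token) is not None


def guided_greedy(step_scores):
    """Greedy decoding with disallowed tokens masked out at every step.

    `step_scores` stands in for model logits: one {token: score} dict per step.
    """
    state, out = 0, []
    for scores in step_scores:
        allowed = [t for t in scores if is_allowed(state, t)]
        if not allowed:
            break
        best = max(allowed, key=scores.get)
        if best == EOS:
            break
        out.append(best)
        state = advance(state, best)
    return "".join(out)


# Unconstrained, the first step would pick "    email" (as in the unguided run).
fake_logits = [
    {"    email": 5.0, "david": 3.0, "@": 0.1},
    {" =": 4.0, "@": 2.0, "david": 1.0},
    {"example": 3.0, ".": 2.5},
    {".": 3.0, "com": 1.0},
    {EOS: 5.0, "com": 2.0},
    {EOS: 5.0, "com": 1.0},
]
print(guided_greedy(fake_logits))  # david@example.com
```

Masking at the token level (rather than checking characters after the fact) is what keeps every intermediate output a valid prefix of the pattern.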
  6. 13 Feb, 2024 1 commit
  7. 09 Feb, 2024 1 commit
  8. 08 Feb, 2024 1 commit
  9. 07 Feb, 2024 1 commit
  10. 01 Feb, 2024 1 commit
    • fix: tokenizer config should use local model path when possible (#1518) · ee1cf51c
      drbh authored
      
      
      This PR fixes loading a local tokenizer config. Previously, the default
      behavior was to look in the current working directory. Now, if a local
      model path is specified, that directory is checked for the
      `tokenizer_config`.
      
      ## Examples of valid commands
      
      use `tokenizer_config` from the Hub
      ```
      text-generation-launcher --model-id HuggingFaceH4/zephyr-7b-beta
      ```
      
      use `tokenizer_config` from a local model path
      ```
      text-generation-launcher \
        --model-id ~/.cache/huggingface/hub/models--HuggingFaceH4--zephyr-7b-beta/snapshots/dc24cabd13eacd3ae3a5fe574bd645483a335a4a/
      ```
      
      use a specific `tokenizer_config` file
      ```
      text-generation-launcher \
        --model-id ~/.cache/huggingface/hub/models--HuggingFaceH4--zephyr-7b-beta/snapshots/dc24cabd13eacd3ae3a5fe574bd645483a335a4a/ \
        --tokenizer-config-path ~/.cache/huggingface/hub/models--HuggingFaceH4--zephyr-7b-beta/snapshots/dc24cabd13eacd3ae3a5fe574bd645483a335a4a/tokenizer_config.json
      ```
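The three commands above correspond to a lookup order. Here is a minimal Python sketch of that order, assuming an explicit `--tokenizer-config-path` wins, then a local `--model-id` directory, then the Hub. `resolve_tokenizer_config` and the `"hub:<model_id>"` return convention are hypothetical, not the launcher's or router's actual code.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_tokenizer_config(
    model_id: str, tokenizer_config_path: Optional[str] = None
) -> Optional[str]:
    """Hypothetical sketch of the lookup order described above.

    Returns a local file path, "hub:<model_id>" when the config should come
    from the Hugging Face Hub, or None if a local model dir has no config.
    """
    # 1. An explicit --tokenizer-config-path takes precedence.
    if tokenizer_config_path is not None:
        return str(Path(tokenizer_config_path).expanduser())
    # 2. A local --model-id directory is searched (not the working directory).
    model_dir = Path(model_id).expanduser()
    if model_dir.is_dir():
        candidate = model_dir / "tokenizer_config.json"
        return str(candidate) if candidate.is_file() else None
    # 3. Otherwise --model-id is treated as a Hub repository id.
    return f"hub:{model_id}"


# Usage: a temporary directory stands in for a local snapshot directory.
with tempfile.TemporaryDirectory() as snapshot:
    (Path(snapshot) / "tokenizer_config.json").write_text("{}")
    local = resolve_tokenizer_config(snapshot)
    print(local)
print(resolve_tokenizer_config("HuggingFaceH4/zephyr-7b-beta"))
```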
      
      ---------
      Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
  11. 26 Jan, 2024 2 commits
    • ebecc061
      Nicolas Patry authored
    • Trying to fix that flaky test. (#1491) · 4c7315dd
      Nicolas Patry authored
  12. 25 Jan, 2024 1 commit
    • Add a new `/tokenize` route to get the tokenized input (#1471) · 86c8335f
      Nicolas Patry authored
      # What does this PR do?
      
      
      Ideally this would be done client-side, but since it is a recurring
      request, we implemented it.
      
      - Runs only if the Rust tokenizer is present (it is important not to
      encumber the main inference pipeline).
      - Returns simple results: ID, text (sliced from the original string
      using the offsets), and offsets (so users can do things like highlighting
      text).
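The "text from offsets" idea can be sketched in a few lines of Python. This assumes each returned token carries an `id`, its `text`, and `start`/`stop` character offsets. The field names, the regex word-splitter standing in for the Rust tokenizer, and the `highlight` helper are all illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SimpleToken:
    id: int
    text: str
    start: int  # character offset where the token begins
    stop: int   # character offset just past the token's end


def toy_tokenize(inputs: str, vocab: Dict[str, int]) -> List[SimpleToken]:
    """Hypothetical word-level tokenizer standing in for the Rust one.

    Each token's text is sliced from the original string via its offsets.
    """
    tokens = []
    for match in re.finditer(r"\w+|[^\w\s]", inputs):
        start, stop = match.span()
        token_id = vocab.setdefault(match.group(), len(vocab))
        tokens.append(SimpleToken(token_id, inputs[start:stop], start, stop))
    return tokens


def highlight(inputs: str, token: SimpleToken) -> str:
    """Use the offsets to bracket one token in the original text."""
    return f"{inputs[:token.start]}[{inputs[token.start:token.stop]}]{inputs[token.stop:]}"


text = "My name is Olivier!"
tokens = toy_tokenize(text, {})
print(highlight(text, tokens[3]))  # My name is [Olivier]!
```

Returning offsets rather than only decoded strings is what makes highlighting possible, because decoded token text does not always map back to a unique span of the input.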
      
  13. 23 Jan, 2024 1 commit
    • Disable `decoder_input_details` on OpenAI-compatible chat streaming, pass temp... · 82f87ada
      Jacob Keisling authored
      Disable `decoder_input_details` on OpenAI-compatible chat streaming, pass temp and top-k from API (#1470)
      
      This PR makes some minor tweaks to `GenerateParameters` in the new
      OpenAI-compatible chat endpoint (#1427):
      - Disables `decoder_input_details` when streaming is enabled. This was
      causing all streaming chat requests to fail, since
      [`decoder_input_details == true` is not supported when streaming
      tokens](https://github.com/huggingface/text-generation-inference/blob/98e5faff9daec6170cc2b0f963f2d73cf846b341/router/src/validation.rs#L406).
      - Passes the `temperature` and `top_p` hyperparameters from the API
      request through to `GenerateParameters`.
      
      ## Testing
      
      ```bash
      curl localhost:8080/v1/chat/completions \
          -X POST \
          -d '{
        "model": "",
        "messages": [
          {
            "role": "system",
            "content": "You are a helpful assistant."
          },
          {
            "role": "user",
            "content": "What is deep learning?"
          }
        ],
        "stream": true,
        "max_tokens": 20
      }' \
          -H 'Content-Type: application/json'
      ```
      
      This should now work correctly. Currently, the most recent build from `main`
      returns this error:
      ```
      data:{"error":"Input validation error: `decoder_input_details` == true is not supported when streaming tokens","error_type":"validation"}
      ```
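The two changes can be modeled as a small mapping from the chat request to generation parameters. This is a Python sketch, not the router's Rust code. `GenerateParameters` here is a hypothetical subset, and the assumption that `decoder_input_details` stays on for non-streaming requests is illustrative. The validation rule mirrors the error message quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerateParameters:
    # Hypothetical subset of the router's generation parameters.
    max_new_tokens: Optional[int] = None
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    decoder_input_details: bool = False


def from_chat_request(req: dict) -> GenerateParameters:
    """Map an OpenAI-style chat request onto generation parameters."""
    stream = bool(req.get("stream", False))
    return GenerateParameters(
        max_new_tokens=req.get("max_tokens"),
        # Previously these were not forwarded from the request.
        temperature=req.get("temperature"),
        top_p=req.get("top_p"),
        # Streaming rejects decoder_input_details, so only enable it otherwise.
        decoder_input_details=not stream,
    )


def validate(params: GenerateParameters, stream: bool) -> None:
    """Mirror of the validation rule quoted in the error message above."""
    if stream and params.decoder_input_details:
        raise ValueError(
            "`decoder_input_details` == true is not supported when streaming tokens"
        )


request = {"stream": True, "max_tokens": 20, "temperature": 0.7, "top_p": 0.9}
params = from_chat_request(request)
validate(params, stream=request["stream"])  # no longer raises
print(params)
```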
      
      It's my first time contributing to this project, so I could be missing
      something. I would especially appreciate @drbh's eyes on this one.
  14. 18 Jan, 2024 1 commit
    • feat: support raise_exception, bos and eos tokens (#1450) · 3ccb3bb0
      drbh authored
      This PR adds support for the custom Jinja function
      `raise_exception` and passes the `bos` and `eos` tokens into the
      template.
      
      Additionally, this PR adds 3 tests to validate and show examples of what
      can and cannot currently be parsed.
      
      ```bash
      cargo test --package text-generation-router --lib -- infer::tests --nocapture
      #     Finished test [unoptimized + debuginfo] target(s) in 7.82s
      #      Running unittests src/lib.rs (target/debug/deps/text_generation_router-18a0bbf99c2ca1b4)
      
      # running 3 tests
      # test infer::tests::test_chat_template_valid_with_raise ... ok
      # test infer::tests::test_chat_template ... ok
      # test infer::tests::test_chat_template_invalid_with_raise ... ok
      
      # test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 15 filtered out; finished in 0.00s
      ```
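To make the role of `raise_exception` concrete, here is a Python stand-in for a simplified Llama-2-style chat template. Real templates are Jinja, and the router supplies `raise_exception`, `bos_token`, and `eos_token` to them. In this sketch the prompt format, `TemplateError`, and `render_chat` are assumptions for illustration, not the router's code.

```python
from typing import Dict, List


class TemplateError(Exception):
    """Raised when a chat template rejects its input."""


def raise_exception(message: str) -> None:
    # Plays the role of the custom Jinja function: abort rendering with a message.
    raise TemplateError(message)


def render_chat(messages: List[Dict[str, str]], bos_token: str, eos_token: str) -> str:
    """Simplified Python stand-in for a Llama-2-style Jinja chat_template.

    `bos_token` and `eos_token` are passed in from the tokenizer config
    rather than hard-coded, as the template context now allows.
    """
    out = []
    for i, message in enumerate(messages):
        # Roles must alternate user/assistant, starting with user.
        if (message["role"] == "user") != (i % 2 == 0):
            raise_exception("Conversation roles must alternate user/assistant/user/assistant/...")
        if message["role"] == "user":
            out.append(f"{bos_token}[INST] {message['content']} [/INST]")
        else:
            out.append(f" {message['content']} {eos_token}")
    return "".join(out)


chat = [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello"}]
print(render_chat(chat, bos_token="<s>", eos_token="</s>"))  # <s>[INST] Hi [/INST] Hello </s>
```

The two test cases above (`..._valid_with_raise` and `..._invalid_with_raise`) correspond to the two outcomes here: normal rendering, or an error surfaced from inside the template.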
  15. 16 Jan, 2024 1 commit
    • feat: supports openai chat completions API (#1427) · 0eabc835
      drbh authored
      This PR adds support for using TGI as a drop-in replacement for OpenAI
      clients by exposing the same HTTP interface.
      
      Notes
      - TGI initializes a single model at startup, so the `model` field is unused in
      HTTP requests.
      - `max_tokens` and `stream` should work as expected, but other params may
      be unimplemented or unsupported.
      
      General approach
      - fetch the `tokenizer_config` at startup from the hub
      - pass `tokenizer_config` into `Infer` so we have it at request time
      - use the `chat_template` on the config to format chat request
      - parse jinja template and render chat string
      - pass inputs into existing generate function
      - wrap generation output in expected structure before returning
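The general approach above (template, then generate, then wrap) can be sketched in Python as a single function. Here `chat_completion`, the toy template, and the `fake_generate` stub are hypothetical. The response fields mirror the `ChatCompletion` printed in the synchronous example, but the stub's `generated_text`/`finish_reason` keys are assumptions, not the router's internal types.

```python
import time
from typing import Callable, Dict, List

Messages = List[Dict[str, str]]


def chat_completion(
    messages: Messages,
    apply_chat_template: Callable[[Messages], str],
    generate: Callable[[str, int], Dict],
    max_tokens: int = 20,
) -> Dict:
    """Hypothetical sketch of the flow above: template -> generate -> wrap."""
    # Render the chat into one prompt string using the model's chat_template.
    prompt = apply_chat_template(messages)
    # Reuse the existing, non-chat generation path.
    output = generate(prompt, max_tokens)
    # Wrap the result in an OpenAI-style chat completion object.
    return {
        "id": "",
        "object": "text_completion",
        "created": int(time.time()),
        "model": "",
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": output["generated_text"]},
                "finish_reason": output.get("finish_reason"),
            }
        ],
    }


def toy_template(messages: Messages) -> str:
    # Illustrative format only; real models ship their own chat_template.
    turns = "".join(f"### {m['role'].title()}:\n{m['content']}\n\n" for m in messages)
    return turns + "### Assistant:\n"


def fake_generate(prompt: str, max_new_tokens: int) -> Dict:
    # Stub standing in for the model server.
    return {"generated_text": "Deep learning is ...", "finish_reason": "length"}


response = chat_completion(
    [{"role": "user", "content": "What is deep learning?"}], toy_template, fake_generate
)
print(response["choices"][0]["message"]["content"])
```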
      
      # How to test
      
      ### Streaming curl
      ```bash
      curl localhost:3000/v1/chat/completions \
          -X POST \
          -d '{
        "model": "tgi",
        "messages": [
          {
            "role": "system",
            "content": "You are a helpful assistant."
          },
          {
            "role": "user",
            "content": "What is deep learning?"
          }
        ],
        "stream": true,
        "max_tokens": 20
      }' \
          -H 'Content-Type: application/json'
      ```
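With `"stream": true`, the response arrives as Server-Sent Events `data:` lines. A minimal stdlib-only consumer is sketched below. It assumes each chunk's JSON has `choices[].delta.content`, matching the `ChatCompletionChunk` printed by the Python client example, and that errors arrive as `data:{"error": ..., "error_type": ...}`. The `[DONE]` sentinel is OpenAI's convention and is handled defensively, since this sketch does not know whether TGI sends it.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each Server-Sent Events `data:` line."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            # OpenAI's API ends streams this way; handled defensively here.
            break
        event = json.loads(payload)
        if "error" in event:
            raise RuntimeError(f"{event.get('error_type')}: {event['error']}")
        yield event


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the `delta.content` pieces of every streamed chunk."""
    return "".join(
        choice["delta"].get("content") or ""
        for event in iter_sse_events(lines)
        for choice in event["choices"]
    )


stream = [
    'data:{"choices":[{"index":0,"delta":{"role":"assistant","content":"Deep"}}]}',
    "",
    'data:{"choices":[{"index":0,"delta":{"role":"assistant","content":" learning"}}]}',
]
print(collect_text(stream))  # Deep learning
```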
      
      
      It is also possible to use the `openai` Python library by changing the
      base URL:
      
      ###  🌊 STREAMING REQUEST
      ```python
      from openai import OpenAI
      
      # init the client but point it to TGI
      client = OpenAI(
          base_url="http://localhost:3000/v1",
          api_key="not needed for a local LLM"
      )
      
      chat_completion = client.chat.completions.create(
          model="tgi",
          messages=[
              {"role": "system", "content": "You are a helpful assistant." },
              {"role": "user", "content": "What is deep learning?"}
          ],
          stream=True
      )
      
      # iterate and print stream
      for message in chat_completion:
          print(message)
      
      # ChatCompletionChunk(id='', choices=[Choice(delta=ChoiceDelta(content=' that', function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=2, logprobs=None)], created=1704486761, model='', object='text_completion', system_fingerprint='')
      ```
      
      ### 🚗 SYNCHRONOUS REQUEST
      ```python
      from openai import OpenAI
      
      # init the client but point it to TGI
      client = OpenAI(
          base_url="http://localhost:3000/v1",
          api_key="not needed for a local LLM"
      )
      
      chat_completion = client.chat.completions.create(
          model="tgi",
          messages=[
              {"role": "system", "content": "You are a helpful assistant." },
              {"role": "user", "content": "What is deep learning?"}
          ],
          stream=False
      )
      
      print(chat_completion)
      # ChatCompletion(id='', choices=[Choice(finish_reason=None, index=0, logprobs=None, message=ChatCompletionMessage(content='\nDeep learning is a new field of research that has been gaining traction in the last ...', role='assistant', function_call=None, tool_calls=None))], created=1704486762, model='', object='text_completion', system_fingerprint='', usage=CompletionUsage(completion_tokens=100, prompt_tokens=76, total_tokens=176))
      ```
      
      
      ## How to run dev
      
      ```bash
      cd text-generation-inference/server
      MASTER_ADDR=127.0.0.1 MASTER_PORT=5555 text-generation-server serve --trust-remote-code gpt2
      ```
      
      **Note:** many of the existing `chat_templates` use non-standard Jinja
      (i.e. adding a `raise` to the template), which throws an error when
      parsing; hence this example uses `upstage/SOLAR-10.7B-Instruct-v1.0`, since it
      has a valid template.
      ```bash
      cd text-generation-inference/router
      cargo run -- --tokenizer-name upstage/SOLAR-10.7B-Instruct-v1.0
      ```
      
      trigger
      ```bash
      curl localhost:3000/v1/chat/completions \
          -X POST \
          -d '{ "model": "gpt-3.5-turbo", "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "What is the IP address of the Google DNS servers?" } ], "stream": true, "max_tokens": 20, "logprobs": true }' \
          -H 'Content-Type: application/json'
      ```
      
      The request above works with both `stream: true` and `stream: false`.
  16. 13 Dec, 2023 1 commit
  17. 25 Nov, 2023 1 commit
    • Exllama v2 (#1211) · ed2a3f61
      Nicolas Patry authored
      # What does this PR do?
      
      See #1165
      
      
      ---------
      Co-authored-by: Florian Zimmermeister <flozi00.fz@gmail.com>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-24-153.ec2.internal>
  18. 23 Nov, 2023 1 commit
  19. 04 Oct, 2023 1 commit
    • Modify the default for `max_new_tokens`. (#1097) · 6df43da0
      Nicolas Patry authored
      # What does this PR do?
      
      Clients that do not specify a maximum length now implicitly get
      `max_new_tokens = max_total_tokens - input_length`.
      This is a significant change, but it seems more in line with what users
      expect from a standing server.
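As a worked example of the new default, here is a short Python sketch. `resolve_max_new_tokens` is a hypothetical helper. Rejecting a prompt that already fills the budget is an assumption added for illustration, and `max_total_tokens=2048` is an arbitrary example value.

```python
from typing import Optional


def resolve_max_new_tokens(
    input_length: int, max_total_tokens: int, max_new_tokens: Optional[int] = None
) -> int:
    """Hypothetical sketch of the new default described above."""
    if max_new_tokens is not None:
        # An explicit client value is used as-is (further validation not shown).
        return max_new_tokens
    remaining = max_total_tokens - input_length
    # Assumption: a prompt that already fills the whole budget is rejected.
    if remaining <= 0:
        raise ValueError("input_length must be smaller than max_total_tokens")
    return remaining


# With max_total_tokens=2048 and a 100-token prompt, the default is 1948.
print(resolve_max_new_tokens(input_length=100, max_total_tokens=2048))  # 1948
```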
      
      
      ---------
      Co-authored-by: OlivierDehaene <olivier@huggingface.co>
  20. 28 Aug, 2023 1 commit
    • Rebased #617 (#868) · 211b54ac
      Nicolas Patry authored
      
      ---------
      Co-authored-by: Vincent Brouwers <vincent.brouwers@ing.com>
  21. 05 Jun, 2023 1 commit
  22. 02 Jun, 2023 1 commit
  23. 02 May, 2023 1 commit
  24. 26 Apr, 2023 2 commits
  25. 25 Apr, 2023 1 commit
  26. 21 Apr, 2023 1 commit
  27. 18 Apr, 2023 1 commit
  28. 09 Mar, 2023 3 commits
  29. 07 Mar, 2023 1 commit
  30. 02 Mar, 2023 1 commit
  31. 28 Feb, 2023 1 commit
  32. 27 Feb, 2023 1 commit
  33. 24 Feb, 2023 1 commit
  34. 03 Feb, 2023 1 commit