1. 16 Jan, 2024 1 commit
    • feat: supports openai chat completions API (#1427) · 0eabc835
      drbh authored
      This PR makes TGI a drop-in replacement for OpenAI clients by
      exposing the same HTTP interface.
      
      Notes
      - TGI initializes a single model at startup, so the `model` field in
      HTTP requests is unused.
      - `max_tokens` and `stream` should work as expected, but other
      parameters may be unimplemented or unsupported.
      
      General approach
      - fetch the `tokenizer_config` at startup from the hub
      - pass `tokenizer_config` into `Infer` so we have it at request time
      - use the `chat_template` on the config to format chat request
      - parse jinja template and render chat string
      - pass inputs into existing generate function
      - wrap generation output in expected structure before returning
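
The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not TGI's router code: the plain function `render_chat` stands in for rendering the Jinja `chat_template`, `generate` is a hypothetical stub for the existing generate function, and the response fields loosely mirror the sample output shown later in this PR.

```python
import time
from typing import Callable, Dict, List

Message = Dict[str, str]


def render_chat(messages: List[Message]) -> str:
    """Stand-in for the Jinja `chat_template`: flatten messages into one prompt."""
    parts = [f"### {m['role'].capitalize()}:\n{m['content']}" for m in messages]
    # End with an assistant header so the model continues as the assistant.
    parts.append("### Assistant:\n")
    return "\n\n".join(parts)


def chat_completion(
    messages: List[Message],
    generate: Callable[[str, int], str],  # hypothetical: (prompt, max_tokens) -> text
    max_tokens: int = 20,
) -> dict:
    """Format the chat, call the generate function, and wrap the output."""
    prompt = render_chat(messages)
    text = generate(prompt, max_tokens)
    return {
        "id": "",
        "object": "text_completion",  # value seen in this PR's sample output
        "created": int(time.time()),
        "model": "",  # the `model` field is unused by a single-model server
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": None,  # the stub does not report why it stopped
            }
        ],
    }
```

Keeping the template rendering separate from the wrapping step means either can be swapped out, for example replacing `render_chat` with a real Jinja render, without touching the other.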
      
      # How to test
      
      ### Streaming curl
      ```bash
      curl localhost:3000/v1/chat/completions \
          -X POST \
          -d '{
        "model": "tgi",
        "messages": [
          {
            "role": "system",
            "content": "You are a helpful assistant."
          },
          {
            "role": "user",
            "content": "What is deep learning?"
          }
        ],
        "stream": true,
        "max_tokens": 20
      }' \
          -H 'Content-Type: application/json'
      ```
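
To consume the streamed response without an OpenAI client, the body can be parsed line by line. The sketch below assumes the stream is framed as server-sent events with a `data:` prefix per JSON chunk, as OpenAI's API does; the `[DONE]` sentinel is handled in case the server sends it, which this PR does not confirm. The helper name `iter_sse_json` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line in an SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # OpenAI-style end marker (assumed optional)
            break
        yield json.loads(payload)
```

Any iterable of decoded text lines works as input, such as the lines of a saved `curl -N` output.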
      
      
      It is also possible to use the `openai` Python library by changing
      its base URL to point at TGI.
      
      ###  🌊 STREAMING REQUEST
      ```python
      from openai import OpenAI
      
      # init the client but point it to TGI
      client = OpenAI(
          base_url="http://localhost:3000/v1",
          api_key="not needed for a local LLM"
      )
      
      chat_completion = client.chat.completions.create(
          model="tgi",
          messages=[
              {"role": "system", "content": "You are a helpful assistant." },
              {"role": "user", "content": "What is deep learning?"}
          ],
          stream=True
      )
      
      # iterate and print stream
      for message in chat_completion:
          print(message)
      
      # ChatCompletionChunk(id='', choices=[Choice(delta=ChoiceDelta(content=' that', function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=2, logprobs=None)], created=1704486761, model='', object='text_completion', system_fingerprint='')
      ```
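
Printing each chunk shows the raw objects; to recover the generated text, the `delta.content` pieces can be concatenated. A minimal sketch, assuming every chunk exposes `choices[i].delta.content` as in the sample output above (the helper name `collect_stream` is hypothetical):

```python
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text fragments carried by a stream of chat completion chunks."""
    parts = []
    for chunk in chunks:
        for choice in chunk.choices:
            content = choice.delta.content
            if content:  # content may be None or empty on some chunks
                parts.append(content)
    return "".join(parts)
```

With the client code above, `print(collect_stream(chat_completion))` would replace the `for` loop.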
      
      ### 🚗 SYNCHRONOUS REQUEST
      ```python
      from openai import OpenAI
      
      # init the client but point it to TGI
      client = OpenAI(
          base_url="http://localhost:3000/v1",
          api_key="not needed for a local LLM"
      )
      
      chat_completion = client.chat.completions.create(
          model="tgi",
          messages=[
              {"role": "system", "content": "You are a helpful assistant." },
              {"role": "user", "content": "What is deep learning?"}
          ],
          stream=False
      )
      
      print(chat_completion)
      # ChatCompletion(id='', choices=[Choice(finish_reason=None, index=0, logprobs=None, message=ChatCompletionMessage(content='\nDeep learning is a new field of research that has been gaining traction in the last ...', role='assistant', function_call=None, tool_calls=None))], created=1704486762, model='', object='text_completion', system_fingerprint='', usage=CompletionUsage(completion_tokens=100, prompt_tokens=76, total_tokens=176))
      ```
      
      
      ## How to run dev
      
      ```bash
      cd text-generation-inference/server
      MASTER_ADDR=127.0.0.1 MASTER_PORT=5555 text-generation-server serve --trust-remote-code gpt2
      ```
      
      Note: many of the existing `chat_template`s use non-standard Jinja
      (e.g. adding a `raise` to the template), which throws an error when
      parsing. This example therefore uses
      `upstage/SOLAR-10.7B-Instruct-v1.0`, which has a valid template.
      ```bash
      cd text-generation-inference/router
      cargo run -- --tokenizer-name upstage/SOLAR-10.7B-Instruct-v1.0
      ```
      
      Trigger a request:
      ```bash
      curl localhost:3000/v1/chat/completions \
          -X POST \
          -d '{ "model": "gpt-3.5-turbo", "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "What is the IP address of the Google DNS servers?" } ], "stream": true, "max_tokens": 20, "logprobs": true }' \
          -H 'Content-Type: application/json'
      ```
      
      The endpoint supports both `stream: true` and `stream: false` requests.
  2. 13 Dec, 2023 1 commit
  3. 25 Nov, 2023 1 commit
    • Exllama v2 (#1211) · ed2a3f61
      Nicolas Patry authored
      # What does this PR do?
      
      See #1165
      
      ---------
      Co-authored-by: Florian Zimmermeister <flozi00.fz@gmail.com>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-24-153.ec2.internal>
  4. 23 Nov, 2023 1 commit
  5. 04 Oct, 2023 1 commit
    • Modify the default for `max_new_tokens`. (#1097) · 6df43da0
      Nicolas Patry authored
      # What does this PR do?
      
      Clients that do not specify a max_length now implicitly get
      `max_new_tokens = max_total_tokens - input_length`.
      This is a significant change, but it seems more in line with what
      users expect from a standing server.
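
The new default can be illustrated with a short Python sketch. The function name `resolve_max_new_tokens` and the error handling are assumptions made for illustration; only the formula `max_total_tokens - input_length` comes from this PR.

```python
from typing import Optional


def resolve_max_new_tokens(
    input_length: int,
    max_total_tokens: int,
    max_new_tokens: Optional[int] = None,
) -> int:
    """Return the generation budget, defaulting to all remaining tokens."""
    remaining = max_total_tokens - input_length
    if remaining <= 0:
        # Assumed behavior: reject prompts that leave no room to generate.
        raise ValueError("input already fills the total token budget")
    if max_new_tokens is None:
        # The client gave no limit: use everything that is left.
        return remaining
    return max_new_tokens


# A 76-token prompt on a server with a 1024-token total budget.
print(resolve_max_new_tokens(76, 1024))       # 948
print(resolve_max_new_tokens(76, 1024, 100))  # 100
```

A client that wants short answers should therefore keep passing an explicit limit, since omitting it now requests the full remaining budget.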
      
      ---------
      Co-authored-by: OlivierDehaene <olivier@huggingface.co>
  6. 28 Aug, 2023 1 commit
    • Rebased #617 (#868) · 211b54ac
      Nicolas Patry authored
      # What does this PR do?
      
      ---------
      Co-authored-by: Vincent Brouwers <vincent.brouwers@ing.com>
  7. 05 Jun, 2023 1 commit
  8. 02 Jun, 2023 1 commit
  9. 02 May, 2023 1 commit
  10. 26 Apr, 2023 2 commits
  11. 25 Apr, 2023 1 commit
  12. 21 Apr, 2023 1 commit
  13. 18 Apr, 2023 1 commit
  14. 09 Mar, 2023 3 commits
  15. 07 Mar, 2023 1 commit
  16. 02 Mar, 2023 1 commit
  17. 28 Feb, 2023 1 commit
  18. 27 Feb, 2023 1 commit
  19. 24 Feb, 2023 1 commit
  20. 03 Feb, 2023 1 commit
  21. 02 Feb, 2023 2 commits
  22. 01 Feb, 2023 1 commit
  23. 31 Jan, 2023 3 commits
  24. 30 Jan, 2023 1 commit
  25. 15 Dec, 2022 1 commit
  26. 12 Dec, 2022 1 commit
  27. 27 Oct, 2022 1 commit
  28. 21 Oct, 2022 1 commit
  29. 20 Oct, 2022 1 commit
  30. 17 Oct, 2022 2 commits