1. 21 Feb, 2024 1 commit
  2. 16 Feb, 2024 1 commit
  3. 15 Feb, 2024 1 commit
    • Outlines guided generation (#1539) · cef0553d
      drbh authored
      This WIP PR starts to add grammar support via outlines. Currently it
      supports only very simple regex grammars and does not yet precompile
      or cache grammar FSMs.
      
      todo:
      - [X] add simple outlines guidance to `NextTokenChooser`
      - [X] update protos for grammar
      - [X] update generation params API
      - [X] constrain simple grammar
      - [ ] support parsing more complex grammar into fsm
      - [ ] support all grammar types that outlines supports
      - [ ] explore optimizations to avoid recompiling grammars
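
The last todo item could be explored along these lines. This is a minimal sketch, not the PR's implementation: it uses Python's `re.compile` as a stand-in for building a grammar FSM, and the helper name `compile_grammar` is hypothetical. The idea is to key a cache on the grammar string so that repeated requests with the same grammar reuse the compiled object.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=128)
def compile_grammar(grammar: str) -> re.Pattern:
    """Compile a regex grammar once; later calls with the same string hit the cache.

    Assumption: re.compile stands in for the (more expensive) FSM construction.
    """
    return re.compile(grammar)


email = r"[\w-]+@([\w-]+\.)+[\w-]+"
first = compile_grammar(email)
second = compile_grammar(email)  # served from the cache, nothing is recompiled

print(first is second)
print(compile_grammar.cache_info().hits)
```

A bounded `maxsize` keeps memory use predictable if clients send many distinct grammars.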
      
      guided request
      ```bash
      curl -s 'http://localhost:3000/generate' \
      --header 'Content-Type: application/json' \
      --data-raw '{
          "inputs": "make an email for david: \n",
          "parameters": {
              "max_new_tokens": 6,
              "grammar": "[\\w-]+@([\\w-]+\\.)+[\\w-]+"
          }
      }' | jq
      ```
      response
      ```json
      {
        "generated_text": "david@example.com"
      }
      ```
      
      unguided request
      ```bash
      curl -s 'http://localhost:3000/generate' \
      --header 'Content-Type: application/json' \
      --data '{
          "inputs": "make an email for david: \n",
          "parameters": {
              "max_new_tokens": 6
          }
      }' | jq
      ```
      response
      ```json
      {
        "generated_text": "    email = 'david"
      }
      ```
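
To make the difference between the two responses concrete, the sketch below checks each `generated_text` against the same regex that was sent in the `grammar` field. It only uses Python's standard `re` module; the helper name `matches_grammar` is an assumption made for illustration and is not part of TGI. Note that the JSON payload doubles the backslashes, while a Python raw string does not need to.

```python
import re

# Same regex as the "grammar" field in the guided request.
EMAIL_GRAMMAR = r"[\w-]+@([\w-]+\.)+[\w-]+"


def matches_grammar(text: str, grammar: str = EMAIL_GRAMMAR) -> bool:
    """Return True only if the whole text is accepted by the regex grammar."""
    return re.fullmatch(grammar, text) is not None


guided = "david@example.com"      # response to the guided request
unguided = "    email = 'david"   # response to the unguided request

print(matches_grammar(guided))    # True
print(matches_grammar(unguided))  # False
```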
      cef0553d
  4. 08 Feb, 2024 1 commit
  5. 25 Jan, 2024 1 commit
    • Add a new `/tokenize` route to get the tokenized input (#1471) · 86c8335f
      Nicolas Patry authored
      # What does this PR do?
      
      
      Ideally tokenization is done client side, but this is a recurring
      request, so we implemented it.
      
      - Runs only if the Rust tokenizer is present (it is important not to
      encumber the main inference pipeline).
      - Returns simple results: ID, text (obtained with offsets from the
      original string) and offsets (so users can do things like highlight
      text).
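
As a rough illustration of the highlighting use case, the sketch below wraps each token's span in square brackets using its offsets. The token shape (`id`, `text`, `start`, `stop`) is an assumption made for this example, and the IDs are invented; check the actual `/tokenize` response before relying on these field names.

```python
from typing import Dict, List, Union

# Assumed token shape: {"id": int, "text": str, "start": int, "stop": int}
Token = Dict[str, Union[int, str]]


def highlight(text: str, tokens: List[Token]) -> str:
    """Wrap each token's span of `text` in square brackets using its offsets."""
    pieces = []
    cursor = 0
    for tok in tokens:
        start, stop = int(tok["start"]), int(tok["stop"])
        pieces.append(text[cursor:start])            # text not covered by a token
        pieces.append("[" + text[start:stop] + "]")  # the token's own span
        cursor = stop
    pieces.append(text[cursor:])                     # trailing uncovered text
    return "".join(pieces)


text = "Hello world"
tokens = [
    {"id": 1, "text": "Hello", "start": 0, "stop": 5},    # made-up ID
    {"id": 2, "text": " world", "start": 5, "stop": 11},  # made-up ID
]
print(highlight(text, tokens))  # [Hello][ world]
```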
      
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      86c8335f
  6. 16 Jan, 2024 1 commit
    • feat: supports openai chat completions API (#1427) · 0eabc835
      drbh authored
      This PR adds support for using TGI as a drop-in replacement for OpenAI
      clients by exposing the same HTTP interface.
      
      Notes
      - TGI initializes a single model at startup, so the `model` field is
      unused in HTTP requests.
      - `max_tokens` and `stream` should work as expected, but other params may
      be unimplemented or unsupported.
      
      General approach
      - fetch the `tokenizer_config` at startup from the hub
      - pass `tokenizer_config` into `Infer` so we have it at request time
      - use the `chat_template` on the config to format chat request
      - parse jinja template and render chat string
      - pass inputs into existing generate function
      - wrap generation output in expected structure before returning
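
The last four steps of this approach can be sketched in plain Python. This is only an illustration under stated assumptions: the real router renders the model's Jinja `chat_template`, whereas `render_chat` below uses a made-up prompt format, `fake_generate` stands in for the existing generate function, and the response dict mirrors only the field names visible in the client output shown further down.

```python
import time
from typing import Dict, List

Message = Dict[str, str]


def render_chat(messages: List[Message]) -> str:
    """Stand-in for rendering the chat template (hypothetical prompt format)."""
    parts = [f"<|{m['role']}|>\n{m['content']}" for m in messages]
    parts.append("<|assistant|>\n")  # cue the model to answer
    return "\n".join(parts)


def fake_generate(prompt: str) -> str:
    """Placeholder for the existing generate function; returns canned text."""
    return "Deep learning is ..."


def chat_completion(messages: List[Message]) -> dict:
    """Render the chat, generate, and wrap the output in a completion-like dict."""
    prompt = render_chat(messages)
    text = fake_generate(prompt)
    return {
        "id": "",
        "object": "text_completion",
        "created": int(time.time()),
        "model": "",
        "choices": [
            {
                "index": 0,
                "finish_reason": None,
                "message": {"role": "assistant", "content": text},
            }
        ],
    }


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is deep learning?"},
]
response = chat_completion(messages)
print(response["choices"][0]["message"]["content"])
```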
      
      # How to test
      
      ### Streaming curl
      ```bash
      curl localhost:3000/v1/chat/completions \
          -X POST \
          -d '{
        "model": "tgi",
        "messages": [
          {
            "role": "system",
            "content": "You are a helpful assistant."
          },
          {
            "role": "user",
            "content": "What is deep learning?"
          }
        ],
        "stream": true,
        "max_tokens": 20
      }' \
          -H 'Content-Type: application/json'
      ```
      
      
      It is also possible to use the `openai` Python library and change the
      base URL.
      
      ###  🌊 STREAMING REQUEST
      ```python
      from openai import OpenAI
      
      # init the client but point it to TGI
      client = OpenAI(
          base_url="http://localhost:3000/v1",
          api_key="not needed for a local LLM"
      )
      
      chat_completion = client.chat.completions.create(
          model="tgi",
          messages=[
              {"role": "system", "content": "You are a helpful assistant." },
              {"role": "user", "content": "What is deep learning?"}
          ],
          stream=True
      )
      
      # iterate and print stream
      for message in chat_completion:
          print(message)
      
      # ChatCompletionChunk(id='', choices=[Choice(delta=ChoiceDelta(content=' that', function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=2, logprobs=None)], created=1704486761, model='', object='text_completion', system_fingerprint='')
      ```
      
      ### 🚗 SYNCHRONOUS REQUEST
      ```python
      from openai import OpenAI
      
      # init the client but point it to TGI
      client = OpenAI(
          base_url="http://localhost:3000/v1",
          api_key="not needed for a local LLM"
      )
      
      chat_completion = client.chat.completions.create(
          model="tgi",
          messages=[
              {"role": "system", "content": "You are a helpful assistant." },
              {"role": "user", "content": "What is deep learning?"}
          ],
          stream=False
      )
      
      print(chat_completion)
      # ChatCompletion(id='', choices=[Choice(finish_reason=None, index=0, logprobs=None, message=ChatCompletionMessage(content='\nDeep learning is a new field of research that has been gaining traction in the last ...', role='assistant', function_call=None, tool_calls=None))], created=1704486762, model='', object='text_completion', system_fingerprint='', usage=CompletionUsage(completion_tokens=100, prompt_tokens=76, total_tokens=176))
      ```
      
      
      ## How to run dev
      
      ```bash
      cd text-generation-inference/server
      MASTER_ADDR=127.0.0.1 MASTER_PORT=5555 text-generation-server serve --trust-remote-code gpt2
      ```
      
      Note: many of the existing `chat_templates` use non-standard `jinja`
      (i.e. adding a `raise` to the template), which will throw an error when
      parsing; hence this example uses `upstage/SOLAR-10.7B-Instruct-v1.0`,
      since it has a valid template.
      ```bash
      cd text-generation-inference/router
      cargo run -- --tokenizer-name upstage/SOLAR-10.7B-Instruct-v1.0
      ```
      
      trigger
      ```bash
      curl localhost:3000/v1/chat/completions \
          -X POST \
          -d '{ "model": "gpt-3.5-turbo", "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "What is the IP address of the Google DNS servers?" } ], "stream": true, "max_tokens": 20, "logprobs": true }' \
          -H 'Content-Type: application/json'
      ```
      
      The endpoint supports both `stream: true` and `stream: false` requests.
      0eabc835
  7. 18 Dec, 2023 1 commit
  8. 14 Dec, 2023 1 commit
  9. 20 Nov, 2023 1 commit
  10. 23 Oct, 2023 1 commit
  11. 20 Oct, 2023 1 commit
  12. 11 Oct, 2023 1 commit
  13. 04 Oct, 2023 1 commit
    • Modify the default for `max_new_tokens`. (#1097) · 6df43da0
      Nicolas Patry authored
      # What does this PR do?
      
      Clients that do not specify a max_length now imply
      `max_new_tokens = max_total_tokens - input_length`.
      This is a significant change, but it seems more in line with what users
      expect from a standing server.
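
A small worked example of the new default, as a sketch rather than the router's actual code. The function name and the numbers (2048 total tokens, a 100-token input) are hypothetical and chosen only to show the arithmetic.

```python
from typing import Optional


def effective_max_new_tokens(
    requested: Optional[int], max_total_tokens: int, input_length: int
) -> int:
    """Use the client's value if given; otherwise fill the remaining token budget."""
    if requested is not None:
        return requested
    return max_total_tokens - input_length


# Client did not set a limit: 2048 - 100 tokens remain for generation.
print(effective_max_new_tokens(None, 2048, 100))  # 1948
# Client set a limit explicitly: it is used as-is.
print(effective_max_new_tokens(20, 2048, 100))    # 20
```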
      
      ---------
      Co-authored-by: OlivierDehaene <olivier@huggingface.co>
      6df43da0
  14. 27 Sep, 2023 1 commit
    • Preping 1.1.0 (#1066) · a0498642
      Nicolas Patry authored
      # What does this PR do?
      
      Upgrade all relevant versions and dependencies.
      
      a0498642
  15. 28 Aug, 2023 1 commit
    • Rebased #617 (#868) · 211b54ac
      Nicolas Patry authored
      # What does this PR do?
      
      ---------
      Co-authored-by: Vincent Brouwers <vincent.brouwers@ing.com>
      211b54ac
  16. 14 Aug, 2023 1 commit
    • Fix `tokenizers==0.13.4` . (#838) · 05dd14fd
      Nicolas Patry authored
      # What does this PR do?
      
      05dd14fd
  17. 13 Jul, 2023 1 commit
  18. 02 Jun, 2023 1 commit
  19. 26 Apr, 2023 2 commits
  20. 25 Apr, 2023 1 commit
  21. 24 Apr, 2023 1 commit
  22. 17 Apr, 2023 1 commit
  23. 09 Apr, 2023 1 commit
  24. 30 Mar, 2023 1 commit
  25. 16 Mar, 2023 1 commit
  26. 09 Mar, 2023 3 commits
  27. 07 Mar, 2023 1 commit
  28. 02 Mar, 2023 1 commit
  29. 16 Feb, 2023 1 commit
  30. 15 Feb, 2023 1 commit
  31. 13 Feb, 2023 1 commit
  32. 03 Feb, 2023 1 commit
  33. 02 Feb, 2023 1 commit
  34. 01 Feb, 2023 1 commit
  35. 31 Jan, 2023 3 commits