1. 25 Jun, 2024 2 commits
  2. 04 Jun, 2024 1 commit
    • OlivierDehaene's avatar
      feat: add SchedulerV3 (#1996) · 757223b3
      OlivierDehaene authored
      - Refactor code to allow supporting multiple versions of the
      generate.proto at the same time
      - Add v3/generate.proto (ISO to generate.proto for now but allow for
      future changes without impacting v2 backends)
      - Add Schedule trait to abstract queuing and batching mechanisms that
      will be different in the future
      - Add SchedulerV2/V3 impl
      757223b3
  3. 27 May, 2024 1 commit
    • drbh's avatar
      Processor config chat template (#1954) · 0732b9d2
      drbh authored
      This PR loads the `processor_config` similar to the `tokenizer_config`
      and uses the processor_config's chat_template if the tokenizer_config
      does not include one. These changes enable chat with idefics2
      0732b9d2
  4. 23 May, 2024 1 commit
    • Nicolas Patry's avatar
      Improving the logging system. (#1938) · 95465346
      Nicolas Patry authored
      - Added a debug log for speculated ids (helps seeing in logs quality of
        a speculator).
      - Remove newlines from child process logs when re-emitting in non JSON
        mode.
      - Made standard level be closer to what's expected (only our binaries
        level).
      - Propagate that level correctly to the shard (was forced into INFO).
      
      
      # What does this PR do?
      
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      95465346
  5. 23 Apr, 2024 2 commits
  6. 17 Apr, 2024 1 commit
  7. 12 Apr, 2024 1 commit
    • Nicolas Patry's avatar
      Improve the defaults for the launcher (#1727) · 1b2670c8
      Nicolas Patry authored
      # What does this PR do?
      
      - Renamed `max_input_length` into `max_input_tokens` for consistency
      (backward compatible change, will yell if both are set.)
      - Will now use the config for `max_input_tokens` `max_total_token` and
      `max_batch_total_tokens`.
      - Capping the values to 16k in order to save VRAM on behalf of users
      (overriddable by simply setting the values).
      
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      1b2670c8
  8. 11 Apr, 2024 2 commits
  9. 09 Apr, 2024 1 commit
    • Nicolas Patry's avatar
      Adding Llava-Next (Llava 1.6) with full support. (#1709) · 4634b00c
      Nicolas Patry authored
      # What does this PR do?
      
      - Changed all models to extract `embed_tokens` in order to enable llava
      to separately call the embeddings and the core model layers.
      - Added VlmCausalLM to inherit from FlashMistral in order to be
      maximally supported. The only added logics sits on top and parses images
      into pixel values, preallocates input_ids space for the image
      embeddings, and passes them for the model.
      - Added Clip for the vision tower.
      - Didn't add flash for the vision tower since there's no padding anyway.
      - Added heuristic (potentially incomplete) to calculate number of
      features *before* calculating the clip patches (allows for easier logic
      reuse of the LLM under the hood).
      
      
      Still needs to be done:
      
      - [x] Implement the image parsing in the controller side, to avoid
      downloading n times per TP shard and also refusing requests too large
      early and avoid issues where the truncation actually truncates the
      image.
      - [ ] Make sure it works with quantization properly.
      - [x] Make sure it works with TP>1
      
      
      
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      4634b00c
  10. 21 Feb, 2024 1 commit
  11. 20 Feb, 2024 1 commit
  12. 15 Feb, 2024 1 commit
    • drbh's avatar
      Outlines guided generation (#1539) · cef0553d
      drbh authored
      This WIP PR starts to add grammar support via outlines, currently this
      PR supports very simple regex grammars and does not optimize for
      precompiling or caching grammar fsm's.
      
      todo:
      - [X] add simple outlines guidance to `NextTokenChooser`
      - [X] update protos for grammar
      - [X] update generation params API
      - [X] constrain simple grammar
      - [ ] support parsing more complex grammar into fsm
      - [ ] support all outline support grammar types
      - [ ] explore optimizations to avoid recompiling grammars
      
      guided request
      ```bash
      curl -s 'http://localhost:3000/generate' \
      --header 'Content-Type: application/json' \
      --data-raw '{
          "inputs": "make an email for david: \n",
          "parameters": {
              "max_new_tokens": 6,
              "grammar": "[\\w-]+@([\\w-]+\\.)+[\\w-]+"
          }
      }' | jq
      ```
      response
      ```json
      {
        "generated_text": "david@example.com"
      }
      ```
      
      unguided request
      ```bash
      curl -s 'http://localhost:3000/generate' \
      --header 'Content-Type: application/json' \
      --data '{
          "inputs": "make an email for david: \n",
          "parameters": {
              "max_new_tokens": 6
          }
      }' | jq
      ```
      response
      ```json
      {
        "generated_text": "    email = 'david"
      }
      ```
      cef0553d
  13. 09 Feb, 2024 1 commit
  14. 01 Feb, 2024 1 commit
    • drbh's avatar
      fix: tokenizer config should use local model path when possible (#1518) · ee1cf51c
      drbh authored
      
      
      This PR fixes the issue with loading a local tokenizer config.
      Previously the default functionality would look in the current working
      directory. Now if a local model path is specified we will check that
      directory for the tokenizer_config.
      
      ## Examples of valid commands
      
      uses tokenizer_config from hub
      ```
      text-generation-launcher --model-id HuggingFaceH4/zephyr-7b-beta
      ```
      
      use tokenizer_config from local model path
      ```
      text-generation-launcher \
        --model-id ~/.cache/huggingface/hub/models--HuggingFaceH4--zephyr-7b-beta/snapshots/dc24cabd13eacd3ae3a5fe574bd645483a335a4a/
      ```
      
      use specific tokenizer_config file
      ```
       text-generation-launcher \
        --model-id ~/.cache/huggingface/hub/models--HuggingFaceH4--zephyr-7b-beta/snapshots/dc24cabd13eacd3ae3a5fe574bd645483a335a4a/ \
        --tokenizer-config-path ~/.cache/huggingface/hub/models--HuggingFaceH4--zephyr-7b-beta/snapshots/dc24cabd13eacd3ae3a5fe574bd645483a335a4a/tokenizer_config.json
      
      
      ```
      
      ---------
      Co-authored-by: default avatarNicolas Patry <patry.nicolas@protonmail.com>
      ee1cf51c
  15. 26 Jan, 2024 1 commit
  16. 24 Jan, 2024 1 commit
    • drbh's avatar
      Add messages api compatibility docs (#1478) · 7872b8c5
      drbh authored
      This PR adds a new page to the docs that describes the Messages API and
      how to use it.
      
      Additionally this page will contain cloud provider specific information
      for enabling and using this feature. This PR includes a SageMaker
      example/information.
      7872b8c5
  17. 22 Jan, 2024 1 commit
    • drbh's avatar
      feat: conditionally toggle chat on invocations route (#1454) · 98e5faff
      drbh authored
      This PR adds support for reading the `OAI_ENABLED` env var which will
      changes the function called when the `/invocations` is called.
      
      If `OAI_ENABLED=true` the `chat_completions` method is used otherwise it
      defaults to `compat_generate`.
      
      example running the router
      ```bash
      OAI_ENABLED=true \
        cargo run -- \
        --tokenizer-name mistralai/Mistral-7B-Instruct-v0.2
      ```
      
      example request
      ```bash
      curl localhost:3000/invocations \
          -X POST \
          -d '{ "model": "tgi", "messages": [ { "role": "user", "content": "What is the IP address of the Google DNS servers?" } ], "stream": false, "max_tokens": 20, "logprobs": true, "seed": 0 }' \
          -H 'Content-Type: application/json' | jq 
      ```
      
      **please let me know if any naming changes are needed or if any other
      routes need similar functionality.
      98e5faff
  18. 16 Jan, 2024 1 commit
    • drbh's avatar
      feat: supports openai chat completions API (#1427) · 0eabc835
      drbh authored
      This PR adds support to make TGI a drop in replacement for OpenAI
      clients by exposing the same HTTP interface.
      
      Notes
      - TGI inits a single model at startup so the `model` field is unused in
      HTTP requests.
      - `max_tokens` and `stream` should work as expected but other params may
      be (unimplemented or not supported)
      
      General approach
      - fetch the `tokenizer_config` at startup from the hub
      - pass `tokenizer_config` into `Infer` so we have it at request time
      - use the `chat_template` on the config to format chat request
      - parse jinja template and render chat string
      - pass inputs into existing generate function
      - wrap generation output in expected structure before returning
      
      # How to test
      
      ### Streaming curl
      ```bash
      curl localhost:3000/v1/chat/completions \
          -X POST \
          -d '{
        "model": "tgi",
        "messages": [
          {
            "role": "system",
            "content": "You are a helpful assistant."
          },
          {
            "role": "user",
            "content": "What is deep learning?"
          }
        ],
        "stream": true,
        "max_tokens": 20
      }' \
          -H 'Content-Type: application/json'
      ```
      
      
      It is also possible to use the `openai` python library and change the
      base url
      
      ###  🌊 STREAMING REQUEST
      ```python
      from openai import OpenAI
      
      # init the client but point it to TGI
      client = OpenAI(
          base_url="http://localhost:3000/v1",
          api_key="not needed for a local LLM"
      )
      
      chat_completion = client.chat.completions.create(
          model="tgi",
          messages=[
              {"role": "system", "content": "You are a helpful assistant." },
              {"role": "user", "content": "What is deep learning?"}
          ],
          stream=True
      )
      
      # iterate and print stream
      for message in chat_completion:
          print(message)
      
      # ChatCompletionChunk(id='', choices=[Choice(delta=ChoiceDelta(content=' that', function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=2, logprobs=None)], created=1704486761, model='', object='text_completion', system_fingerprint='')
      ```
      
      ### 🚗 SYNCHRONOUS REQUEST
      ```python
      from openai import OpenAI
      
      # init the client but point it to TGI
      client = OpenAI(
          base_url="http://localhost:3000/v1",
          api_key="not needed for a local LLM"
      )
      
      chat_completion = client.chat.completions.create(
          model="tgi",
          messages=[
              {"role": "system", "content": "You are a helpful assistant." },
              {"role": "user", "content": "What is deep learning?"}
          ],
          stream=False
      )
      
      print(chat_completion)
      # ChatCompletion(id='', choices=[Choice(finish_reason=None, index=0, logprobs=None, message=ChatCompletionMessage(content='\nDeep learning is a new field of research that has been gaining traction in the last ...', role='assistant', function_call=None, tool_calls=None))], created=1704486762, model='', object='text_completion', system_fingerprint='', usage=CompletionUsage(completion_tokens=100, prompt_tokens=76, total_tokens=176))
      ```
      
      
      ## How to run dev
      
      ```bash
      cd text-generation-inference/server
      MASTER_ADDR=127.0.0.1 MASTER_PORT=5555 text-generation-server serve --trust-remote-code gpt2
      ```
      
      ***note many of the existing `chat_templates` use non standard `jinja`
      (ie. adding a `raise` to the template) which will throw an error when
      parsing; hence using `upstage/SOLAR-10.7B-Instruct-v1.0` since it has a
      valid template
      ```bash
      cd text-generation-inference/router
      cargo run -- --tokenizer-name upstage/SOLAR-10.7B-Instruct-v1.0
      ```
      
      trigger
      ```bash
      curl localhost:3000/v1/chat/completions \
          -X POST \
          -d '{ "model": "gpt-3.5-turbo", "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "What is the IP address of the Google DNS servers?" } ], "stream": true, "max_tokens": 20, "logprobs": true }' \
          -H 'Content-Type: application/json'
      ```
      
      ^ supports `stream: true` and `stream: false` requests
      0eabc835
  19. 10 Jan, 2024 1 commit
  20. 20 Oct, 2023 1 commit
  21. 27 Sep, 2023 1 commit
    • Nicolas Patry's avatar
      Preping 1.1.0 (#1066) · a0498642
      Nicolas Patry authored
      # What does this PR do?
      
      Upgrade all relevant versions and dependencies.
      
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      a0498642
  22. 28 Aug, 2023 1 commit
    • Nicolas Patry's avatar
      Rebased #617 (#868) · 211b54ac
      Nicolas Patry authored
      # What does this PR do?
      
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation
      
      ).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      
      ---------
      Co-authored-by: default avatarVincent Brouwers <vincent.brouwers@ing.com>
      211b54ac
  23. 28 Jul, 2023 1 commit
  24. 19 Jul, 2023 2 commits
  25. 13 Jul, 2023 2 commits
  26. 10 Jul, 2023 1 commit
  27. 05 Jul, 2023 1 commit
  28. 30 Jun, 2023 2 commits
  29. 16 Jun, 2023 1 commit
  30. 16 May, 2023 1 commit
  31. 09 May, 2023 1 commit
  32. 04 May, 2023 1 commit
  33. 24 Apr, 2023 1 commit
  34. 21 Apr, 2023 1 commit