1. 12 Aug, 2024 1 commit
  2. 09 Aug, 2024 2 commits
  3. 06 Aug, 2024 2 commits
  4. 31 Jul, 2024 1 commit
    • Rebase TRT-llm (#2331) · 2b19d671
      Nicolas Patry authored
      * wip
      
      wip
      
      refacto
      
      refacto
      
      Initial setup for CXX binding to TRTLLM
      
      Working FFI call for TGI and TRTLLM backend
      
      Remove unused parameters and force tokenizer name to be set
      
      Overall build TRTLLM and deps through CMake build system
      
      Enable end to end CMake build
      
      First version loading engines and making it ready for inference
      
      Remembering to check how we can detect support for chunked context
      
      Move to latest TensorRT-LLM version
      
      Specify which default log level to use depending on CMake build type
      
      make leader executor mode working
      
      unconditionally call InitializeBackend on the FFI layer
      
      bind to CUDA::nvml to retrieve compute capabilities at runtime
      
      updated logic and comment to detect cuda compute capabilities
      
      implement the Stream method to send new tokens through a callback
      
      use spdlog release 1.14.1 moving forward
      
      update trtllm to latest version a96cccafcf6365c128f004f779160951f8c0801c
      
      correctly tell cmake to build dependent tensorrt-llm required libraries
      
      create cmake install target to put everything relevant in installation folder
      
      add auth_token CLI argument to provide hf hub authentication token
      
      allow converting huggingface::tokenizers error to TensorRtLlmBackendError
      
      use correct include for spdlog
      
      include guard to build example in cmakelists
      
      working setup of the ffi layer
      
      remove fmt import
      
      use external fmt lib
      
      end to end ffi flow working
      
      make sure to track include/ffi.h to trigger rebuild from cargo
      
      impl the rust backend which currently cannot move the actual computation in background thread
      
      expose shutdown function at ffi layer
      
      impl RwLock scenario for TensorRtLlmBackend
      
      oops missing c++ backend definitions
      
      compute the number of maximum new tokens for each request independently
      
      make sure the context is not dropped in the middle of the async decoding.
      
      remove unnecessary log
      
      add all the necessary plumbing to return the generated content
      
      update invalid doc in cpp file
      
      correctly forward back the log probabilities
      
      remove unneeded scope variable for now
      
      refactor Stream impl for Generation to factorise code
      
      expose the internal missing start/queue timestamp
      
      forward tgi parameters rep/freq penalty
      
      add some more validation about grammar not supported
      
      define a shared struct to hold the result of a decoding step
      
      expose information about potential error happening while decoding
      
      remove logging
      
      add logging in case of decoding error
      
      make sure executor_worker is provided
      
      add initial Dockerfile for TRTLLM backend
      
      add some more information in CMakeLists.txt to correctly install executorWorker
      
      add some more information in CMakeLists.txt to correctly find and install nvrtc wrapper
      
      simplify prebuilt trtllm libraries name definition
      
      do the same name definition stuff for tensorrt_llm_executor_static
      
      leverage pkg-config to probe libraries paths and reuse new install structure from cmake
      
      fix bad copy/paste missing nvinfer linkage direction
      
      align all the linker search dependency
      
      add missing pkgconfig folder for MPI in Dockerfile
      
      correctly setup linking search path for runtime layer
      
      fix missing / before tgi lib path
      
      adding missing ld_library_path for cuda stubs in Dockerfile
      
      update tgi entrypoint
      
      commenting out Python part for TensorRT installation
      
      refactored docker image
      
      move to TensorRT-LLM v0.11.0
      
      make docker linter happy with same capitalization rule
      
      fix typo
      
      refactor the compute capabilities detection along with num gpus
      
      update TensorRT-LLM to latest version
      
      update TensorRT install script to latest
      
      update build.rs to link to cuda 12.5
      
      add missing dependent libraries for linking
      
      clean up a bit
      
      install to decoder_attention target
      
      add some custom stuff for nccl linkage
      
      fix envvar CARGO_CFG_TARGET_ARCH set at runtime vs compile time
      
      use std::env::consts::ARCH
      
      make sure variable live long enough...
      
      look for cuda 12.5
      
      add some more basic info in README.md
      
      * Rebase.
      
      * Fix autodocs.
      
      * Let's try to enable trtllm backend.
      
      * Ignore backends/v3 by default.
      
      * Fixing client.
      
      * Fix makefile + autodocs.
      
      * Updating the schema thing + redocly.
      
      * Fix trtllm lint.
      
      * Adding pb files ?
      
      * Remove cargo fmt temporarily.
      
      * ?
      
      * Tmp.
      
      * Remove both check + clippy  ?
      
      * Backporting telemetry.
      
      * Backporting 457fb0a1
      
      
      
      * Remove PB from git.
      
      * Fixing PB with default member backends/client
      
      * update TensorRT-LLM to latest version
      
      * provided None for api_key
      
      * link against libtensorrt_llm and not libtensorrt-llm
      
      ---------
      Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
      Co-authored-by: Morgan Funtowicz <morgan@huggingface.co>
      2b19d671
  5. 19 Jul, 2024 2 commits
  6. 09 Jul, 2024 1 commit
  7. 08 Jul, 2024 1 commit
  8. 03 Jul, 2024 3 commits
  9. 01 Jul, 2024 1 commit
    • fix: prefer serde structs over custom functions (#2127) · 9eefb2f6
      drbh authored
      
      
      * fix: prefer enum for chat object
      
      * fix: adjust typo
      
      * fix: enum CompletionType not ObjectType
      
      * fix: adjust typo
      
      * feat: leverage serde for conditional deser
      
      * fix: adjust HubTokenizerConfig after rebase
      
      * fix: update create_post_processor logic for token type
      
      * fix: adjust unwrap syntax in template
      
      * Fixing the post processor.
      
      ---------
      Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
      9eefb2f6
  10. 27 Jun, 2024 2 commits
  11. 25 Jun, 2024 2 commits
    • Enable multiple LoRa adapters (#2010) · 04e1af94
      drbh authored
      
      
      * feat: first draft load multiple lora
      
      * feat: load weights within layer and refactor lora pass
      
      * fix: refactor and reduce lora math
      
      * feat: baseline impl single request multi lora support
      
      * feat: prefer lorax implementation and port loading logic
      
      * fix: prefer adapter_data and refactors
      
      * feat: prefer lorax's custom punica kernels and add mlp loras
      
      * fix: adjust batch for bgmv
      
      * fix: adjust adapter_segments logic when in batch
      
      * fix: refactor and move changes to v3 proto
      
      * fix: pass model_id for all flash causal lms
      
      * fix: pass model_id for all causal and seq2seq lms
      
      * fix: add model_id to model test
      
      * feat: add lora support to mistral and refactors
      
      * feat: prefer model id in request
      
      * fix: include rust code for adapter id
      
      * feat: bump launcher and add new lora docs
      
      * feat: support base model generation and refactors
      
      * fix: rename doc to retry ci build
      
      * feat: support if vlm models
      
      * fix: add adapter_data param and avoid missing layers
      
      * fix: add adapter_data param to phi and neox
      
      * fix: update all models forwards to include adapter_data
      
      * fix: add model_id to IdeficsCausalLM
      
      * Update lora.md
      
      Fixed a typo
      
      * Update lora.md
      
      Fixing spam image
      
      * fix: add lora kernel to dockerfile, support running without kernels and refactors
      
      * fix: avoid dockerfile conflict
      
      * fix: refactors and adjust flash llama lora logic
      
      * fix: skip llama test due to CI issue (temp)
      
      * fix: skip llama test CI (temp) 2
      
      * fix: revert skips and prefer updated ci token for tests
      
      * fix: refactors and helpful comments
      
      * fix: add noop in TensorParallelAdapterRowLinear too
      
      * fix: refactor and move shard_lora_weights logic
      
      * fix: exit early if no adapter_data
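
The commits above mention `adapter_segments` and batching for bgmv but do not say how segments are computed. As a rough illustration only, one common approach is to group consecutive requests that share an adapter so that each group can be handled together. The function name `find_segments` and the return layout below are assumptions made for this sketch, not the code from the PR.

```python
from typing import List, Tuple


def find_segments(adapter_indices: List[int]) -> Tuple[List[int], List[int]]:
    """Group consecutive batch positions that use the same adapter.

    Hypothetical helper: returns (boundaries, adapter_ids) where segment i
    covers positions boundaries[i] .. boundaries[i + 1] - 1 and uses
    adapter_ids[i].
    """
    starts: List[int] = []
    adapter_ids: List[int] = []
    for position, adapter in enumerate(adapter_indices):
        # Open a new segment whenever the adapter changes.
        if not adapter_ids or adapter != adapter_ids[-1]:
            starts.append(position)
            adapter_ids.append(adapter)
    # Close the last segment with the batch length.
    boundaries = starts + [len(adapter_indices)]
    return boundaries, adapter_ids


if __name__ == "__main__":
    print(find_segments([0, 0, 1, 1, 1, 0]))
```

With a batch whose requests use adapters `[0, 0, 1, 1, 1, 0]`, this yields three segments, since the last request switches back to adapter 0.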
      
      ---------
      Co-authored-by: Derek <datavistics@gmail.com>
      04e1af94
    • fix ChatCompletion and ChatCompletionChunk object string not compatible with... · b69f0780
      sunxichen authored
      
      fix ChatCompletion and ChatCompletionChunk object string not compatible with standard openai api (#2089)
      Co-authored-by: sunxichen <sun.xc@digitalcnzz.com>
      b69f0780
  12. 13 Jun, 2024 1 commit
    • implement Open Inference Protocol endpoints (#1942) · f433f1f7
      drbh authored
      * feat: add kserve feature and basic routes
      
      * feat: implement infer endpoint wrapper around generate
      
      * fix: refactor and improve types
      
      * fix: improve infer and simplify
      
      * fix: cleanup and improve api docs
      
      * fix: refactor and encapsulate kserve feat in file
      
      * fix: remove typos after rebase
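
The infer endpoint is described above as a wrapper around generate. A minimal sketch of that idea follows, assuming the Open Inference Protocol v2 request shape (`inputs` entries carrying `name`, `shape`, `datatype` and `data`). The `infer` function, the `generate` callable and the way outputs are laid out are hypothetical; the source does not show the actual Rust implementation.

```python
from typing import Callable, Dict, List


def infer(request: Dict, generate: Callable[[str], str], model_name: str) -> Dict:
    """Answer an Open Inference Protocol style request by calling `generate`.

    Hypothetical wrapper: every string in an input tensor's `data` is treated
    as one prompt, and the generated texts are returned as a BYTES tensor.
    """
    outputs: List[Dict] = []
    for tensor in request["inputs"]:
        texts = [generate(prompt) for prompt in tensor["data"]]
        outputs.append(
            {
                "name": tensor["name"],
                "shape": [len(texts)],
                "datatype": "BYTES",
                "data": texts,
            }
        )
    return {
        "model_name": model_name,
        "id": request.get("id", ""),
        "outputs": outputs,
    }


if __name__ == "__main__":
    request = {
        "id": "42",
        "inputs": [
            {"name": "prompt", "shape": [1], "datatype": "BYTES", "data": ["hello"]}
        ],
    }
    # A stand-in for the real generation call.
    print(infer(request, generate=str.upper, model_name="tgi"))
```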
      f433f1f7
  13. 11 Jun, 2024 1 commit
  14. 04 Jun, 2024 1 commit
    • feat: add SchedulerV3 (#1996) · 757223b3
      OlivierDehaene authored
      - Refactor code to allow supporting multiple versions of the
      generate.proto at the same time
      - Add v3/generate.proto (identical to generate.proto for now, but allows for
      future changes without impacting v2 backends)
      - Add Schedule trait to abstract queuing and batching mechanisms that
      will be different in the future
      - Add SchedulerV2/V3 impl
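
The point of the Schedule trait is that callers depend on an abstract queuing and batching interface rather than on one scheduler version. The actual trait is written in Rust and its methods are not shown here, so the following Python sketch only illustrates the pattern; `Scheduler`, `schedule`, `next_batch` and `FifoScheduler` are hypothetical names.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Deque, List


class Scheduler(ABC):
    """Hypothetical interface hiding how requests are queued and batched."""

    @abstractmethod
    def schedule(self, request: str) -> None:
        """Accept a request for later processing."""

    @abstractmethod
    def next_batch(self, max_size: int) -> List[str]:
        """Return up to `max_size` queued requests."""


class FifoScheduler(Scheduler):
    """One possible implementation: first in, first out."""

    def __init__(self) -> None:
        self._queue: Deque[str] = deque()

    def schedule(self, request: str) -> None:
        self._queue.append(request)

    def next_batch(self, max_size: int) -> List[str]:
        batch: List[str] = []
        while self._queue and len(batch) < max_size:
            batch.append(self._queue.popleft())
        return batch


if __name__ == "__main__":
    scheduler: Scheduler = FifoScheduler()
    for name in ["a", "b", "c"]:
        scheduler.schedule(name)
    print(scheduler.next_batch(2))
```

A second implementation with a different batching policy could be swapped in without touching code that only uses the `Scheduler` interface.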
      757223b3
  15. 27 May, 2024 1 commit
    • Processor config chat template (#1954) · 0732b9d2
      drbh authored
      This PR loads the `processor_config` similar to the `tokenizer_config`
      and uses the processor_config's chat_template if the tokenizer_config
      does not include one. These changes enable chat with idefics2.
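
The fallback described above can be sketched in a few lines. This assumes both configs have already been parsed into dictionaries with an optional `chat_template` key; the function name `resolve_chat_template` is hypothetical and the real router code is written in Rust.

```python
from typing import Optional


def resolve_chat_template(
    tokenizer_config: dict, processor_config: dict
) -> Optional[str]:
    """Prefer the tokenizer's chat template, else fall back to the processor's.

    Hypothetical helper: returns None when neither config defines a template.
    """
    template = tokenizer_config.get("chat_template")
    if template is not None:
        return template
    return processor_config.get("chat_template")


if __name__ == "__main__":
    # The tokenizer config lacks a template, so the processor config is used.
    print(resolve_chat_template({}, {"chat_template": "{{ messages }}"}))
```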
      0732b9d2
  16. 23 May, 2024 1 commit
  17. 17 May, 2024 1 commit
    • Removing some unused code. (#1915) · a60fa840
      Nicolas Patry authored
      a60fa840
  18. 16 May, 2024 3 commits
    • Types. (#1909) · b3dd3902
      Nicolas Patry authored
      b3dd3902
    • Fixing types. (#1906) · f5d43414
      Nicolas Patry authored
      f5d43414
    • OpenAI function calling compatible support (#1888) · d8402eaf
      phangiabao98 authored
      # What does this PR do?
      
      Fixes https://github.com/huggingface/text-generation-inference/issues/1887
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [x] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [x] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [x] Did you make sure to update the documentation with your
      changes? Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [x] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      @Narsil
      
      
      ---------
      Co-authored-by: Bao Phan <baopg@inter-k.com>
      d8402eaf
  19. 03 May, 2024 1 commit
  20. 30 Apr, 2024 1 commit
    • Handle images in chat api (#1828) · c99ecd77
      drbh authored
      This PR allows for messages to be formatted as simple strings, or as an
      array of objects including image urls. This is done by formatting
      content arrays into a simple string.
      
      Example using `llava-hf/llava-v1.6-mistral-7b-hf` 
      
      ```bash
      curl localhost:3000/v1/chat/completions \
      -X POST \
      -H 'Content-Type: application/json' \
      -d '{
          "model": "tgi",
          "messages": [
              {
                  "role": "user",
                  "content": [
                      {
                          "type": "text",
                          "text": "Whats in this image?"
                      },
                      {
                          "type": "image_url",
                          "image_url": {
                              "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"
                          }
                      }
                  ]
              }
          ],
          "stream": false,
          "max_tokens": 20,
          "seed": 42
      }'
      ```
      
      is equivalent to this simpler request
      
      ```bash
      curl localhost:3000/v1/chat/completions \
      -X POST \
      -H 'Content-Type: application/json' \
      -d '{
          "model": "tgi",
          "messages": [
              {
                  "role": "user",
                  "content": "Whats in this image?\n![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png)"
              }
          ],
          "stream": false,
          "max_tokens": 20,
          "seed": 42
      }'
      ```
      
      output
      ```
      {"id":"","object":"text_completion","created":1714406985,"model":"llava-hf/llava-v1.6-mistral-7b-hf","system_fingerprint":"2.0.1-native","choices":[{"index":0,"message":{"role":"assistant","content":" This is an illustration of an anthropomorphic rabbit in a spacesuit, standing on what"},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":2945,"completion_tokens":20,"total_tokens":2965}}
      ```
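
The two requests above are equivalent because a content array is collapsed into a single string, with each image URL rendered in Markdown image syntax. A minimal sketch of that flattening follows. The helper name `flatten_content` is hypothetical, and joining chunks with a newline is an assumption taken from the example request rather than from the router code.

```python
from typing import List, Union


def flatten_content(content: Union[str, List[dict]]) -> str:
    """Collapse an OpenAI-style message `content` into one prompt string.

    Hypothetical helper: plain strings pass through unchanged, text chunks
    are kept as-is, and image chunks become Markdown image references.
    """
    if isinstance(content, str):
        return content
    parts = []
    for chunk in content:
        if chunk["type"] == "text":
            parts.append(chunk["text"])
        elif chunk["type"] == "image_url":
            parts.append(f"![]({chunk['image_url']['url']})")
        else:
            raise ValueError(f"unsupported content type: {chunk['type']}")
    # Assumption: chunks are separated by a newline, as in the example above.
    return "\n".join(parts)


if __name__ == "__main__":
    message = [
        {"type": "text", "text": "Whats in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/rabbit.png"}},
    ]
    print(flatten_content(message))
```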
      
      ---------
      Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
      c99ecd77
  21. 25 Apr, 2024 1 commit
    • Use the generation config. (#1808) · ee47973a
      Nicolas Patry authored
      ee47973a
  22. 23 Apr, 2024 3 commits
  23. 18 Apr, 2024 1 commit
  24. 17 Apr, 2024 1 commit
  25. 16 Apr, 2024 1 commit
  26. 10 Apr, 2024 1 commit
  27. 09 Apr, 2024 1 commit
    • Adding Llava-Next (Llava 1.6) with full support. (#1709) · 4634b00c
      Nicolas Patry authored
      # What does this PR do?
      
      - Changed all models to extract `embed_tokens` in order to enable llava
      to separately call the embeddings and the core model layers.
      - Added VlmCausalLM to inherit from FlashMistral in order to be
      maximally supported. The only added logic sits on top and parses images
      into pixel values, preallocates input_ids space for the image
      embeddings, and passes them to the model.
      - Added Clip for the vision tower.
      - Didn't add flash for the vision tower since there's no padding anyway.
      - Added heuristic (potentially incomplete) to calculate number of
      features *before* calculating the clip patches (allows for easier logic
      reuse of the LLM under the hood).
      
      
      Still needs to be done:
      
      - [x] Implement the image parsing on the controller side, to avoid
      downloading n times per TP shard, to refuse overly large requests
      early, and to avoid issues where the truncation actually truncates the
      image.
      - [ ] Make sure it works with quantization properly.
      - [x] Make sure it works with TP>1
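
The description above says that space in `input_ids` is preallocated for the image embeddings, with the number of features known before the patches are computed. A rough sketch of that idea follows. The sentinel `IMAGE_PLACEHOLDER_ID`, the segment format and the function name are all assumptions made for illustration, not the model code.

```python
from typing import List, Tuple, Union

# Hypothetical sentinel marking positions that will hold image embeddings.
IMAGE_PLACEHOLDER_ID = -1

Segment = Tuple[str, Union[List[int], int]]


def build_input_ids(segments: List[Segment]) -> Tuple[List[int], List[bool]]:
    """Reserve one slot per image feature between the text token ids.

    Each segment is ("text", token_ids) or ("image", num_features). Returns
    the flat id list plus a mask that is True on the reserved image slots.
    """
    input_ids: List[int] = []
    image_mask: List[bool] = []
    for kind, value in segments:
        if kind == "text":
            input_ids.extend(value)
            image_mask.extend([False] * len(value))
        elif kind == "image":
            input_ids.extend([IMAGE_PLACEHOLDER_ID] * value)
            image_mask.extend([True] * value)
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return input_ids, image_mask


if __name__ == "__main__":
    ids, mask = build_input_ids([("text", [1, 2]), ("image", 3), ("text", [4])])
    print(ids, mask)
```

The mask can later be used to overwrite the reserved positions with the vision tower's output while the text positions keep their token embeddings.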
      4634b00c
  28. 28 Mar, 2024 1 commit
    • fix: adjust logprob response logic (#1682) · 818aee37
      drbh authored
      This PR fixes a bug with `ChatCompletionLogprobs` where if
      `top_tokens.len() == 0` empty results were returned.
      
      ```bash
       curl http://localhost:3000/v1/chat/completions \
          -X POST \
          -H 'Content-Type: application/json' \
          -d '{
        "model": "tgi",
        "logprobs": true,
        "messages": [
          {
            "role": "user",
            "content": "What is deep learning?"
          }
        ],
        "stream": false,
        "max_tokens": 20
      }'
      ```
      
      response
      
      
      ```json
      {"id":"","object":"text_completion","created":1711588522,"model":"google/gemma-2b-it","system_fingerprint":"1.4.4-native","choices":[{"index":0,"message":{"role":"assistant","content":"**Deep learning** is a subset of machine learning (ML) that emphasizes the creation of **artificial"},"logprobs":{"content":[{"token":"**","logprob":-0.22558594,"top_logprobs":[]},{"token":"Deep","logprob":-0.0014877319,"top_logprobs":[]},{"token":" learning","logprob":-0.12695312,"top_logprobs":[]},{"token":"**","logprob":-0.055664062,"top_logprobs":[]},{"token":" is","logprob":-0.00090026855,"top_logprobs":[]},{"token":" a","logprob":-0.006072998,"top_logprobs":[]},{"token":" subset","logprob":-2.25,"top_logprobs":[]},{"token":" of","logprob":-0.00031089783,"top_logprobs":[]},{"token":" machine","logprob":-0.091308594,"top_logprobs":[]},{"token":" learning","logprob":-0.00002348423,"top_logprobs":[]},{"token":" (","logprob":-1.671875,"top_logprobs":[]},{"token":"ML","logprob":-0.00040626526,"top_logprobs":[]},{"token":")","logprob":-0.00016212463,"top_logprobs":[]},{"token":" that","logprob":-0.13769531,"top_logprobs":[]},{"token":" emphasizes","logprob":-4.03125,"top_logprobs":[]},{"token":" the","logprob":-0.2890625,"top_logprobs":[]},{"token":" creation","logprob":-3.109375,"top_logprobs":[]},{"token":" of","logprob":-0.00024032593,"top_logprobs":[]},{"token":" **","logprob":-1.2265625,"top_logprobs":[]},{"token":"artificial","logprob":-0.10546875,"top_logprobs":[]}]},"finish_reason":"length"}],"usage":{"prompt_tokens":15,"completion_tokens":20,"total_tokens":35}}
      ```
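
The response above shows the fixed behavior: every generated token gets an entry, with an empty `top_logprobs` list when no alternatives were requested. One plausible way such a bug arises is pairing tokens with `top_tokens` position by position, which produces nothing when `top_tokens` is empty. That cause is an assumption, and the sketch below is a hypothetical Python rendering, not the Rust code from the PR.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, float]


def build_logprobs(tokens: List[Token], top_tokens: List[List[Token]]) -> Dict:
    """Build a ChatCompletionLogprobs-like payload.

    Hypothetical helper: iterates over `tokens` so that every generated token
    is reported even when `top_tokens` is empty or shorter than `tokens`.
    """
    content = []
    for index, (text, logprob) in enumerate(tokens):
        # Missing alternatives are treated as an empty list.
        alternatives = top_tokens[index] if index < len(top_tokens) else []
        content.append(
            {
                "token": text,
                "logprob": logprob,
                "top_logprobs": [
                    {"token": alt_text, "logprob": alt_logprob}
                    for alt_text, alt_logprob in alternatives
                ],
            }
        )
    return {"content": content}


if __name__ == "__main__":
    print(build_logprobs([("**", -0.2), ("Deep", -0.001)], []))
```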
      818aee37
  29. 21 Mar, 2024 1 commit
    • fix: improve tool type, bump pydantic and outlines (#1650) · de6cb15f
      drbh authored
      This PR resolves a couple of issues:
      
      - [X] adjusts the tool response to align with openai's tools response
      type
      - [X] bumps pydantic to `2.6.4` in all apps (resolves dependency issue
      when running tests)
      - [X] bump `outlines` version and fix import for new name
      de6cb15f