1. 28 Oct, 2024 1 commit
    • Choosing input/total tokens automatically based on available VRAM? (#2673) · 0c9b6cdd
      Nicolas Patry authored
      * Choosing input/total tokens automatically based on available VRAM?
      
      * Update doc.
      
      * Remove generated files.
      
      * Trying to fix non chunking targets.
      
      * Attempt #2
      
      * fix.
      
      * QuantLinear is rocm compatible.
      
      * Much simpler logic after the overhead.
      
      * Updating logic + non flash.
      
      * Revert doc text.
      
      * Simple updates.
      
      * Fix integration mt0 (transformers update).
  2. 24 Sep, 2024 1 commit
    • Cleanup Vertex + Chat (#2553) · c032280b
      Nicolas Patry authored
      * Cleanup Vertex + Chat
      
      * logprobs defaults to false.
      
      * Parameters are optional
      
      * Fix docs.
      
      * Changing back this logprobs default.
      
      * Fixup doc.
      
      * Let's debug that.
      
      * Not unstable.
      
      * Updating Cargo ?
      
      * Wat?
      
      * Dummy change.
      
      * Trying some other install.
      
      * Trying something.
      
      * Revert everything.
      
      * Update Cargo lock.
      
      * Fixing the pre-commit after rebase.
  3. 05 Sep, 2024 1 commit
  4. 14 Aug, 2024 1 commit
  5. 09 Aug, 2024 1 commit
  6. 31 Jul, 2024 1 commit
    • Rebase TRT-llm (#2331) · 2b19d671
      Nicolas Patry authored
      * wip
      
      wip
      
      refacto
      
      refacto
      
      Initial setup for CXX binding to TRTLLM
      
      Working FFI call for TGI and TRTLLM backend
      
      Remove unused parameters and force tokenizer name to be set
      
      Overall build TRTLLM and deps through CMake build system
      
      Enable end to end CMake build
      
      First version loading engines and making it ready for inference
      
      Remembering to check how we can detect support for chunked context
      
      Move to latest TensorRT-LLM version
      
      Specify which default log level to use depending on CMake build type
      
      make leader executor mode working
      
      unconditionally call InitializeBackend on the FFI layer
      
      bind to CUDA::nvml to retrieve compute capabilities at runtime
      
      updated logic and comment to detect cuda compute capabilities
      
      implement the Stream method to send new tokens through a callback
      
      use spdlog release 1.14.1 moving forward
      
      update trtllm to latest version a96cccafcf6365c128f004f779160951f8c0801c
      
      correctly tell cmake to build dependent tensorrt...
  7. 01 May, 2024 1 commit
    • Adding scripts to prepare load data. (#1841) · 0038e602
      Nicolas Patry authored
  8. 30 Apr, 2024 1 commit
  9. 16 Feb, 2024 1 commit
  10. 26 Jan, 2024 1 commit
  11. 08 Jun, 2023 1 commit
  12. 25 Apr, 2023 1 commit
  13. 20 Oct, 2022 1 commit
  14. 11 Oct, 2022 1 commit