1. 25 Apr, 2024 1 commit
  2. 23 Apr, 2024 6 commits
    • Nicolas Patry's avatar
      Adding support for `HF_HUB_OFFLINE` support in the router. (#1789) · 4c698fa6
      Nicolas Patry authored
      # What does this PR do?
      
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      4c698fa6
    • drbh's avatar
      fix: avoid frequency and repetition penalty on padding tokens (#1765) · 23d82b8f
      drbh authored
      This PR resolves an issue with the penalty processors during batched
      generation where extra padding tokens incorrectly impact the penalty
      scores.
      
      generation is impacted in the case where at least one item in the batch
      includes a `frequency_penalty`
      
      reproduction script below
      ```python
      import requests
      from concurrent import futures
      import time
      
      headers = {
          "Content-Type": "application/json",
      }
      
      json_data = {
          "inputs": "[INST] Whats the capitol of France? [/INST]",
          "parameters": {
              "max_new_tokens": 100,
              "seed": 20,
              "do_sample": False,
          },
      }
      
      
      json_data2 = {
          "inputs": "<s>[INST]Write a mind bending story: I saw a puppy a cat a rat and a raccoon during my bike ride in the park[/INST]",
          "parameters": {
              "max_new_tokens": 100,
              "seed": 2,
              "do_sample": False,
              # OFFENDING LINE
              "frequency_penalty": 1.05,
          },
      }
      
      base_url = "http://localhost:3000/generate
      
      "
      
      
      def req():
          response = requests.post(base_url, headers=headers, json=json_data)
          print("[req ]", response.json())
      
      
      def req2():
          response = requests.post(base_url, headers=headers, json=json_data2)
          print("[req2]", response.json())
      
      
      n = 1
      
      for i in range(0, 3):
          print(f"- {n} threads -")
          with futures.ThreadPoolExecutor(max_workers=n) as executor:
              executor.submit(req)
              for i in range(3):
                  executor.submit(req2)
      
          n += 1
      
      # - 1 threads -
      # [req ] {'generated_text': ' The capital of France is Paris.'}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      # - 2 threads -
      # [req ] {'generated_text': ' The capital city'}
      # [req2] {'generated_text': ' As""%\n================'}
      # [req2] {'generated_text': ' As""%%$\n================'}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      
      # output with this PR's changes:
      # - 1 threads -
      # [req ] {'generated_text': ' The capital of France is Paris.'}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      # - 2 threads -
      # [req ] {'generated_text': ' The capital city'}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      
      ```
      
      **divergence from expected generation is easier to reproduce with
      batched grammar requests as they are more sensitive to unexpected
      outputs.
      
      this PR resolves the issue by setting the penalty score to 0 where input
      ids are padding tokens (0).
      
      ---------
      Co-authored-by: default avatarOlivierDehaene <olivier@huggingface.co>
      23d82b8f
    • Nicolas Patry's avatar
      Idefics2. (#1756) · bfddfa59
      Nicolas Patry authored
      # What does this PR do?
      
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      bfddfa59
    • Nicolas Patry's avatar
      Phi3 support (#1797) · 986b4044
      Nicolas Patry authored
      # What does this PR do?
      
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      986b4044
    • drbh's avatar
      feat: allow null eos and bos tokens in config (#1791) · 9be1db31
      drbh authored
      This PR resolves an issue loading in tokenizer_configs where the eos or
      bos token is null as in:
      [Qwen/Qwen1.5-72B-Chat](https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/tokenizer_config.json)
      
      resolves:
      https://github.com/huggingface/text-generation-inference/issues/1545 and
      related to https://github.com/QwenLM/Qwen1.5/issues/162
      9be1db31
    • Lucain's avatar
      Add attribute descriptions for `GenerateParameters` (#1798) · 455cada5
      Lucain authored
      Once https://github.com/huggingface/huggingface.js/pull/629 gets merged,
      we will rely on TGI's specs to generate jsonschema types for
      `text_generation` and `chat_completion`.
      
      This PR adds some documentation for `GenerationParameters`'s properties
      so that they get documented in the downstream tools (TGI docs but also
      `huggingface.js`/`huggingface_hub` inference clients). I mostly took
      inspiration from [the python
      client](https://github.com/huggingface/text-generation-inference/blob/main/clients/python/text_generation/types.py)
      for the descriptions.
      455cada5
  3. 22 Apr, 2024 2 commits
  4. 18 Apr, 2024 2 commits
  5. 17 Apr, 2024 1 commit
  6. 16 Apr, 2024 3 commits
    • drbh's avatar
      fix: bump clients test base url to llama (#1751) · e4d31a40
      drbh authored
      This PR bumps the client tests from `google/flan-t5-xxl` to
      `meta-llama/Llama-2-7b-chat-hf` to resolve issues when calling the
      endpoint and `google/flan-t5-xxl` is not available
      
      run with
      ```bash
      make python-client-tests
      
      clients/python/tests/test_client.py ..............     [ 43%]
      clients/python/tests/test_errors.py ..........         [ 75%]
      clients/python/tests/test_inference_api.py ......      [ 93%]
      clients/python/tests/test_types.py ..                  [100%]
      ```
      
      **note `google/flan-t5-xxl` function is currently unused but still
      included in the `conftest.py`
      e4d31a40
    • Lucain's avatar
      Update response type for `/v1/chat/completions` and `/v1/completions` (#1747) · 00f36535
      Lucain authored
      `/v1/chat/completions` and `/v1/completions` have different output types
      depending on the `stream` parameter. This PR aims at fixing the
      inconsistency in the auto-generated
      [openapi.json](https://huggingface.github.io/text-generation-inference/openapi.json)
      specs.
      
      cc @OlivierDehaene @drbh I reused what had been done for the `/`
      endpoint but haven't tested anything myself. Could you confirm this is
      the correct way of handling things?
      
      Also, should I update the openapi.json file manually? If yes, how can I
      do it?
      00f36535
    • drbh's avatar
      feat: improve tools to include name and add tests (#1693) · 7276d434
      drbh authored
      This PR makes tool calling aware of the name of the function selected. 
      
      Fixes:
      https://github.com/huggingface/text-generation-inference/issues/1657
      
      Thank you @puppetm4st3r for the helpful snippets, large parts of this PR
      are simply refactors of the code shared 🙏
      
      **opening draft PR because small tweaks are needed before merging
      7276d434
  7. 15 Apr, 2024 1 commit
    • Nicolas Patry's avatar
      Fixing CI. (#1748) · 88702d87
      Nicolas Patry authored
      # What does this PR do?
      
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      88702d87
  8. 12 Apr, 2024 8 commits
    • OlivierDehaene's avatar
      v2.0.0 (#1736) · c38a7d7d
      OlivierDehaene authored
      c38a7d7d
    • Ikko Eltociear Ashimine's avatar
      Fix typo in guidance.md (#1735) · 275caa04
      Ikko Eltociear Ashimine authored
      # What does this PR do?
      
      compliation -> compilation
      
      
      ## Before submitting
      - [x] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [x] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      275caa04
    • OlivierDehaene's avatar
      feat: medusa v2 (#1734) · eefea5ee
      OlivierDehaene authored
      eefea5ee
    • Nicolas Patry's avatar
      Improve the defaults for the launcher (#1727) · 1b2670c8
      Nicolas Patry authored
      # What does this PR do?
      
      - Renamed `max_input_length` into `max_input_tokens` for consistency
      (backward compatible change, will yell if both are set.)
      - Will now use the config for `max_input_tokens` `max_total_token` and
      `max_batch_total_tokens`.
      - Capping the values to 16k in order to save VRAM on behalf of users
      (overriddable by simply setting the values).
      
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      1b2670c8
    • Christof Weickhardt's avatar
      chore(cargo-toml): apply lto fat and codegen-units of one (#1651) · 9d8f21ca
      Christof Weickhardt authored
      # What does this PR do?
      
      I have suggested similar changes over at
      https://github.com/huggingface/text-embeddings-inference/pull/201.
      
      Here being my additional question, why `debug` is enabled during release
      building? (hence I didn't add the flag to script things)
      
      Applying the following optimizations:
      - `lto` (link time optimizations) over all code (including dependencies)
      - Using a single `codegen-unit` to apply optimizations within 1 code
      unit at build time
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      @OlivierDehaene OR @Narsil
      9d8f21ca
    • OlivierDehaene's avatar
    • Nicolas Patry's avatar
      Upgrade EETQ (Fixes the cuda graphs). (#1729) · 6c2c44b8
      Nicolas Patry authored
      # What does this PR do?
      
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      6c2c44b8
    • Nicolas Patry's avatar
      Fp8 Support (#1726) · 408dbc48
      Nicolas Patry authored
      # What does this PR do?
      
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation
      
      ).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      
      ---------
      Co-authored-by: default avatarDong Shin <d0104.shin@gmail.com>
      408dbc48
  9. 11 Apr, 2024 4 commits
  10. 10 Apr, 2024 2 commits
  11. 09 Apr, 2024 2 commits
    • Nicolas Patry's avatar
      Adding Llava-Next (Llava 1.6) with full support. (#1709) · 4634b00c
      Nicolas Patry authored
      # What does this PR do?
      
      - Changed all models to extract `embed_tokens` in order to enable llava
      to separately call the embeddings and the core model layers.
      - Added VlmCausalLM to inherit from FlashMistral in order to be
      maximally supported. The only added logics sits on top and parses images
      into pixel values, preallocates input_ids space for the image
      embeddings, and passes them for the model.
      - Added Clip for the vision tower.
      - Didn't add flash for the vision tower since there's no padding anyway.
      - Added heuristic (potentially incomplete) to calculate number of
      features *before* calculating the clip patches (allows for easier logic
      reuse of the LLM under the hood).
      
      
      Still needs to be done:
      
      - [x] Implement the image parsing in the controller side, to avoid
      downloading n times per TP shard and also refusing requests too large
      early and avoid issues where the truncation actually truncates the
      image.
      - [ ] Make sure it works with quantization properly.
      - [x] Make sure it works with TP>1
      
      
      
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      4634b00c
    • Nicolas Patry's avatar
      Automatic quantization config. (#1719) · 106d8ee8
      Nicolas Patry authored
      # What does this PR do?
      
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      106d8ee8
  12. 08 Apr, 2024 2 commits
  13. 05 Apr, 2024 4 commits
    • Nicolas Patry's avatar
      Force weights_only (before fully breaking pickle files anyway). (#1710) · 8dca3b04
      Nicolas Patry authored
      # What does this PR do?
      
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      8dca3b04
    • Nicolas Patry's avatar
      Fixing cohere tokenizer. (#1697) · f9958ee1
      Nicolas Patry authored
      f9958ee1
    • Nicolas Patry's avatar
      5062fda4
    • Nicolas Patry's avatar
      Pickle conversion now requires `--trust-remote-code`. (#1704) · c7e570e5
      Nicolas Patry authored
      # What does this PR do?
      
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      c7e570e5
  14. 04 Apr, 2024 1 commit
    • Nicolas Patry's avatar
      Add cuda graphs sizes and make it default. (#1703) · 99874eae
      Nicolas Patry authored
      # What does this PR do?
      
      ```
      text-generation-launcher --model-id XXX # Uses cuda graphs by default
      text-generation-launcher --model-id XXX --cuda-graphs "1,2"  #Restrict the number of cuda graphs which saves VRAM
      text-generation-launcher --model-id XXX --cuda-graphs "0"  # Disabling it entirely
      ```
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      99874eae
  15. 29 Mar, 2024 1 commit