1. 05 Jun, 2024 12 commits
  2. 04 Jun, 2024 14 commits
  3. 03 Jun, 2024 14 commits
    • [docs] Spanish translation of tokenizer_summary.md (#31154) · c73ee133
      Aaron Jimenez authored
      * add tokenizer_summary to es/_toctree.yml
      
      * add tokenizer_summary to es/
      
      * fix link to Transformer XL in en/
      
      * translate until Subword tokenization section
      
      * fix GPT link in en/
      
      * fix other GPT link in en/
      
      * fix typo in en/
      
      * translate the doc
      
      * run make fixup
      
      * Remove .md in Transformer XL link
      
      * fix some link issues in es/
      
      * fix typo
    • Fix GPU OOM for `mistral.py::Mask4DTestHard` (#31212) · 8a1a23ae
      Yih-Dar authored
      
      
      * build
      
      * build
      
      * build
      
      * build
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    • Set greater_is_better to False if metric_for_best_model ends with "loss" (#31142) · df5abae8
      miivanov90 authored
      * update to not(endswith(loss))
      
      * ruff formatting
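The behavior this commit introduces can be sketched in a few lines. This is an illustrative, hedged reconstruction of the new default; the function name is hypothetical, not the actual Trainer source:

```python
# Illustrative sketch of the default this commit introduces; the function
# name is hypothetical and this is not the actual transformers source.
def default_greater_is_better(metric_for_best_model: str) -> bool:
    # Loss-like metrics (e.g. "loss", "eval_loss") should be minimized,
    # so greater_is_better defaults to False for them.
    return not metric_for_best_model.endswith("loss")
```

For example, default_greater_is_better("eval_loss") returns False, while default_greater_is_better("eval_accuracy") returns True.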
    • Cohere: Fix copied from (#31213) · 924c46d4
      Younes Belkada authored
      Update modeling_cohere.py
    • Wrong translation FR : Contents = Contenu (#31186) · 98dd8423
      Jade Choghari authored
      Update index.md - fix French typo: Contents = Contenu
    • Rename sanity_evaluation to eval_on_start (#31192) · c6c78733
      Qubitium authored
      * Rename sanity_evaluation to eval_on_start
      
      * move arg back to last
    • Fix typo in utils (#31169) · c230504b
      Bojun Feng authored
      fix typo
    • fix the get_size_with_aspect_ratio in max_size situation (#30902) · 874ac129
      Sangbum Daniel Choi authored
      
      
      * fix the get_size_with_aspect_ratio in max_size situation
      
      * make fix-up
      
      * add more general solution
      
      * consider when max_size is not defined
      
      * fix typo
      
      * fix typo
      
      * simple fix
      
      * fix error
      
      * fix if else error
      
      * fix error of size overwrite
      
      * fix yolos image processing
      
      * fix detr image processing
      
      * make
      
      * add longest related test script
      
      * Update src/transformers/models/yolos/image_processing_yolos.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * add more test
      
      * add test script about longest size
      
      * remove deprecated
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    • Add Qwen2 GGUF loading support (#31175) · e4628434
      Isotr0py authored
      * add qwen2 gguf support
      
      * Update docs
      
      * fix qwen2 tokenizer
      
      * add qwen2 gguf test
      
      * fix typo in qwen2 gguf test
      
      * format code
      
      * Remove mistral, clarify the error message
      
      * format code
      
      * add typing and update docstring
    • Fix `test_compile_static_cache` (#30991) · df848acc
      Yih-Dar authored
      
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    • 🚨 [Mistral and friends] Update MLP (#31057) · 70c87138
      NielsRogge authored
      Update MLP
    • SlidingWindowCache: reduce differences to other Cache classes (#30970) · d475f767
      Joao Gante authored
      * tmp commit
      
      * sliding window with fewer differences
      
      * make fixup + rebase
      
      * missing overwrite
    • Ignore non-causal mask in more cases with SDPA (#30138) · 221aaec6
      fxmarty authored
      * update non-causal mask for sdpa
      
      * add test
      
      * update docstrings
      
      * add one more test
      
      * fix cross attention bug
      
      * gentler atol/rtol
    • Fix Cannot convert [array()] to EagerTensor of dtype int64 (#31109) · f4f69625
      Pavithra Devi M authored
      Running the model.prepare_tf_dataset() method raises the error below:
      ```
      TypeError: Cannot convert [array([322.,   1.])] to EagerTensor of dtype int64
      ```
      
      This happens in the "DataCollatorForSeq2Seq" function when the labels are
      converted to tensors. At that point the labels can be either a list of lists
      or a list of ndarrays. Converting a list of lists works fine; the problem
      arises when the ndarrays hold float values, like below.
      
      ```
      [array([322.,   1.])]
      ```
      
      The exception is raised when this label is converted to a tensor using the
      code below.
      
      ```
      batch["labels"] = tf.constant(batch["labels"], dtype=tf.int64)
      ```
      
      The labels are always integer values, so they must have been converted to
      floats earlier, in the label padding operation below.
      ```
      batch["labels"] = [
                          call(label)
                          if padding_side == "right"
                          else np.concatenate([[self.label_pad_token_id] * (max_label_length - len(label)), label])
                          for label in labels
                          ]
      ```
      There are two cases here:
      1 - Concatenating an array holding the integer padding token value with the labels.
      2 - Concatenating an empty array with the labels.
      
      ----------------------------------------------------------------------------------------
      Case 1: Concatenating an array holding the integer padding token value with the labels.
      Works as expected:
      ----------------------------------------------------------------------------------------
      ```
      label = np.array([233, 1])
      max_label_length = 4
      label_pad_token_id = -100
      np.concatenate([[label_pad_token_id] * (max_label_length - len(label)), label])
      Output:
      array([-100, -100,  233,    1])
      ```
      
      ----------------------------------------------------------------------------------------
      Case 2: Concatenating an empty array with the labels.
      Causes the issue:
      This scenario occurs when the label already has the maximum label length, so no padding is needed.
      ----------------------------------------------------------------------------------------
      ```
      label = np.array([233, 1])
      max_label_length = 2
      label_pad_token_id = -100
      np.concatenate([[label_pad_token_id] * (max_label_length - len(label)), label])
      Output:
      array([233.,   1.])
      ```
      
      ----------------------------------------------------------------------------------------
      Solution:
      ----------------------------------------------------------------------------------------
      An empty Python list passed to np.concatenate becomes a float64 array,
      which promotes the whole result to float64. The fix is to concatenate an
      ndarray of dtype int64 with the labels instead.
      
      AFTER FIX:
      ----------
      Case 1:
      ```
      
      label = np.array([233, 1])
      max_label_length = 4
      label_pad_token_id = -100
      np.concatenate([np.array([label_pad_token_id] * (max_label_length - len(label)), dtype=np.int64),label])
      
      Output:
      array([-100, -100,  233,    1])
      ```
      
      Case 2:
      ```
      
      label = np.array([233, 1])
      max_label_length = 2
      label_pad_token_id = -100
      np.concatenate([np.array([label_pad_token_id] * (max_label_length - len(label)), dtype=np.int64),label])
      
      Output:
      array([233,   1])
      ```
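The root cause and the fix described above can be reproduced end to end in a few lines. This is an illustrative sketch of the dtype promotion, not the collator code itself:

```python
import numpy as np

# Reproduces the dtype promotion behind the bug: when no padding is
# needed, the padding operand is an empty Python list, which NumPy
# treats as a float64 array, so concatenation promotes the int labels.
label = np.array([233, 1])
empty_pad = []  # max_label_length == len(label): no padding needed

promoted = np.concatenate([empty_pad, label])
# promoted.dtype is float64: the labels silently became floats.

# The fix: build the padding operand as an int64 ndarray explicitly.
fixed = np.concatenate([np.array(empty_pad, dtype=np.int64), label])
# fixed.dtype stays int64, so tf.constant(..., dtype=tf.int64) succeeds.
```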