1. 08 Aug, 2022 3 commits
  2. 06 Aug, 2022 2 commits
  3. 05 Aug, 2022 13 commits
  4. 04 Aug, 2022 10 commits
  5. 03 Aug, 2022 10 commits
  6. 02 Aug, 2022 2 commits
    • Add programming languages (#18434) · 5096a654
      Christopher Akiki authored
      The current wording makes it sound as if the programming languages are part of the 46 natural languages.
    • Update pipeline word heuristic to work with whitespace in token offsets (#18402) · 042f4203
      David authored
      * Update pipeline word heuristic to work with whitespace in token offsets
      
      This change checks for whitespace in the input string at either the
      character preceding the token or the first character of the token.
      This works both with tokenizers that return offsets excluding the
      whitespace between words and with those whose offsets include it.
      
      fixes #18111
      
      starting
      
      * Use smaller model, ensure expected tokenization
      
      * Re-run CI (please squash)
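      The whitespace check described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function names `is_subword` and `flag_subwords` are hypothetical, and the use of `str.isspace` (rather than a check for a literal space) is an assumption. A token is treated as the start of a new word if it begins at position 0, or if there is whitespace either just before its start offset or at its first character.

      ```python
      from typing import List, Tuple


      def is_subword(sentence: str, start: int) -> bool:
          """Guess whether the token starting at `start` continues the previous word.

          Hypothetical helper: looks at the character before the token and the
          token's first character. If either is whitespace, the token starts a
          new word; otherwise it is assumed to be a subword continuation.
          """
          if start <= 0:
              # The first token of the input always starts a word.
              return False
          window = sentence[start - 1 : start + 1]
          return not any(ch.isspace() for ch in window)


      def flag_subwords(sentence: str, offsets: List[Tuple[int, int]]) -> List[bool]:
          """Apply the heuristic to every (start, end) offset pair."""
          return [is_subword(sentence, start) for start, _end in offsets]


      sentence = "Hello world"

      # Offsets that exclude the whitespace between words: "Hello", "wor", "ld".
      offsets_excluding = [(0, 5), (6, 9), (9, 11)]
      # Offsets that include the leading whitespace: "Hello", " wor", "ld".
      offsets_including = [(0, 5), (5, 9), (9, 11)]

      flags_excluding = flag_subwords(sentence, offsets_excluding)
      flags_including = flag_subwords(sentence, offsets_including)
      ```

      Under both offset conventions only the final token (`ld`) is flagged as a subword, which is the behavior the commit message describes.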