1. 30 Apr, 2024 1 commit
  2. 23 Apr, 2024 1 commit
• fix: avoid frequency and repetition penalty on padding tokens (#1765) · 23d82b8f
      drbh authored
      This PR resolves an issue with the penalty processors during batched
      generation where extra padding tokens incorrectly impact the penalty
      scores.
      
Generation is impacted whenever at least one request in the batch
includes a `frequency_penalty`.

Reproduction script below:
      ```python
      import requests
      from concurrent import futures
      import time
      
      headers = {
          "Content-Type": "application/json",
      }
      
      json_data = {
          "inputs": "[INST] Whats the capitol of France? [/INST]",
          "parameters": {
              "max_new_tokens": 100,
              "seed": 20,
              "do_sample": False,
          },
      }
      
      
      json_data2 = {
          "inputs": "<s>[INST]Write a mind bending story: I saw a puppy a cat a rat and a raccoon during my bike ride in the park[/INST]",
          "parameters": {
              "max_new_tokens": 100,
              "seed": 2,
              "do_sample": False,
              # OFFENDING LINE
              "frequency_penalty": 1.05,
          },
      }
      
base_url = "http://localhost:3000/generate"
      
      
      def req():
          response = requests.post(base_url, headers=headers, json=json_data)
          print("[req ]", response.json())
      
      
      def req2():
          response = requests.post(base_url, headers=headers, json=json_data2)
          print("[req2]", response.json())
      
      
      n = 1
      
      for i in range(0, 3):
          print(f"- {n} threads -")
          with futures.ThreadPoolExecutor(max_workers=n) as executor:
              executor.submit(req)
              for i in range(3):
                  executor.submit(req2)
      
          n += 1
      
      # - 1 threads -
      # [req ] {'generated_text': ' The capital of France is Paris.'}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      # - 2 threads -
      # [req ] {'generated_text': ' The capital city'}
      # [req2] {'generated_text': ' As""%\n================'}
      # [req2] {'generated_text': ' As""%%$\n================'}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      
      # output with this PR's changes:
      # - 1 threads -
      # [req ] {'generated_text': ' The capital of France is Paris.'}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      # - 2 threads -
      # [req ] {'generated_text': ' The capital city'}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      # [req2] {'generated_text': " As you were riding your bicycle through Central Park, enjoying some fresh air on an otherwise gloomy day. You couldn't help but notice that it was eerily quiet for this time of year - usually there would be hordes"}
      
      ```
      
Divergence from the expected generation is easier to reproduce with
batched grammar requests, as they are more sensitive to unexpected
outputs.
      
This PR resolves the issue by setting the penalty score to 0 wherever the
input ids are padding tokens (id 0).
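The fix can be sketched in pure Python (a simplification: the server applies this as a batched tensor operation, and `pad_token_id=0` is assumed here to match the description above):

```python
def penalty_scores(input_ids, frequency_penalty, pad_token_id=0):
    """Compute per-token frequency penalties, skipping padding tokens.

    Counting padding (id 0) would let the amount of batch padding --
    which depends on the *other* requests in the batch -- leak into
    the penalty and change the generated output.
    """
    counts = {}
    for token_id in input_ids:
        if token_id == pad_token_id:
            continue  # padding must not contribute to the penalty
        counts[token_id] = counts.get(token_id, 0) + 1
    return {t: c * frequency_penalty for t, c in counts.items()}
```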
      
      ---------
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
  3. 28 Mar, 2024 1 commit
  4. 21 Mar, 2024 1 commit
• fix: improve tool type, bump pydantic and outlines (#1650) · de6cb15f
      drbh authored
This PR resolves a couple of issues:

- [X] adjusts the tool response to align with OpenAI's tools response
type
- [X] bumps pydantic to `2.6.4` in all apps (resolves a dependency issue
when running tests)
- [X] bumps the `outlines` version and fixes an import for the new name
  5. 01 Mar, 2024 1 commit
  6. 16 Feb, 2024 2 commits
  7. 15 Feb, 2024 1 commit
• Outlines guided generation (#1539) · cef0553d
      drbh authored
This WIP PR starts to add grammar support via outlines. It currently
supports only very simple regex grammars and does not optimize for
precompiling or caching grammar FSMs.
      
      todo:
      - [X] add simple outlines guidance to `NextTokenChooser`
      - [X] update protos for grammar
      - [X] update generation params API
      - [X] constrain simple grammar
- [ ] support parsing more complex grammars into an FSM
- [ ] support all grammar types that outlines supports
      - [ ] explore optimizations to avoid recompiling grammars
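Constrained decoding of this kind generally works by masking logits: at each step the grammar FSM yields the set of token ids that keep the output valid, and every other token is suppressed. A minimal sketch of that masking step (simplified: in practice outlines derives `allowed_ids` from a compiled regex FSM):

```python
import math

def mask_logits(logits, allowed_ids):
    """Suppress every token the grammar disallows at this step.

    Setting a logit to -inf gives that token zero probability after
    softmax, so sampling can only pick grammar-valid continuations.
    """
    return [score if i in allowed_ids else -math.inf
            for i, score in enumerate(logits)]
```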
      
      guided request
      ```bash
      curl -s 'http://localhost:3000/generate' \
      --header 'Content-Type: application/json' \
      --data-raw '{
          "inputs": "make an email for david: \n",
          "parameters": {
              "max_new_tokens": 6,
              "grammar": "[\\w-]+@([\\w-]+\\.)+[\\w-]+"
          }
      }' | jq
      ```
      response
      ```json
      {
        "generated_text": "david@example.com"
      }
      ```
      
      unguided request
      ```bash
      curl -s 'http://localhost:3000/generate' \
      --header 'Content-Type: application/json' \
      --data '{
          "inputs": "make an email for david: \n",
          "parameters": {
              "max_new_tokens": 6
          }
      }' | jq
      ```
      response
      ```json
      {
        "generated_text": "    email = 'david"
      }
      ```
  8. 08 Feb, 2024 1 commit
  9. 04 Jul, 2023 1 commit
  10. 20 Jun, 2023 1 commit
  11. 26 May, 2023 1 commit