"examples/trials/mnist-nas/darts_mode/config_darts.yml" did not exist on "2a28a578fc373ecdbc08431b2d3ab4e1d6655146"
  1. 17 Dec, 2024 1 commit
    • Jesse Gross's avatar
      llama: Ensure KV cache is fully defragmented. · 08a832b4
      Jesse Gross authored
      Sometimes the KV cache requires defragmentation even without
      triggering the threshold heuristic. In this case, decoding
      will not being able to find a KV cache slot. This is particularly
      difficult for the caller to handle if it happens in between
      ubatches. To avoid this, we should immediately trigger a defrag.
      
      In addition, a heavily fragmented cache can require more than
      max_moves to defragment. Currently, we stop when we hit the limit
      but this can leave a cache that still does not have adequate space
      even after defragmentation is triggered. Instead, we should do
      multiple batches of processing until everything is complete.
      
      Fixes #7949
      08a832b4