['The theory of special relativity states 1. The speed of light is constant in all inertial reference']
```
</hfoption>
Under the hood, `generate` will attempt to reuse the same cache object, removing the need for re-compilation at each call. However, if the batch size changes or the maximum output length increases between calls, the cache will have to be reinitialized, triggering a new compilation.
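The snippet below is a minimal sketch of that behavior, assuming a `google/gemma-2b` checkpoint and illustrative generation settings (both are assumptions, not taken from the example above): the second call reuses the compiled graph and cache object, while increasing the output length forces a reinitialization.

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b", torch_dtype=torch.float16, device_map="auto"
)
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("The theory of special relativity states ", return_tensors="pt").to(model.device)

# First call: compiles the forward pass and initializes the static cache.
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=20)

# Same batch size and output length: the cache object is reused, no re-compilation.
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=20)

# Larger maximum output length: the cache is reinitialized, triggering a new compilation.
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=64)
```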
<hfoptionid="setup_cache">
> [!WARNING]
</hfoption>
> The `_setup_cache` method is an internal and private method that is still under development. This means it may not be backward compatible and the API design may change in the future.
<hfoptionid="Static Cache">
The `_setup_cache` method doesn't support [`~GenerationMixin.generate`] yet, so this method is a bit more involved. You'll need to write your own function to decode the next token given the current token and position and cache position of previously generated tokens.
A [`StaticCache`] object can be passed to the model's forward pass under the `past_key_values` argument, enabling the use of this object as a static kv-cache. Using this strategy, you can write your own function to decode the next token given the current token and position and cache position of previously generated tokens. You can also pass the [`StaticCache`] object to [`~GenerationMixin.generate`] and use it across calls, like you would do with a dynamic cache.
There are a few important things you must do to enable static kv-cache and torch.compile with the `_setup_cache` method:
There are a few important things you must do to enable static kv-cache and torch.compile with the `StaticCache` method:
1.Access the model's `_setup_cache` method and pass it the [`StaticCache`] class. This is a more flexible method because it allows you to configure parameters like the maximum batch size and sequence length.
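As a rough, illustrative sketch of that loop (the checkpoint, cache sizes, token budget, and the `decode_one_token` helper are placeholders, and `_setup_cache` itself is private and may change):

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StaticCache

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto"
)

# 1. Attach a static cache to the model, configuring the maximum batch size and cache length.
model._setup_cache(StaticCache, max_batch_size=1, max_cache_len=512)

def decode_one_token(model, cur_token, cache_position):
    # One greedy decoding step: forward pass on a single token, argmax over the last position.
    logits = model(
        cur_token,
        position_ids=cache_position.unsqueeze(0),
        cache_position=cache_position,
        return_dict=False,
        use_cache=True,
    )[0]
    return torch.argmax(logits[:, -1], dim=-1)[:, None]

# 2. Compile the decoding step, and with it the model's forward pass.
decode_one_token = torch.compile(decode_one_token, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("My favorite all time favorite condiment is", return_tensors="pt").to(model.device)
seq_len = inputs["input_ids"].shape[1]

with torch.no_grad():
    # Prefill: run the prompt through the model once to populate the static cache.
    cache_position = torch.arange(seq_len, device=model.device)
    logits = model(**inputs, cache_position=cache_position, return_dict=False, use_cache=True)[0]
    next_token = torch.argmax(logits[:, -1], dim=-1)[:, None]
    generated = [next_token]

    # Decode token by token, advancing the cache position at each step (39 more tokens, 40 in total).
    cache_position = torch.tensor([seq_len], device=model.device)
    for _ in range(39):
        next_token = decode_one_token(model, next_token.clone(), cache_position)
        generated.append(next_token)
        cache_position += 1

print(tokenizer.decode(torch.cat(generated, dim=-1)[0], skip_special_tokens=True))
```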
</hfoption>
<hfoption id="Static Cache">

A [`StaticCache`] object can be passed to the model's forward pass under the `past_key_values` argument, enabling the use of this object as a static kv-cache. Using this strategy, you can write your own function to decode the next token given the current token, and the position and cache position of previously generated tokens. You can also pass the [`StaticCache`] object to [`~GenerationMixin.generate`] and use it across calls, like you would do with a dynamic cache.

There are a few important things you must do to enable static kv-cache and torch.compile with [`StaticCache`]:

1. Initialize the [`StaticCache`] instance before using the model for inference. There you can configure parameters like the maximum batch size and sequence length.
2. Call torch.compile on the model to compile the forward pass with the static kv-cache (see the sketch after this list).
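As a minimal sketch of the `generate` route (the checkpoint, cache length, and generation settings are illustrative assumptions):

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StaticCache

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto"
)
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

# 1. Initialize the StaticCache with the maximum batch size and cache length you need.
past_key_values = StaticCache(
    config=model.config, max_batch_size=1, max_cache_len=512, device=model.device, dtype=model.dtype
)

inputs = tokenizer("My favorite all time favorite condiment is", return_tensors="pt").to(model.device)

# 2. Pass the cache through `past_key_values`; the same object can be reused across calls.
outputs = model.generate(**inputs, past_key_values=past_key_values, do_sample=False, max_new_tokens=40)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```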
...
```
['My favorite all time favorite condiment is ketchup. I love it on everything. I love it on my eggs, my fries, my chicken, my burgers, my hot dogs, my sandwiches, my salads, my p']
```
> [!TIP]
> If you want to reuse the [`StaticCache`] object on a new prompt, be sure to reset its contents with the `.reset()` method.
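For instance, continuing the sketch above (prompt and settings are again illustrative):

```py
# Clear the cache contents before reusing it on an unrelated prompt.
past_key_values.reset()

new_inputs = tokenizer("The theory of special relativity states ", return_tensors="pt").to(model.device)
outputs = model.generate(**new_inputs, past_key_values=past_key_values, do_sample=False, max_new_tokens=40)
```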
"Using `past_key_values` argument with `generate()` when using a static KV cache is not supported. Please open an issue in Transformers GitHub repository."
self.skipTest("This test requires torch >= 2.3 to run.")
NUM_TOKENS_TO_GENERATE=40
NUM_TOKENS_TO_GENERATE=40
# Note on `EXPECTED_TEXT_COMPLETION`'s diff: the current value matches the original test if the original test
# was changed to have a cache of 53 tokens (as opposed to 4096), on Ampere GPUs.
EXPECTED_TEXT_COMPLETION={
EXPECTED_TEXT_COMPLETION={
7:[
"Simply put, the theory of relativity states that 1) the speed of light is constant, 2) the speed of light is the same for all observers, and 3) the laws of physics are the same for all observers.",
"My favorite all time favorite condiment is ketchup. I love it on everything. I love it on my eggs, my fries, my chicken, my burgers, my hot dogs, my sandwiches, my salads, my p",
],
8:[
8:[
"Simply put, the theory of relativity states that 1) the speed of light is the same for all observers, and 2) the laws of physics are the same for all observers.\nThe first part of the theory of relativity",
"Simply put, the theory of relativity states that 1) the speed of light is constant in all inertial "
"My favorite all time favorite condiment is ketchup. I love it on everything. I love it on my eggs, my fries, my chicken, my burgers, my hot dogs, my sandwiches, my salads, my p",
"reference frames, and 2) the laws of physics are the same for all inertial reference frames.\nThe "
"theory of relativ",
"My favorite all time favorite condiment is ketchup. I love it on everything. I love it on my eggs, "
"my fries, my chicken, my burgers, my hot dogs, my sandwiches, my salads, my p",
],
7:[
"Simply put, the theory of relativity states that 1. surely nothing is faster than light.\nThe theory "
"goes that nothing travels faster than light, but the faster you go, the slower everything else will "
"be.\nThe theory of relativity",
"My favorite all time favorite condiment is ketchup. I love it on hamburgers, hot dogs, fries, eggs, "
"and even on a good old fashioned cheeseburger. I love it on everything. I love it so",
],
],
}
}
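The keys presumably select the expected completions for the GPU the test runs on, along the lines of this hypothetical lookup (not the actual test code):

```py
# Pick the expectations for the current GPU's compute capability major version (e.g. 8 for Ampere).
major, _minor = torch.cuda.get_device_capability()
expected = EXPECTED_TEXT_COMPLETION[major]
```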