The first call to `xla_generate()` is time-consuming because of tracing, but the successive calls are orders of magnitude faster. Keep in mind that any change in the generation options at any point will trigger re-tracing, leading to slowdowns in generation time.
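To make the tracing behavior concrete, here is a minimal sketch, assuming a `gpt2` checkpoint and `max_new_tokens` as the generation option being varied; the checkpoint, prompt, and timings are illustrative, not prescriptive:

```py
import time

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM

# GPT-2 has no pad token, so one is assigned explicitly for left-padded inputs.
tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="</s>")
model = TFAutoModelForCausalLM.from_pretrained("gpt2")

# Wrap generate() so it is compiled with XLA.
xla_generate = tf.function(model.generate, jit_compile=True)

# Padding to a fixed multiple keeps the input shape stable across calls.
tokenized_input = tokenizer("TensorFlow is", pad_to_multiple_of=8, padding=True, return_tensors="tf")


def timed_generate(**generate_kwargs):
    start = time.time_ns()
    xla_generate(**tokenized_input, **generate_kwargs)
    end = time.time_ns()
    print(f"Execution time -- {(end - start) / 1e6:.1f} ms")


timed_generate(max_new_tokens=32)  # slow: the first call triggers tracing
timed_generate(max_new_tokens=32)  # fast: reuses the traced function
timed_generate(max_new_tokens=64)  # slow again: the changed option forces re-tracing
```

Because the input is padded to a fixed shape, the only re-trace in this sketch comes from the changed `max_new_tokens` value, which is exactly the slowdown described above.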
This document doesn’t cover all the text generation options 🤗 Transformers provides. We encourage you to read the documentation for advanced use cases.
...
...
Here, we leave you with some additional resources if you want to delve deeper into the topics discussed in this document:
* Recommended posts for learning more about XLA and TensorFlow graphs in general:
    * [XLA: Optimizing Compiler for Machine Learning](https://www.tensorflow.org/xla)
    * [Introduction to graphs and tf.function](https://www.tensorflow.org/guide/intro_to_graphs)
    * [Better performance with tf.function](https://www.tensorflow.org/guide/function)