Unverified commit 5b1ad0eb, authored May 16, 2023 by Joao Gante, committed by GitHub on May 16, 2023
Docs: add link to assisted generation blog post (#23397)
Parent: bbbc5c15
Showing 1 changed file with 2 additions and 5 deletions
docs/source/en/generation_strategies.mdx (+2, -5)
...
@@ -338,9 +338,8 @@ For the complete list of the available parameters, refer to the [API documentati
Assisted decoding is a modification of the decoding strategies above that uses an assistant model with the same
tokenizer (ideally a much smaller model) to greedily generate a few candidate tokens. The main model then validates
the candidate tokens in a single forward pass, which speeds up the decoding process. Currently, only greedy search
-and sampling are supported with assisted decoding, and doesn't support batched inputs.
-<!-- TODO: add link to the blog post about assisted decoding when it exists -->
+and sampling are supported with assisted decoding, and it doesn't support batched inputs. To learn more about assisted
+decoding, check [this blog post](https://huggingface.co/blog/assisted-generation).
To enable assisted decoding, set the `assistant_model` argument with a model.
...
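To make the `assistant_model` argument concrete, here is a minimal sketch; the pythia checkpoint pair and the prompt are placeholder choices, not part of this diff, and any main/assistant causal LM pair that shares a tokenizer should work the same way:

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer

>>> # Placeholder checkpoints: a larger main model and a much smaller
>>> # assistant model that share the same tokenizer.
>>> checkpoint = "EleutherAI/pythia-1.4b-deduped"
>>> assistant_checkpoint = "EleutherAI/pythia-160m-deduped"

>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
>>> inputs = tokenizer("Alice and Bob", return_tensors="pt")

>>> model = AutoModelForCausalLM.from_pretrained(checkpoint)
>>> assistant_model = AutoModelForCausalLM.from_pretrained(assistant_checkpoint)
>>> # Passing `assistant_model` switches `generate` to assisted decoding:
>>> # the assistant drafts candidate tokens and the main model validates
>>> # them in a single forward pass.
>>> outputs = model.generate(**inputs, assistant_model=assistant_model)
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
```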
@@ -364,8 +363,6 @@ To enable assisted decoding, set the `assistant_model` argument with a model.
When using assisted decoding with sampling methods, you can use the `temperature` argument to control the randomness,
just like in multinomial sampling. However, in assisted decoding, reducing the temperature will help improve latency.
-<!-- TODO: link the blog post again to explain why the tradeoff exists -->
```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
...
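As an illustration of the latency trade-off described above, here is a short sketch reusing the placeholder checkpoints from the previous example (again an assumption, not part of this diff): lowering `temperature` sharpens the sampling distribution, so more of the assistant's greedily drafted tokens are accepted, which is what improves latency.

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer

>>> # Placeholder checkpoints, as in the sketch above.
>>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1.4b-deduped")
>>> model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-1.4b-deduped")
>>> assistant_model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m-deduped")

>>> inputs = tokenizer("Alice and Bob", return_tensors="pt")
>>> # `do_sample=True` enables multinomial sampling; a lower `temperature`
>>> # makes the main model's distribution sharper, so more of the assistant's
>>> # candidate tokens pass validation and decoding finishes sooner.
>>> outputs = model.generate(
...     **inputs, assistant_model=assistant_model, do_sample=True, temperature=0.5
... )
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
```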