chenpangpang / transformers · Commits

Unverified commit 5b1ad0eb, authored May 16, 2023 by Joao Gante, committed by GitHub on May 16, 2023.
Docs: add link to assisted generation blog post (#23397)
Parent: bbbc5c15
Showing 1 changed file with 2 additions and 5 deletions (+2, −5): docs/source/en/generation_strategies.mdx
````diff
@@ -338,9 +338,8 @@ For the complete list of the available parameters, refer to the [API documentati
 Assisted decoding is a modification of the decoding strategies above that uses an assistant model with the same
 tokenizer (ideally a much smaller model) to greedily generate a few candidate tokens. The main model then validates
 the candidate tokens in a single forward pass, which speeds up the decoding process. Currently, only greedy search
-and sampling are supported with assisted decoding, and doesn't support batched inputs.
-<!-- TODO: add link to the blog post about assisted decoding when it exists -->
+and sampling are supported with assisted decoding, and doesn't support batched inputs. To learn more about assisted
+decoding, check [this blog post](https://huggingface.co/blog/assisted-generation).
 To enable assisted decoding, set the `assistant_model` argument with a model.

@@ -364,8 +363,6 @@ To enable assisted decoding, set the `assistant_model` argument with a model.
 When using assisted decoding with sampling methods, you can use the `temperarure` argument to control the randomness
 just like in multinomial sampling. However, in assisted decoding, reducing the temperature will help improving latency.
-<!-- TODO: link the blog post again to explain why the tradeoff exists -->
 ```python
 >>> from transformers import AutoModelForCausalLM, AutoTokenizer
 ...
````
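The change above documents assisted decoding, which is enabled simply by passing `assistant_model` to `generate()`. A minimal runnable sketch, assuming the standard `transformers` generation API; the tiny checkpoint below is an illustration only (in practice the assistant should be a much smaller model than the main one, sharing its tokenizer):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoints: the same tiny test model plays both roles here
# so the snippet runs quickly. In real use, pick a small assistant that
# shares the main model's tokenizer (e.g. a distilled variant).
checkpoint = "sshleifer/tiny-gpt2"
assistant_checkpoint = "sshleifer/tiny-gpt2"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)
assistant_model = AutoModelForCausalLM.from_pretrained(assistant_checkpoint)

inputs = tokenizer("Alice and Bob", return_tensors="pt")

# Passing `assistant_model` switches generate() to assisted decoding: the
# assistant greedily drafts candidate tokens, and the main model validates
# them in a single forward pass.
outputs = model.generate(**inputs, assistant_model=assistant_model, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```

As the diff notes, assisted decoding also works with sampling (`do_sample=True`), where a lower `temperature` tends to improve latency because the assistant's drafts are accepted more often.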