Unverified Commit 8896ebb9 authored by Sherman Siu's avatar Sherman Siu Committed by GitHub
Browse files

Clarify and add missing typical_p argument docstring. (#21095)



* Clarify and add missing typical_p docstring.

* Make the docstring easier to understand.

* Clarify typical_p docstring

Accept the suggestion by @stevhliu for paraphrasing the docstring.
Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>

* Use the same docstring as in GenerationConfig

Follow the suggestion suggested by @stevhliu in the pull request conversation.

* Fix docstring spacing.
Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
parent f30bcd53
......@@ -144,6 +144,12 @@ class PretrainedConfig(PushToHubMixin):
top_p (`float`, *optional*, defaults to 1):
Value that will be used by default in the `generate` method of the model for `top_p`. If set to float < 1,
only the most probable tokens with probabilities that add up to `top_p` or higher are kept for generation.
typical_p (`float`, *optional*, defaults to 1):
Local typicality measures how similar the conditional probability of predicting a target token next is to
the expected conditional probability of predicting a random token next, given the partial text already
generated. If set to float < 1, the smallest set of the most locally typical tokens with probabilities that
add up to `typical_p` or higher are kept for generation. See [this
paper](https://arxiv.org/pdf/2202.00666.pdf) for more details.
repetition_penalty (`float`, *optional*, defaults to 1):
Parameter for repetition penalty that will be used by default in the `generate` method of the model. 1.0
means no penalty.
......
......@@ -111,8 +111,11 @@ class GenerationConfig(PushToHubMixin):
If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to
`top_p` or higher are kept for generation.
typical_p (`float`, *optional*, defaults to 1.0):
The amount of probability mass from the original distribution to be considered in typical decoding. If set
to 1.0 it takes no effect. See [this paper](https://arxiv.org/pdf/2202.00666.pdf) for more details.
Local typicality measures how similar the conditional probability of predicting a target token next is to
the expected conditional probability of predicting a random token next, given the partial text already
generated. If set to float < 1, the smallest set of the most locally typical tokens with probabilities that
add up to `typical_p` or higher are kept for generation. See [this
paper](https://arxiv.org/pdf/2202.00666.pdf) for more details.
diversity_penalty (`float`, *optional*, defaults to 0.0):
This value is subtracted from a beam's score if it generates a token same as any beam from other group at a
particular time. Note that `diversity_penalty` is only effective if `group beam search` is enabled.
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment