"...git@developer.sourcefind.cn:chenpangpang/transformers.git" did not exist on "48ff6d5109d691e3630169962a5052586aaaf659"
Unverified Commit a27195b1 authored by flozi00, committed by GitHub

Update longt5.mdx (#18634)

parent fd9aa82b
@@ -37,7 +37,7 @@ Tips:
 - [`LongT5ForConditionalGeneration`] is an extension of [`T5ForConditionalGeneration`] exchanging the traditional
 encoder *self-attention* layer with efficient either *local* attention or *transient-global* (*tglobal*) attention.
 - Unlike the T5 model, LongT5 does not use a task prefix. Furthermore, it uses a different pre-training objective
-inspired by the pre-training of `[PegasusForConditionalGeneration]`.
+inspired by the pre-training of [`PegasusForConditionalGeneration`].
 - LongT5 model is designed to work efficiently and very well on long-range *sequence-to-sequence* tasks where the
 input sequence exceeds commonly used 512 tokens. It is capable of handling input sequences of a length up to 16,384 tokens.
 - For *Local Attention*, the sparse sliding-window local attention operation allows a given token to attend only `r`
......
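As a minimal sketch of how the tips in this hunk translate into code (assuming the released `google/long-t5-tglobal-base` checkpoint, used here purely for illustration; a fine-tuned checkpoint would be used for a real downstream task), LongT5 is loaded like any other seq2seq model and the input is passed without a task prefix:

```python
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

# Assumed checkpoint for illustration; other LongT5 checkpoints load the same way.
tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")

long_document = "..."  # placeholder for an input that may run to thousands of tokens

# Unlike T5, the input is passed as-is, with no task prefix such as "summarize: ".
# LongT5 can consume sequences of up to 16,384 tokens.
inputs = tokenizer(long_document, return_tensors="pt", max_length=16384, truncation=True)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```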