chenpangpang / transformers
Commit 5af3a1aa (Unverified)
Authored Jun 09, 2023 by Arthur
Committed by GitHub, Jun 09, 2023
[lamaTokenizerFast] Update documentation (#24132)
* Update documentation
* nits
parent 62fe7533
Showing 2 changed files with 10 additions and 0 deletions
docs/source/en/model_doc/llama.mdx (+1 -0)
src/transformers/models/llama/tokenization_llama_fast.py (+9 -0)
docs/source/en/model_doc/llama.mdx
...
@@ -65,6 +65,7 @@ This model was contributed by [zphang](https://huggingface.co/zphang) with contr
     - build_inputs_with_special_tokens
     - get_special_tokens_mask
     - create_token_type_ids_from_sequences
+    - update_post_processor
     - save_vocabulary
 
 ## LlamaModel
...
src/transformers/models/llama/tokenization_llama_fast.py
...
@@ -48,6 +48,12 @@ class LlamaTokenizerFast(PreTrainedTokenizerFast):
     >>> [1, 15043, 445, 338, 263, 1243]
     ```
 
+    If you want to change the `bos_token` or the `eos_token`, make sure to specify them when initializing the model, or
+    call `tokenizer.update_post_processor()` to make sure that the post-processing is correctly done (otherwise the
+    values of the first token and final token of an encoded sequence will not be correct). For more details, checkout
+    [post-processors] (https://huggingface.co/docs/tokenizers/api/post-processors) documentation.
+
+
     This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should
     refer to this superclass for more information regarding those methods.
...
@@ -108,6 +114,9 @@ class LlamaTokenizerFast(PreTrainedTokenizerFast):
         self.can_save_slow_tokenizer = False if not self.vocab_file else True
 
     def update_post_processor(self):
+        """
+        Updates the underlying post processor with the current `bos_token` and `eos_token`.
+        """
         bos = self.bos_token
         bos_token_id = self.bos_token_id
...
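The docstring added in this commit warns that changing `bos_token` or `eos_token` after initialization leaves the post-processor out of date until `update_post_processor()` is called. The following is a minimal pure-Python sketch of that failure mode, not the transformers implementation: `ToyTokenizer`, its `_template` cache, and the toy vocabulary (including the `<start>` token and its id) are hypothetical names introduced only for illustration. It assumes the post-processor is built once from the special tokens and then reused on every `encode` call.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTokenizer:
    """Hypothetical stand-in for a fast tokenizer; NOT the transformers API."""

    vocab: dict
    bos_token: str = "<s>"
    eos_token: str = "</s>"
    add_bos_token: bool = True
    add_eos_token: bool = False
    # Cached post-processing template; None marks where the sequence ids go.
    _template: list = field(default_factory=list, init=False)

    def __post_init__(self):
        # The template is built once, from the special tokens known at init.
        self.update_post_processor()

    def update_post_processor(self):
        """Rebuild the cached template from the current bos/eos tokens."""
        template = []
        if self.add_bos_token:
            template.append(self.vocab[self.bos_token])
        template.append(None)
        if self.add_eos_token:
            template.append(self.vocab[self.eos_token])
        self._template = template

    def encode(self, words):
        ids = [self.vocab[w] for w in words]
        out = []
        for slot in self._template:
            if slot is None:
                out.extend(ids)  # splice in the sequence itself
            else:
                out.append(slot)  # special token id frozen in the template
        return out


# Toy vocabulary; the ids are illustrative only.
vocab = {"<s>": 1, "</s>": 2, "<start>": 3, "Hello": 15043, "test": 1243}
tok = ToyTokenizer(vocab)

before = tok.encode(["Hello", "test"])  # uses "<s>" as the first token

tok.bos_token = "<start>"               # change the attribute only
stale = tok.encode(["Hello", "test"])   # template still holds the old id

tok.update_post_processor()             # rebuild from the current tokens
fresh = tok.encode(["Hello", "test"])   # first token now reflects "<start>"
```

In this sketch `stale` equals `before`, because assigning the attribute does not touch the cached template; only the explicit rebuild picks up the new token. Passing the desired special tokens at construction time avoids the extra call, which matches the first option the docstring recommends.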