chenpangpang / transformers
Commit 1f6885ba
Authored Nov 01, 2022 by Steven Liu
Committed by GitHub, Nov 01, 2022

add dataset (#20005)

Parent: 4f1e5e4e
Showing 1 changed file with 17 additions and 6 deletions

docs/source/en/quicktour.mdx
@@ -432,19 +432,30 @@ Depending on your task, you'll typically pass the following parameters to [`Trai
 >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
 ```

-4. Your preprocessed train and test datasets:
+4. Load a dataset:

 ```py
->>> train_dataset = dataset["train"]  # doctest: +SKIP
->>> eval_dataset = dataset["eval"]  # doctest: +SKIP
+>>> from datasets import load_dataset
+
+>>> dataset = load_dataset("rotten_tomatoes")
+```
+
+5. Create a function to tokenize the dataset, and apply it over the entire dataset with [`~datasets.Dataset.map`]:
+
+```py
+>>> def tokenize_dataset(dataset):
+...     return tokenizer(dataset["text"])
+
+
+>>> dataset = dataset.map(tokenize_dataset, batched=True)
 ```

-5. A [`DataCollator`] to create a batch of examples from your dataset:
+6. A [`DataCollatorWithPadding`] to create a batch of examples from your dataset:

 ```py
->>> from transformers import DefaultDataCollator
+>>> from transformers import DataCollatorWithPadding

->>> data_collator = DefaultDataCollator()
+>>> data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
 ```

 Now gather all these classes in [`Trainer`]:
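The move from `DefaultDataCollator` to `DataCollatorWithPadding` goes hand in hand with the new tokenization step: tokenized texts generally differ in length, so they have to be padded before they can be stacked into one batch. Below is a minimal pure-Python sketch of that idea, not the library's implementation. `toy_tokenize`, `pad_collate`, and the tiny `VOCAB` are hypothetical stand-ins for the real tokenizer and collator, chosen so the example runs without downloading a model or dataset. It assumes, as with `batched=True` in `map`, that the function receives a dict of columns whose values are lists.

```python
from typing import Dict, List

# Hypothetical toy vocabulary; a real tokenizer ships its own.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "a": 2, "great": 3, "movie": 4, "boring": 5}
PAD_ID = VOCAB["[PAD]"]


def toy_tokenize(batch: Dict[str, List[str]]) -> Dict[str, List[List[int]]]:
    """Stand-in for `tokenize_dataset`: maps a batch of texts to token ids.

    The input is a dict of columns, each holding a list of values,
    which mirrors what a batched map function is handed.
    """
    input_ids = [
        [VOCAB.get(word, VOCAB["[UNK]"]) for word in text.lower().split()]
        for text in batch["text"]
    ]
    return {"input_ids": input_ids}


def pad_collate(features: List[Dict[str, List[int]]]) -> Dict[str, List[List[int]]]:
    """Pad every example to the longest sequence in this batch only."""
    max_len = max(len(f["input_ids"]) for f in features)
    input_ids, attention_mask = [], []
    for f in features:
        ids = f["input_ids"]
        n_pad = max_len - len(ids)
        input_ids.append(ids + [PAD_ID] * n_pad)
        # 1 marks a real token, 0 marks padding.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Tokenize a batch of two texts of different lengths.
columns = toy_tokenize({"text": ["A great movie", "Boring"]})

# Turn the column-wise result into one dict per example, then collate.
features = [{"input_ids": ids} for ids in columns["input_ids"]]
batch = pad_collate(features)

print(batch["input_ids"])       # [[2, 3, 4], [5, 0, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 0, 0]]
```

Padding per batch rather than to a fixed global length keeps batches of short texts small. The real collator additionally returns framework tensors instead of Python lists.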