chenpangpang/transformers
Commit 1fbaa3c1 (unverified)
Authored by Anthony MOI on Feb 09, 2021; committed by GitHub on Feb 09, 2021.

Fix tokenizers training in notebook (#10110)

Parent: 85395e49
Showing 1 changed file with 1 addition and 1 deletion (notebooks/01-training-tokenizers.ipynb: +1 -1).
notebooks/01-training-tokenizers.ipynb
@@ -229,7 +229,7 @@
 "\n",
 "# We initialize our trainer, giving him the details about the vocabulary we want to generate\n",
 "trainer = BpeTrainer(vocab_size=25000, show_progress=True, initial_alphabet=ByteLevel.alphabet())\n",
-"tokenizer.train(trainer, [\"big.txt\"])\n",
+"tokenizer.train(files=[\"big.txt\"], trainer=trainer)\n",
 "\n",
 "print(\"Trained vocab size: {}\".format(tokenizer.get_vocab_size()))"
 ]
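The fix replaces a positional call with keyword arguments. The diff suggests that `Tokenizer.train` in the `tokenizers` library now expects the files first and the trainer second, so the old positional call would have passed each argument to the wrong parameter. The sketch below is an illustration only: `train_old` and `train_new` are hypothetical stand-ins, not the library's API. They mimic just the two argument orders to show why naming the arguments makes the call correct under either order.

```python
# Hypothetical stand-ins, NOT the real `tokenizers` API: they only mimic the
# argument order of the two calls in the diff so the binding can be inspected.

def train_old(trainer, files):
    """Old order, as called before the fix: trainer first, then files."""
    return {"trainer": trainer, "files": files}


def train_new(files, trainer=None):
    """New order, as the fixed call implies: files first, trainer optional."""
    return {"trainer": trainer, "files": files}


trainer = "bpe-trainer"  # placeholder for a BpeTrainer instance
files = ["big.txt"]

# The pre-fix positional call, run against the new order, swaps the arguments:
# the trainer ends up in `files` and the file list ends up in `trainer`.
swapped = train_new(trainer, files)
print(swapped)  # {'trainer': ['big.txt'], 'files': 'bpe-trainer'}

# Keyword arguments bind by name, so the fixed call works with either order.
fixed_new = train_new(files=files, trainer=trainer)
fixed_old = train_old(files=files, trainer=trainer)
print(fixed_new == fixed_old)  # True
```

Passing `files=` and `trainer=` by name also keeps the notebook working if the parameter order changes again.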