chenpangpang/transformers · Commits · 3312e96b
Unverified commit 3312e96b, authored Apr 13, 2021 by Sylvain Gugger; committed via GitHub on Apr 13, 2021.
Doc check: a bit of clean up (#11224)
parent: edca520d
Showing 2 changed files with 22 additions and 39 deletions (+22 / -39):

  docs/source/main_classes/data_collator.rst  (+18 / -30)
  utils/check_repo.py  (+4 / -9)
docs/source/main_classes/data_collator.rst
-<!--- Copyright 2020 The HuggingFace Team. All rights reserved.
+..
+    Copyright 2020 The HuggingFace Team. All rights reserved.

-Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
+    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance
+    with the License. You may obtain a copy of the License at

-http://www.apache.org/licenses/LICENSE-2.0
+        http://www.apache.org/licenses/LICENSE-2.0

-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
-an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-specific language governing permissions and limitations under the License.
--->
+    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed
+    on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for
+    the specific language governing permissions and limitations under the License.

-Data Collator
+DataCollator
 -----------------------------------------------------------------------------------------------------------------------

-Data Collators are objects that will form a batch by using a list of elements as input. These lists of elements are of
+Data collators are objects that will form a batch by using a list of dataset elements as input. These elements are of
 the same type as the elements of :obj:`train_dataset` or :obj:`eval_dataset`.

-A data collator will default to :func:`transformers.data.data_collator.default_data_collator` if no `tokenizer` has
-been provided. This is a function that takes a list of samples from a Dataset as input and collates them into a batch
-of a dict-like object. The default collator performs special handling of potential keys:
-
-    - ``label``: handles a single value (int or float) per object
-    - ``label_ids``: handles a list of values per object
-
-This function does not perform any preprocessing. An example of use can be found in glue and ner.
+To be able to build batches, data collators may apply some processing (like padding). Some of them (like
+:class:`~transformers.DataCollatorForLanguageModeling`) also apply some random data augmentation (like random masking)
+on the formed batch.
+
+Examples of use can be found in the :doc:`example scripts <../examples>` or :doc:`example notebooks <../notebooks>`.

 Default data collator
...
@@ -37,47 +33,39 @@ DataCollatorWithPadding
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 .. autoclass:: transformers.data.data_collator.DataCollatorWithPadding
+    :special-members: __call__
     :members:

 DataCollatorForTokenClassification
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 .. autoclass:: transformers.data.data_collator.DataCollatorForTokenClassification
+    :special-members: __call__
     :members:

 DataCollatorForSeq2Seq
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 .. autoclass:: transformers.data.data_collator.DataCollatorForSeq2Seq
+    :special-members: __call__
     :members:

 DataCollatorForLanguageModeling
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 .. autoclass:: transformers.data.data_collator.DataCollatorForLanguageModeling
+    :special-members: __call__
     :members: mask_tokens

 DataCollatorForWholeWordMask
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 .. autoclass:: transformers.data.data_collator.DataCollatorForWholeWordMask
+    :special-members: __call__
     :members: mask_tokens

-DataCollatorForSOP
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-.. autoclass:: transformers.data.data_collator.DataCollatorForSOP
-    :special-members: __call__
-    :members: mask_tokens
-
 DataCollatorForPermutationLanguageModeling
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 .. autoclass:: transformers.data.data_collator.DataCollatorForPermutationLanguageModeling
+    :special-members: __call__
     :members: mask_tokens
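The behavior described in the rewritten introduction can be illustrated with a short snippet. This is a minimal sketch, not part of the commit; it assumes transformers with PyTorch installed, and "bert-base-uncased" is only an illustrative checkpoint choice:

# Minimal sketch (not part of this commit): the padding-only collator vs. the
# language-modeling collator with random masking, as described above.
from transformers import AutoTokenizer
from transformers.data.data_collator import (
    DataCollatorForLanguageModeling,
    DataCollatorWithPadding,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
features = [
    tokenizer("A short sentence."),
    tokenizer("A somewhat longer sentence that needs more tokens."),
]

# DataCollatorWithPadding only pads: its __call__ (the method now documented
# via :special-members: above) turns the features into equal-length tensors.
padding_collator = DataCollatorWithPadding(tokenizer=tokenizer)
batch = padding_collator(features)
print(batch["input_ids"].shape)  # (2, length of the longest feature)

# DataCollatorForLanguageModeling additionally applies random masking as data
# augmentation, producing masked "input_ids" and the matching "labels".
mlm_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
batch = mlm_collator(features)
print(batch["input_ids"].shape, batch["labels"].shape)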
utils/check_repo.py
...
@@ -348,6 +348,8 @@ def find_all_documented_objects():
 DEPRECATED_OBJECTS = [
     "AutoModelWithLMHead",
     "BartPretrainedModel",
+    "DataCollator",
+    "DataCollatorForSOP",
     "GlueDataset",
     "GlueDataTrainingArguments",
     "LineByLineTextDataset",
...
@@ -385,7 +387,9 @@ DEPRECATED_OBJECTS = [
 UNDOCUMENTED_OBJECTS = [
     "AddedToken",  # This is a tokenizers class.
     "BasicTokenizer",  # Internal, should never have been in the main init.
+    "CharacterTokenizer",  # Internal, should never have been in the main init.
     "DPRPretrainedReader",  # Like an Encoder.
+    "MecabTokenizer",  # Internal, should never have been in the main init.
     "ModelCard",  # Internal type.
     "SqueezeBertModule",  # Internal building block (should have been called SqueezeBertLayer)
     "TFDPRPretrainedReader",  # Like an Encoder.
...
@@ -403,10 +407,6 @@ UNDOCUMENTED_OBJECTS = [
 # This list should be empty. Objects in it should get their own doc page.
 SHOULD_HAVE_THEIR_OWN_PAGE = [
-    # bert-japanese
-    "BertJapaneseTokenizer",
-    "CharacterTokenizer",
-    "MecabTokenizer",
     # Benchmarks
     "PyTorchBenchmark",
     "PyTorchBenchmarkArguments",
...
@@ -448,11 +448,6 @@ def ignore_undocumented(name):
     # MMBT model does not really work.
     if name.startswith("MMBT"):
         return True
-
-    # NOT DOCUMENTED BUT NOT ON PURPOSE, SHOULD BE FIXED!
-    # All data collators should be documented
-    if name.startswith("DataCollator") or name.endswith("data_collator"):
-        return True
     if name in SHOULD_HAVE_THEIR_OWN_PAGE:
         return True
     return False
...
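For context, the lists above feed the repository consistency check. The following is a condensed sketch, not the verbatim script: ignore_undocumented is abridged from the hunk above, and check_documented is a hypothetical stand-in for the real checker in utils/check_repo.py.

# Condensed sketch of how the lists are used after this commit: with the
# blanket DataCollator exemption removed from ignore_undocumented(), data
# collators must now be documented (hence the .rst changes above) or be
# listed in DEPRECATED_OBJECTS.
DEPRECATED_OBJECTS = ["DataCollator", "DataCollatorForSOP"]  # excerpt
UNDOCUMENTED_OBJECTS = ["CharacterTokenizer", "MecabTokenizer"]  # excerpt
SHOULD_HAVE_THEIR_OWN_PAGE = ["PyTorchBenchmark"]  # excerpt


def ignore_undocumented(name: str) -> bool:
    # No more: if name.startswith("DataCollator") or name.endswith("data_collator")
    if name.startswith("MMBT"):
        return True
    return name in SHOULD_HAVE_THEIR_OWN_PAGE


def check_documented(public_objects, documented_objects):
    # Every public object must be documented, deprecated, whitelisted as
    # intentionally undocumented, or ignored for a listed reason.
    undocumented = [
        name
        for name in public_objects
        if name not in documented_objects
        and name not in DEPRECATED_OBJECTS
        and name not in UNDOCUMENTED_OBJECTS
        and not ignore_undocumented(name)
    ]
    if undocumented:
        raise ValueError(f"The following objects are not documented: {undocumented}")


# Example: "DataCollatorForSOP" now passes because it is marked deprecated.
check_documented(["DataCollatorForSOP", "PyTorchBenchmark"], documented_objects=[])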