"git@developer.sourcefind.cn:chenpangpang/transformers.git" did not exist on "7351ef83c1bf0cf01a2498fffdf7df9da7bc3c7f"
Unverified Commit 3312e96b authored by Sylvain Gugger, committed by GitHub

Doc check: a bit of clean up (#11224)

parent edca520d
<!---
Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

Data Collator
-----------------------------------------------------------------------------------------------------------------------
Data collators are objects that will form a batch by using a list of dataset elements as input. These elements are of
the same type as the elements of :obj:`train_dataset` or :obj:`eval_dataset`.

To be able to build batches, data collators may apply some processing (like padding). Some of them (like
:class:`~transformers.DataCollatorForLanguageModeling`) also apply some random data augmentation (like random masking)
on the formed batch.

Examples of use can be found in the :doc:`example scripts <../examples>` or :doc:`example notebooks <../notebooks>`.
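To make the batching concrete, here is a minimal, self-contained sketch of what a padding collator does. The function name `pad_collate` and the pad id are invented for this example; the real collators in `transformers.data.data_collator` use tokenizer-aware logic and return tensors rather than plain lists.

```python
# Minimal sketch of a padding data collator (illustrative only, not the
# transformers implementation): take a list of dataset elements and pad
# their "input_ids" to a common length to form a rectangular batch.
def pad_collate(features, pad_token_id=0):
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": []}
    for f in features:
        ids = f["input_ids"]
        padding = [pad_token_id] * (max_len - len(ids))
        batch["input_ids"].append(ids + padding)
        # Real tokens get attention 1, padding positions get 0.
        batch["attention_mask"].append([1] * len(ids) + [0] * len(padding))
    return batch

batch = pad_collate([{"input_ids": [101, 7, 102]}, {"input_ids": [101, 102]}])
```

The shorter element is padded on the right so both rows of the batch have the same length.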
Default data collator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

...@@ -37,47 +33,39 @@

DataCollatorWithPadding
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.data.data_collator.DataCollatorWithPadding
    :special-members: __call__
    :members:
DataCollatorForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.data.data_collator.DataCollatorForTokenClassification
    :special-members: __call__
    :members:
DataCollatorForSeq2Seq
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.data.data_collator.DataCollatorForSeq2Seq
    :special-members: __call__
    :members:
DataCollatorForLanguageModeling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.data.data_collator.DataCollatorForLanguageModeling
    :special-members: __call__
    :members: mask_tokens
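The masking this collator performs can be illustrated with a rough, self-contained sketch. The `mask_tokens` function below is hypothetical and simplified, not the class's actual method: the real implementation also sometimes keeps or randomizes the chosen tokens instead of always substituting the mask id.

```python
import random

# Illustrative sketch of MLM-style random masking: each token is replaced
# by a mask id with probability `mlm_probability`; masked positions keep
# their original id as the label, and every other label position is set
# to -100 so the loss ignores it.
def mask_tokens(input_ids, mask_token_id, mlm_probability=0.15, rng=None):
    rng = rng or random.Random()
    masked, labels = [], []
    for token in input_ids:
        if rng.random() < mlm_probability:
            masked.append(mask_token_id)
            labels.append(token)      # predict the original token here
        else:
            masked.append(token)
            labels.append(-100)       # position ignored by the loss
    return masked, labels

masked, labels = mask_tokens([5, 6, 7, 8], mask_token_id=103,
                             mlm_probability=0.5, rng=random.Random(0))
```

Because the masking is random, the collator produces different batches from the same dataset elements on each epoch, which is the data-augmentation effect mentioned in the introduction.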
DataCollatorForWholeWordMask
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.data.data_collator.DataCollatorForWholeWordMask
    :special-members: __call__
    :members: mask_tokens

DataCollatorForSOP
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.data.data_collator.DataCollatorForSOP
    :special-members: __call__
    :members: mask_tokens
DataCollatorForPermutationLanguageModeling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.data.data_collator.DataCollatorForPermutationLanguageModeling
    :special-members: __call__
    :members: mask_tokens
...@@ -348,6 +348,8 @@ def find_all_documented_objects():

DEPRECATED_OBJECTS = [
    "AutoModelWithLMHead",
    "BartPretrainedModel",
    "DataCollator",
    "DataCollatorForSOP",
    "GlueDataset",
    "GlueDataTrainingArguments",
    "LineByLineTextDataset",
...@@ -385,7 +387,9 @@ DEPRECATED_OBJECTS = [

UNDOCUMENTED_OBJECTS = [
    "AddedToken",  # This is a tokenizers class.
    "BasicTokenizer",  # Internal, should never have been in the main init.
    "CharacterTokenizer",  # Internal, should never have been in the main init.
    "DPRPretrainedReader",  # Like an Encoder.
    "MecabTokenizer",  # Internal, should never have been in the main init.
    "ModelCard",  # Internal type.
    "SqueezeBertModule",  # Internal building block (should have been called SqueezeBertLayer)
    "TFDPRPretrainedReader",  # Like an Encoder.
...@@ -403,10 +407,6 @@ UNDOCUMENTED_OBJECTS = [

# This list should be empty. Objects in it should get their own doc page.
SHOULD_HAVE_THEIR_OWN_PAGE = [
    # bert-japanese
    "BertJapaneseTokenizer",
    "CharacterTokenizer",
    "MecabTokenizer",
    # Benchmarks
    "PyTorchBenchmark",
    "PyTorchBenchmarkArguments",
...@@ -448,11 +448,6 @@ def ignore_undocumented(name):

    # MMBT model does not really work.
    if name.startswith("MMBT"):
        return True
    # NOT DOCUMENTED BUT NOT ON PURPOSE, SHOULD BE FIXED!
    # All data collators should be documented
    if name.startswith("DataCollator") or name.endswith("data_collator"):
        return True
    if name in SHOULD_HAVE_THEIR_OWN_PAGE:
        return True
    return False
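The allow-list filtering this check script performs can be pictured in isolation. The sketch below uses hypothetical names (`DEPRECATED`, `UNDOCUMENTED`, `needs_documentation`), not the actual `check_repo.py` API: an object only counts as a documentation failure if it is public, not excused by one of the allow-lists, and absent from the set of documented names.

```python
# Hypothetical, simplified version of the doc-check filtering logic.
DEPRECATED = {"DataCollatorForSOP"}   # deprecated objects are excused
UNDOCUMENTED = {"AddedToken"}         # deliberately left undocumented

def needs_documentation(name, documented):
    if name.startswith("_"):          # private objects are exempt
        return False
    if name in DEPRECATED or name in UNDOCUMENTED:
        return False
    return name not in documented     # everything else must be documented

public_api = ["AddedToken", "DataCollatorWithPadding", "_internal_helper"]
failures = [n for n in public_api if needs_documentation(n, documented=set())]
```

Removing an entry from an allow-list (as this commit does for the data collators) means the corresponding objects start counting as failures until documentation pages exist for them.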