Unverified Commit c1aaa439 authored by Patrick von Platen, committed by GitHub

[Doctests] Move doctests to new GPU & Fix bugs (#15969)



* test

* up

* up

* Empty test commit

* up

* update tests

* up

* fix some vision models

* correct

* correct docs

* Trigger notification

* finalize

* check

* correct quicktour

* Apply suggestions from code review

* improve doctests

* Trigger Build

* next try

* next try

* and again

* Output current clone information

* Output current clone information

* Correct path

* add tf round again

* revert to daily job
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
parent f4e4ad34
```diff
@@ -16,35 +16,43 @@ env:
   OMP_NUM_THREADS: 16
   MKL_NUM_THREADS: 16
   PYTEST_TIMEOUT: 600
+  SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
+  TF_FORCE_GPU_ALLOW_GROWTH: true

 jobs:
   run_doctests:
-    runs-on: [self-hosted, docker-gpu-test, single-gpu]
+    runs-on: [self-hosted, doc-tests-gpu]
     container:
-      image: pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
+      image: huggingface/transformers-all-latest-gpu
       options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
     steps:
-      - name: Launcher docker
-        uses: actions/checkout@v2
+      - uses: actions/checkout@v2
+        with:
+          repository: 'huggingface/transformers'
+          path: transformers

       - name: NVIDIA-SMI
         run: |
           nvidia-smi

-      - name: Install dependencies
+      - name: GPU visibility
+        working-directory: transformers
         run: |
-          apt -y update && apt install -y libsndfile1-dev
-          pip install --upgrade pip
-          pip install .[testing,torch-speech]
+          utils/print_env_pt.py
+          TF_CPP_MIN_LOG_LEVEL=3 python3 -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
+          TF_CPP_MIN_LOG_LEVEL=3 python3 -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"

       - name: Prepare files for doctests
+        working-directory: transformers
         run: |
-          python utils/prepare_for_doc_test.py src docs
+          python3 utils/prepare_for_doc_test.py src docs

       - name: Run doctests
+        working-directory: transformers
         run: |
-          pytest --doctest-modules $(cat utils/documentation_tests.txt) -sv --doctest-continue-on-failure --doctest-glob="*.mdx"
+          python3 -m pytest --doctest-modules $(cat utils/documentation_tests.txt) -sv --doctest-continue-on-failure --doctest-glob="*.mdx"

       - name: Clean files after doctests
+        working-directory: transformers
         run: |
-          python utils/prepare_for_doc_test.py src docs --remove_new_line
+          python3 utils/prepare_for_doc_test.py src docs --remove_new_line
```
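The rebuilt workflow prints the PyTorch environment and queries TensorFlow for GPUs before any test runs, so a misconfigured runner fails fast (and `TF_FORCE_GPU_ALLOW_GROWTH: true` keeps TensorFlow from pre-allocating the whole GPU away from PyTorch). The same visibility check as a standalone sketch, assuming both frameworks are installed, as they are in the `huggingface/transformers-all-latest-gpu` image:

```python
# Minimal sketch of the workflow's GPU-visibility check: report what each
# framework can see before running any doctest.
import tensorflow as tf
import torch

print("PyTorch CUDA available:", torch.cuda.is_available())
print("Number of PyTorch GPUs:", torch.cuda.device_count())

tf_gpus = tf.config.list_physical_devices("GPU")
print("TF GPUs available:", bool(tf_gpus))
print("Number of TF GPUs available:", len(tf_gpus))
```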
````diff
@@ -99,12 +99,13 @@ The [`pipeline`] can also iterate over an entire dataset. Start by installing the
 pip install datasets
 ```

-Create a [`pipeline`] with the task you want to solve for and the model you want to use. Set the `device` parameter to `0` to place the tensors on a CUDA device:
+Create a [`pipeline`] with the task you want to solve for and the model you want to use.

 ```py
+>>> import torch
 >>> from transformers import pipeline

->>> speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
+>>> speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
 ```

 Next, load a dataset (see the 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart.html) for more details) you'd like to iterate over. For example, let's load the [SUPERB](https://huggingface.co/datasets/superb) dataset:
````
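The quicktour's continuation is collapsed by the diff viewer. As a hedged sketch of the pattern that prose describes (iterating a pipeline over a dataset), assuming the SUPERB `asr` config and its `file` column, which are not shown in this diff:

```python
# Hedged sketch, not the elided quicktour text: run the ASR pipeline over a
# few SUPERB examples. The "asr" config and "file" column are assumptions.
from datasets import load_dataset
from transformers import pipeline

speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
dataset = load_dataset("superb", name="asr", split="test")

# Pipelines accept lists of inputs; audio file paths are decoded internally.
for result in speech_recognizer(dataset["file"][:4]):
    print(result["text"])
```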
````diff
@@ -264,10 +265,10 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
 >>> import tensorflow as tf

 >>> tf_predictions = tf.nn.softmax(tf_outputs.logits, axis=-1)
->>> print(tf_predictions)
+>>> print(tf.math.round(tf_predictions * 10**4) / 10**4)
 tf.Tensor(
-[[0.00206 0.00177 0.01155 0.21209 0.77253]
- [0.20842 0.18262 0.19693 0.1755  0.23652]], shape=(2, 5), dtype=float32)
+[[0.0021 0.0018 0.0116 0.2121 0.7725]
+ [0.2084 0.1826 0.1969 0.1755 0.2365]], shape=(2, 5), dtype=float32)
 ```

 <Tip>
````
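Rounding the softmax output to four decimal places before printing is what lets a single expected output pass on both CPU and GPU, whose low-order digits differ. The trick in isolation:

```python
# Rounding to 4 decimals strips hardware-dependent low-order digits, so the
# doctest's expected output matches on any runner.
import tensorflow as tf

x = tf.constant([[0.002060, 0.001770, 0.011551, 0.212093, 0.772526]])
print(tf.math.round(x * 10**4) / 10**4)
# tf.Tensor([[0.0021 0.0018 0.0116 0.2121 0.7725]], shape=(1, 5), dtype=float32)
```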
```diff
@@ -55,7 +55,7 @@ _EXPECTED_OUTPUT_SHAPE = [1, 197, 768]

 # Image classification docstring
 _IMAGE_CLASS_CHECKPOINT = "microsoft/beit-base-patch16-224"
-_IMAGE_CLASS_EXPECTED_OUTPUT = "'tabby, tabby cat'"
+_IMAGE_CLASS_EXPECTED_OUTPUT = "tabby, tabby cat"

 BEIT_PRETRAINED_MODEL_ARCHIVE_LIST = [
     "microsoft/beit-base-patch16-224",
```
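The same one-character fix recurs for ConvNeXT, DeiT, PoolFormer, SegFormer, Swin, and ViT below. `_IMAGE_CLASS_EXPECTED_OUTPUT` is interpolated into a shared code-sample docstring template as the expected output of a `print(...)` call, and `print` emits the label without quotes, so the extra inner quotes made every one of these doctests fail. A hedged sketch of the mismatch (the template wording here is an assumption, not a quote of the transformers source):

```python
# Hedged sketch of why the inner quotes broke the doctests: the docstring
# template prints the bare label, so the expected line must be unquoted too.
expected_output = "tabby, tabby cat"    # new constant: doctest expects  tabby, tabby cat
broken_expected = "'tabby, tabby cat'"  # old constant: doctest expects  'tabby, tabby cat'

label = "tabby, tabby cat"              # what model.config.id2label[idx] returns
print(label)                            # prints without quotes: tabby, tabby cat
assert label == expected_output
assert label != broken_expected         # the old expected output could never match
```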
```diff
@@ -46,7 +46,7 @@ _EXPECTED_OUTPUT_SHAPE = [1, 768, 7, 7]

 # Image classification docstring
 _IMAGE_CLASS_CHECKPOINT = "facebook/convnext-tiny-224"
-_IMAGE_CLASS_EXPECTED_OUTPUT = "'tabby, tabby cat'"
+_IMAGE_CLASS_EXPECTED_OUTPUT = "tabby, tabby cat"

 CONVNEXT_PRETRAINED_MODEL_ARCHIVE_LIST = [
     "facebook/convnext-tiny-224",
```
```diff
@@ -51,7 +51,7 @@ _EXPECTED_OUTPUT_SHAPE = [1, 198, 768]

 # Image classification docstring
 _IMAGE_CLASS_CHECKPOINT = "facebook/deit-base-distilled-patch16-224"
-_IMAGE_CLASS_EXPECTED_OUTPUT = "'tabby, tabby cat'"
+_IMAGE_CLASS_EXPECTED_OUTPUT = "tabby, tabby cat"

 DEIT_PRETRAINED_MODEL_ARCHIVE_LIST = [
```
````diff
@@ -697,9 +697,11 @@ class DeiTForImageClassification(DeiTPreTrainedModel):
         ```python
         >>> from transformers import DeiTFeatureExtractor, DeiTForImageClassification
+        >>> import torch
         >>> from PIL import Image
         >>> import requests

+        >>> torch.manual_seed(3)  # doctest: +IGNORE_RESULT
         >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
         >>> image = Image.open(requests.get(url, stream=True).raw)
@@ -714,6 +716,7 @@ class DeiTForImageClassification(DeiTPreTrainedModel):
         >>> # model predicts one of the 1000 ImageNet classes
         >>> predicted_class_idx = logits.argmax(-1).item()
         >>> print("Predicted class:", model.config.id2label[predicted_class_idx])
+        Predicted class: maillot
         ```"""
         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
````
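The added `torch.manual_seed(3)` pins the RNG, presumably because loading this checkpoint into plain `DeiTForImageClassification` leaves the classification head freshly initialized (the checkpoint was trained for the `...WithTeacher` variant), so the printed class, here `maillot`, only reproduces with a fixed seed. A sketch of the pattern:

```python
# Hedged sketch: if a checkpoint lacks weights for the task head,
# from_pretrained initializes that head randomly, so any predicted class
# depends on the RNG state. Seeding first makes the doctest reproducible.
import torch
from transformers import DeiTForImageClassification

torch.manual_seed(3)
model = DeiTForImageClassification.from_pretrained("facebook/deit-base-distilled-patch16-224")
# With the same seed, the randomly initialized classifier yields the same
# (meaningless but stable) prediction on every run.
```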
```diff
@@ -44,11 +44,11 @@ _FEAT_EXTRACTOR_FOR_DOC = "PoolFormerFeatureExtractor"

 # Base docstring
 _CHECKPOINT_FOR_DOC = "sail/poolformer_s12"
-_EXPECTED_OUTPUT_SHAPE = [1, 197, 768]
+_EXPECTED_OUTPUT_SHAPE = [1, 512, 7, 7]

 # Image classification docstring
 _IMAGE_CLASS_CHECKPOINT = "sail/poolformer_s12"
-_IMAGE_CLASS_EXPECTED_OUTPUT = "'tabby, tabby cat'"
+_IMAGE_CLASS_EXPECTED_OUTPUT = "tabby, tabby cat"

 POOLFORMER_PRETRAINED_MODEL_ARCHIVE_LIST = [
     "sail/poolformer_s12",
```
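The old shape constant carried a ViT-style `[1, 197, 768]` value, but PoolFormer's backbone outputs a convolutional `[batch, channels, height, width]` feature map. A quick sketch to verify the corrected shape (downloads the checkpoint):

```python
# Verify the corrected base-model output shape for PoolFormer-S12: the last
# stage produces a [1, 512, 7, 7] feature map for a 224x224 input.
import torch
from transformers import PoolFormerModel

model = PoolFormerModel.from_pretrained("sail/poolformer_s12")
pixel_values = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    outputs = model(pixel_values)
print(list(outputs.last_hidden_state.shape))  # [1, 512, 7, 7]
```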
```diff
@@ -49,7 +49,7 @@ _EXPECTED_OUTPUT_SHAPE = [1, 256, 16, 16]

 # Image classification docstring
 _IMAGE_CLASS_CHECKPOINT = "nvidia/mit-b0"
-_IMAGE_CLASS_EXPECTED_OUTPUT = "'tabby, tabby cat'"
+_IMAGE_CLASS_EXPECTED_OUTPUT = "tabby, tabby cat"

 SEGFORMER_PRETRAINED_MODEL_ARCHIVE_LIST = [
     "nvidia/segformer-b0-finetuned-ade-512-512",
```
```diff
@@ -1168,9 +1168,10 @@ class Speech2TextModel(Speech2TextPreTrainedModel):
         >>> model = Speech2TextModel.from_pretrained("facebook/s2t-small-librispeech-asr")
         >>> feature_extractor = Speech2TextFeatureExtractor.from_pretrained("facebook/s2t-small-librispeech-asr")
         >>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
-        >>> input_features = feature_extractor(
+        >>> inputs = feature_extractor(
         ...     ds[0]["audio"]["array"], sampling_rate=ds[0]["audio"]["sampling_rate"], return_tensors="pt"
-        >>> ).input_features
+        ... )
+        >>> input_features = inputs.input_features
         >>> decoder_input_ids = torch.tensor([[1, 1]]) * model.config.decoder_start_token_id
         >>> last_hidden_state = model(input_features, decoder_input_ids=decoder_input_ids).last_hidden_state
         >>> list(last_hidden_state.shape)
@@ -1322,9 +1323,10 @@ class Speech2TextForConditionalGeneration(Speech2TextPreTrainedModel):
         >>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
-        >>> input_features = processor(
+        >>> inputs = processor(
         ...     ds[0]["audio"]["array"], sampling_rate=ds[0]["audio"]["sampling_rate"], return_tensors="pt"
-        >>> ).input_features
+        ... )
+        >>> input_features = inputs.input_features
         >>> generated_ids = model.generate(inputs=input_features)
```
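Both hunks make the same mechanical fix: a doctest statement cannot resume with `>>>` on the closing parenthesis of a multi-line call, so the call is closed on a `...` continuation line and the attribute access moves to its own statement. The corrected pattern, runnable on its own:

```python
# The doctest-friendly shape of a multi-line call: close the call on a "..."
# continuation line, then take the attribute in a separate statement.
from datasets import load_dataset
from transformers import Speech2TextFeatureExtractor

feature_extractor = Speech2TextFeatureExtractor.from_pretrained("facebook/s2t-small-librispeech-asr")
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")

inputs = feature_extractor(
    ds[0]["audio"]["array"], sampling_rate=ds[0]["audio"]["sampling_rate"], return_tensors="pt"
)
input_features = inputs.input_features
print(input_features.shape)  # (batch, frames, mel bins); 80 bins for this checkpoint
```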
````diff
@@ -874,24 +874,25 @@ class Speech2Text2ForCausalLM(Speech2Text2PreTrainedModel):
         >>> encoder = Wav2Vec2Model(Wav2Vec2Config())
         >>> decoder = Speech2Text2ForCausalLM(Speech2Text2Config())
-        # init random speech2text model
+        >>> # init random speech2text model
         >>> model = SpeechEncoderDecoderModel(encoder=encoder, decoder=decoder)
         >>> model.config.pad_token_id = tokenizer.pad_token_id
         >>> model.config.decoder_start_token_id = tokenizer.bos_token_id
-        # pre-process inputs and labels
+        >>> # pre-process inputs and labels
         >>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
-        >>> input_values = feature_extractor(
+        >>> inputs = feature_extractor(
         ...     ds[0]["audio"]["array"], sampling_rate=ds[0]["audio"]["sampling_rate"], return_tensors="pt"
-        >>> ).input_values  # Batch size 1
+        ... )
+        >>> input_values = inputs.input_values
         >>> decoder_input_ids = tokenizer(ds[0]["text"], return_tensors="pt").input_ids
-        # compute loss
+        >>> # compute loss
         >>> loss = model(inputs=input_values, labels=decoder_input_ids).loss
-        # backprop loss
->>> loss.backward()
+        >>> # backprop loss
+        >>> loss.backward()  # doctest: +IGNORE_RESULT
         ```"""
         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
````
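Two doctest details are at work here. Free-standing `#` comments must move onto `>>>` lines because doctest treats any non-`>>>` line as expected output. And `+IGNORE_RESULT` is not a built-in doctest flag: transformers registers an equivalent one (via `doctest.register_optionflag` and a custom output checker in its `conftest.py`) so a statement runs but its output is never compared, which is useful for calls like `torch.manual_seed`, whose returned `torch.Generator` repr would otherwise become expected output. A sketch of that wiring, assuming only the standard `doctest` extension points:

```python
# Sketch of a custom IGNORE_RESULT doctest flag; transformers wires an
# equivalent checker into pytest's doctest runner via its conftest.py.
import doctest

IGNORE_RESULT = doctest.register_optionflag("IGNORE_RESULT")

class IgnoreResultChecker(doctest.OutputChecker):
    def check_output(self, want, got, optionflags):
        # Accept any output for statements marked "# doctest: +IGNORE_RESULT".
        if optionflags & IGNORE_RESULT:
            return True
        return super().check_output(want, got, optionflags)

doctest.OutputChecker = IgnoreResultChecker
```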
```diff
@@ -48,7 +48,7 @@ _EXPECTED_OUTPUT_SHAPE = [1, 49, 768]

 # Image classification docstring
 _IMAGE_CLASS_CHECKPOINT = "microsoft/swin-tiny-patch4-window7-224"
-_IMAGE_CLASS_EXPECTED_OUTPUT = "'tabby, tabby cat'"
+_IMAGE_CLASS_EXPECTED_OUTPUT = "tabby, tabby cat"

 SWIN_PRETRAINED_MODEL_ARCHIVE_LIST = [
```
```diff
@@ -48,7 +48,7 @@ _EXPECTED_OUTPUT_SHAPE = [1, 197, 768]

 # Image classification docstring
 _IMAGE_CLASS_CHECKPOINT = "google/vit-base-patch16-224"
-_IMAGE_CLASS_EXPECTED_OUTPUT = "'Egyptian cat'"
+_IMAGE_CLASS_EXPECTED_OUTPUT = "Egyptian cat"

 VIT_PRETRAINED_MODEL_ARCHIVE_LIST = [
```
````diff
@@ -1611,7 +1611,6 @@ class Wav2Vec2ForMaskedLM(Wav2Vec2PreTrainedModel):
         self.post_init()

     @add_start_docstrings_to_model_forward(WAV_2_VEC_2_INPUTS_DOCSTRING)
-    @replace_return_docstrings(output_type=Wav2Vec2BaseModelOutput, config_class=_CONFIG_FOR_DOC)
     def forward(
         self,
         input_values,
@@ -1621,40 +1620,6 @@ class Wav2Vec2ForMaskedLM(Wav2Vec2PreTrainedModel):
         return_dict=None,
         labels=None,
     ):
-        r"""
-        labels (`torch.LongTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
-            TODO(PVP): Fill out when adding training
-
-        Returns:
-
-        Example:
-
-        ```python
-        >>> from transformers import Wav2Vec2Processor, Wav2Vec2ForMaskedLM
-        >>> from datasets import load_dataset
-        >>> import soundfile as sf
-        >>> import torch
-
-        >>> processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
-        >>> model = Wav2Vec2ForMaskedLM.from_pretrained("facebook/wav2vec2-base-960h")
-
-        >>> def map_to_array(batch):
-        ...     speech, _ = sf.read(batch["file"])
-        ...     batch["speech"] = speech
-        ...     return batch
-
-        >>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
-        >>> ds = ds.map(map_to_array)
-
-        >>> input_values = processor(ds["speech"][0], return_tensors="pt").input_values  # Batch size 1
-        >>> logits = model(input_values).logits
-
-        >>> predicted_ids = torch.argmax(logits, dim=-1)
-        >>> transcription = processor.decode(predicted_ids[0])
-        ```"""
         return_dict = return_dict if return_dict is not None else self.config.use_return_dict

         outputs = self.wav2vec2(
````
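The deleted example documented `Wav2Vec2ForMaskedLM`, which is deprecated; `Wav2Vec2ForCTC` is the recommended head for this transcription flow. A runnable equivalent, offered as a sketch rather than as part of this commit:

```python
# Sketch (not part of this commit): the same transcription flow using the
# non-deprecated Wav2Vec2ForCTC instead of Wav2Vec2ForMaskedLM.
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
input_values = processor(
    ds[0]["audio"]["array"], sampling_rate=ds[0]["audio"]["sampling_rate"], return_tensors="pt"
).input_values

with torch.no_grad():
    logits = model(input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.decode(predicted_ids[0]))
```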