Skip tests properly (#31308)

* Skip tests properly * [test_all] * Add 'reason' as kwarg for skipTest * [test_all] Fix up * [test_all]

Skip tests properly (#31308)
* Skip tests properly * [test_all] * Add 'reason' as kwarg for skipTest * [test_all] Fix up * [test_all]
1de7dc74 · amyeroberts · GitHub · 1f9f57ab · 1de7dc74 · 1de7dc74
Unverified Commit 1de7dc74 authored Jun 26, 2024 by amyeroberts Committed by GitHub Jun 26, 2024
20 changed files
--- a/docs/source/de/testing.md
+++ b/docs/source/de/testing.md
@@ -185,16 +185,16 @@ pytest -k "test and ada" tests/test_optimization.py

 Manchmal müssen Sie `accelerate` Tests für Ihre Modelle ausführen. Dazu fügen Sie einfach `-m accelerate_tests` zu Ihrem Befehl hinzu, wenn Sie diese Tests bei einem `OPT`-Lauf ausführen möchten:
 ```bash
-RUN_SLOW=1 pytest -m accelerate_tests tests/models/opt/test_modeling_opt.py 
+RUN_SLOW=1 pytest -m accelerate_tests tests/models/opt/test_modeling_opt.py
 ```


-### Dokumentationstests ausführen 
+### Dokumentationstests ausführen

-Um zu testen, ob die Dokumentationsbeispiele korrekt sind, sollten Sie überprüfen, ob die `doctests` erfolgreich sind. 
-Lassen Sie uns als Beispiel den docstring von [WhisperModel.forward](https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/modeling_whisper.py#L1017-L1035) verwenden: 
+Um zu testen, ob die Dokumentationsbeispiele korrekt sind, sollten Sie überprüfen, ob die `doctests` erfolgreich sind.
+Lassen Sie uns als Beispiel den docstring von [WhisperModel.forward](https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/modeling_whisper.py#L1017-L1035) verwenden:

-```python 
+```python
 r"""
 Returns:

@@ -217,8 +217,8 @@ Example:

 ```

-Führen Sie einfach die folgende Zeile aus, um automatisch jedes docstring-Beispiel in der gewünschten Datei zu testen: 
-```bash 
+Führen Sie einfach die folgende Zeile aus, um automatisch jedes docstring-Beispiel in der gewünschten Datei zu testen:
+```bash
 pytest --doctest-modules <path_to_file_or_dir>
 ```
 Wenn die Datei eine Markdown-Erweiterung hat, sollten Sie das Argument `--doctest-glob="*.md"` hinzufügen.
@@ -862,7 +862,7 @@ Code, der fehlerhaft ist, einen schlechten Zustand verursacht, der sich auf ande
 - Hier sehen Sie, wie Sie einen ganzen Test bedingungslos überspringen können:

 ```python no-style
-@unittest.skip("this bug needs to be fixed")
+@unittest.skip(reason="this bug needs to be fixed")
 def test_feature_x():
 ```


--- a/docs/source/en/testing.md
+++ b/docs/source/en/testing.md
@@ -184,16 +184,16 @@ pytest -k "test and ada" tests/test_optimization.py
 Sometimes you need to run `accelerate` tests on your models. For that you can just add `-m accelerate_tests` to your command, if let's say you want to run these tests on `OPT` run:

 ```bash
-RUN_SLOW=1 pytest -m accelerate_tests tests/models/opt/test_modeling_opt.py 
+RUN_SLOW=1 pytest -m accelerate_tests tests/models/opt/test_modeling_opt.py
 ```


-### Run documentation tests 
+### Run documentation tests

-In order to test whether the documentation examples are correct, you should check that the `doctests` are passing. 
-As an example, let's use [`WhisperModel.forward`'s docstring](https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/modeling_whisper.py#L1017-L1035): 
+In order to test whether the documentation examples are correct, you should check that the `doctests` are passing.
+As an example, let's use [`WhisperModel.forward`'s docstring](https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/modeling_whisper.py#L1017-L1035):

-```python 
+```python
 r"""
 Returns:

@@ -216,8 +216,8 @@ Example:

 ```

-Just run the following line to automatically test every docstring example in the desired file: 
-```bash 
+Just run the following line to automatically test every docstring example in the desired file:
+```bash
 pytest --doctest-modules <path_to_file_or_dir>
 ```
 If the file has a markdown extention, you should add the `--doctest-glob="*.md"` argument.
@@ -881,7 +881,7 @@ code that's buggy causes some bad state that will affect other tests, do not use
 - Here is how to skip whole test unconditionally:

 ```python no-style
-@unittest.skip("this bug needs to be fixed")
+@unittest.skip(reason="this bug needs to be fixed")
 def test_feature_x():
 ```


--- a/docs/source/ja/testing.md
+++ b/docs/source/ja/testing.md
@@ -171,16 +171,16 @@ pytest -k "test and ada" tests/test_optimization.py
 時々、モデルに対して `accelerate` テストを実行する必要があります。たとえば、`OPT` 実行に対してこれらのテストを実行したい場合、コマンドに `-m accelerate_tests` を追加するだけで済みます：

 ```bash
-RUN_SLOW=1 pytest -m accelerate_tests tests/models/opt/test_modeling_opt.py 
+RUN_SLOW=1 pytest -m accelerate_tests tests/models/opt/test_modeling_opt.py
 ```

-### Run documentation tests 
+### Run documentation tests

 ドキュメンテーションの例が正しいかどうかをテストするには、`doctests` が合格しているかを確認する必要があります。
 例として、[`WhisperModel.forward` のドックストリング](https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/modeling_whisper.py#L1017-L1035)を使用しましょう。


-```python 
+```python
 r"""
 Returns:

@@ -205,7 +205,7 @@ Example:

 指定したファイル内のすべてのドックストリング例を自動的にテストするために、以下の行を実行してください：

-```bash 
+```bash
 pytest --doctest-modules <path_to_file_or_dir>
 ```

@@ -809,7 +809,7 @@ with ExtendSysPath(f"{bindir}/.."):


 ```python no-style
-@unittest.skip("this bug needs to be fixed")
+@unittest.skip(reason="this bug needs to be fixed")
 def test_feature_x():
 ```

@@ -1211,4 +1211,3 @@ cmd_that_may_fail || true

 - [Github Actions:](https://github.com/actions/toolkit/issues/399)
 - [CircleCI:](https://ideas.circleci.com/ideas/CCI-I-344)
-
--- a/docs/source/ko/testing.md
+++ b/docs/source/ko/testing.md
@@ -26,19 +26,19 @@ rendered properly in your Markdown viewer.

 ## Transformers 테스트 방법[[how-transformers-are-tested]]

-1. PR이 제출되면 9개의 CircleCi 작업으로 테스트가 진행됩니다. 해당 PR에 대해 새로운 커밋이 생성될 때마다 테스트는 다시 진행됩니다. 이 작업들은 
-   이 [config 파일](https://github.com/huggingface/transformers/tree/main/.circleci/config.yml)에 정의되어 있으므로 필요하다면 
+1. PR이 제출되면 9개의 CircleCi 작업으로 테스트가 진행됩니다. 해당 PR에 대해 새로운 커밋이 생성될 때마다 테스트는 다시 진행됩니다. 이 작업들은
+   이 [config 파일](https://github.com/huggingface/transformers/tree/main/.circleci/config.yml)에 정의되어 있으므로 필요하다면
   사용자의 로컬 환경에서 동일하게 재현해 볼 수 있습니다.

   이 CI 작업은 `@slow` 테스트를 실행하지 않습니다.

 2. [github actions](https://github.com/huggingface/transformers/actions)에 의해 실행되는 작업은 3개입니다:

-   - [torch hub integration](https://github.com/huggingface/transformers/tree/main/.github/workflows/github-torch-hub.yml): 
+   - [torch hub integration](https://github.com/huggingface/transformers/tree/main/.github/workflows/github-torch-hub.yml):
    torch hub integration이 작동하는지 확인합니다.

-   - [self-hosted (push)](https://github.com/huggingface/transformers/tree/main/.github/workflows/self-push.yml): `main` 브랜치에서 커밋이 업데이트된 경우에만 GPU를 이용한 빠른 테스트를 실행합니다. 
-    이는 `src`, `tests`, `.github` 폴더 중 하나에 코드가 업데이트된 경우에만 실행됩니다. 
+   - [self-hosted (push)](https://github.com/huggingface/transformers/tree/main/.github/workflows/self-push.yml): `main` 브랜치에서 커밋이 업데이트된 경우에만 GPU를 이용한 빠른 테스트를 실행합니다.
+    이는 `src`, `tests`, `.github` 폴더 중 하나에 코드가 업데이트된 경우에만 실행됩니다.
    (model card, notebook, 기타 등등을 추가한 경우 실행되지 않도록 하기 위해서입니다)

   - [self-hosted runner](https://github.com/huggingface/transformers/tree/main/.github/workflows/self-scheduled.yml): `tests` 및 `examples`에서
@@ -61,7 +61,7 @@ RUN_SLOW=1 pytest examples/

 ### 실행할 테스트 선택[[choosing-which-tests-to-run]]

-이 문서는 테스트를 실행하는 다양한 방법에 대해 자세히 설명합니다. 
+이 문서는 테스트를 실행하는 다양한 방법에 대해 자세히 설명합니다.
 모든 내용을 읽은 후에도, 더 자세한 내용이 필요하다면 [여기](https://docs.pytest.org/en/latest/usage.html)에서 확인할 수 있습니다.

 다음은 가장 유용한 테스트 실행 방법 몇 가지입니다.
@@ -186,7 +186,7 @@ pytest -k "test and ada" tests/test_optimization.py
 모델에서 `accelerate` 테스트를 실행해야 할 때가 있습니다. 이를 위해서는 명령어에 `-m accelerate_tests`를 추가하면 됩니다.
 예를 들어, `OPT`에서 이러한 테스트를 실행하려면 다음과 같습니다:
 ```bash
-RUN_SLOW=1 pytest -m accelerate_tests tests/models/opt/test_modeling_opt.py 
+RUN_SLOW=1 pytest -m accelerate_tests tests/models/opt/test_modeling_opt.py
 ```

 ### 문서 테스트 실행[[run-documentation-tests]]
@@ -194,7 +194,7 @@ RUN_SLOW=1 pytest -m accelerate_tests tests/models/opt/test_modeling_opt.py
 예시 문서가 올바른지 테스트하려면 `doctests`가 통과하는지 확인해야 합니다.
 예를 들어, [`WhisperModel.forward`'s docstring](https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/modeling_whisper.py#L1017-L1035)를 사용해 봅시다:

-```python 
+```python
 r"""
 Returns:

@@ -218,7 +218,7 @@ Example:
 ```

 원하는 파일의 모든 docstring 예제를 자동으로 테스트하려면 다음 명령을 실행하면 됩니다:
-```bash 
+```bash
 pytest --doctest-modules <path_to_file_or_dir>
 ```
 파일의 확장자가 markdown인 경우 `--doctest-glob="*.md"` 인수를 추가해야 합니다.
@@ -240,9 +240,9 @@ pytest --picked

 ### 소스 수정 시 실패한 테스트 자동 재실행[[automatically-rerun-failed-tests-on-source-modification]]

-[pytest-xdist](https://github.com/pytest-dev/pytest-xdist)는 모든 실패한 테스트를 감지하고, 
+[pytest-xdist](https://github.com/pytest-dev/pytest-xdist)는 모든 실패한 테스트를 감지하고,
 파일을 수정한 후에 파일을 계속 재실행하여 테스트가 성공할 때까지 기다리는 매우 유용한 기능을 제공합니다.
-따라서 수정한 내용을 확인한 후 pytest를 다시 시작할 필요가 없습니다. 
+따라서 수정한 내용을 확인한 후 pytest를 다시 시작할 필요가 없습니다.
 모든 테스트가 통과될 때까지 이 과정을 반복한 후 다시 전체 실행이 이루어집니다.

 ```bash
@@ -252,7 +252,7 @@ pip install pytest-xdist
 재귀적 모드의 사용: `pytest -f` 또는 `pytest --looponfail`

 파일의 변경 사항은 `looponfailroots` 루트 디렉터리와 해당 내용을 (재귀적으로) 확인하여 감지됩니다.
-이 값의 기본값이 작동하지 않는 경우, 
+이 값의 기본값이 작동하지 않는 경우,
 `setup.cfg`의 설정 옵션을 변경하여 프로젝트에서 변경할 수 있습니다:

 ```ini
@@ -275,7 +275,7 @@ looponfailroots = transformers tests

 ### 특정 테스트 모듈 건너뛰기[[skip-a-test-module]]

-모든 테스트 모듈을 실행하되 특정 모듈을 제외하려면, 실행할 테스트 목록을 명시적으로 지정할 수 있습니다. 
+모든 테스트 모듈을 실행하되 특정 모듈을 제외하려면, 실행할 테스트 목록을 명시적으로 지정할 수 있습니다.
 예를 들어, `test_modeling_*.py` 테스트를 제외한 모든 테스트를 실행하려면 다음을 사용할 수 있습니다:

 ```bash
@@ -292,19 +292,19 @@ pytest --cache-clear tests

 ### 테스트를 병렬로 실행[[running-tests-in-parallel]]

-이전에 언급한 것처럼 `make test`는 테스트를 병렬로 실행하기 위해 
+이전에 언급한 것처럼 `make test`는 테스트를 병렬로 실행하기 위해
 `pytest-xdist` 플러그인(`-n X` 인수, 예를 들어 `-n 2`를 사용하여 2개의 병렬 작업 실행)을 통해 실행됩니다.

-`pytest-xdist`의 `--dist=` 옵션을 사용하여 테스트를 어떻게 그룹화할지 제어할 수 있습니다. 
+`pytest-xdist`의 `--dist=` 옵션을 사용하여 테스트를 어떻게 그룹화할지 제어할 수 있습니다.
 `--dist=loadfile`은 하나의 파일에 있는 테스트를 동일한 프로세스로 그룹화합니다.

 실행된 테스트의 순서가 다르고 예측할 수 없기 때문에, `pytest-xdist`로 테스트 스위트를 실행하면 실패가 발생할 수 있습니다 (검출되지 않은 결합된 테스트가 있는 경우).
-이 경우 [pytest-replay](https://github.com/ESSS/pytest-replay)를 사용하면 동일한 순서로 테스트를 다시 실행해서 
+이 경우 [pytest-replay](https://github.com/ESSS/pytest-replay)를 사용하면 동일한 순서로 테스트를 다시 실행해서
 실패하는 시퀀스를 최소화하는 데에 도움이 됩니다.

 ### 테스트 순서와 반복[[test-order-and-repetition]]

-잠재적인 종속성 및 상태 관련 버그(tear down)를 감지하기 위해 
+잠재적인 종속성 및 상태 관련 버그(tear down)를 감지하기 위해
 테스트를 여러 번, 연속으로, 무작위로 또는 세트로 반복하는 것이 좋습니다.
 그리고 직접적인 여러 번의 반복은 DL의 무작위성에 의해 발견되는 일부 문제를 감지하는 데에도 유용합니다.

@@ -341,10 +341,10 @@ pytest --flake-finder --flake-runs=5 tests/test_failing_test.py
 pip install pytest-random-order
 ```

-중요: `pytest-random-order`가 설치되면 테스트가 자동으로 임의의 순서로 섞입니다. 
+중요: `pytest-random-order`가 설치되면 테스트가 자동으로 임의의 순서로 섞입니다.
 구성 변경이나 커맨드 라인 옵션이 필요하지 않습니다.

-앞서 설명한 것처럼 이를 통해 한 테스트의 상태가 다른 테스트의 상태에 영향을 미치는 결합된 테스트를 감지할 수 있습니다. 
+앞서 설명한 것처럼 이를 통해 한 테스트의 상태가 다른 테스트의 상태에 영향을 미치는 결합된 테스트를 감지할 수 있습니다.
 `pytest-random-order`가 설치되면 해당 세션에서 사용된 랜덤 시드가 출력되며 예를 들어 다음과 같습니다:

 ```bash
@@ -364,7 +364,7 @@ Using --random-order-seed=573663
 ```

 정확히 동일한 테스트 목록(또는 목록이 없음)을 사용하는 경우에만 정확한 순서를 재현합니다.
-목록을 수동으로 좁히기 시작하면 더 이상 시드에 의존할 수 없고 실패했던 정확한 순서로 수동으로 목록을 나열해야합니다. 그리고 `--random-order-bucket=none`을 사용하여 pytest에게 순서를 임의로 설정하지 않도록 알려야 합니다. 
+목록을 수동으로 좁히기 시작하면 더 이상 시드에 의존할 수 없고 실패했던 정확한 순서로 수동으로 목록을 나열해야합니다. 그리고 `--random-order-bucket=none`을 사용하여 pytest에게 순서를 임의로 설정하지 않도록 알려야 합니다.
 예를 들어 다음과 같습니다:

 ```bash
@@ -377,19 +377,19 @@ pytest --random-order-bucket=none tests/test_a.py tests/test_c.py tests/test_b.p
 pytest --random-order-bucket=none
 ```

-기본적으로 `--random-order-bucket=module`이 내재되어 있으므로, 모듈 수준에서 파일을 섞습니다. 
+기본적으로 `--random-order-bucket=module`이 내재되어 있으므로, 모듈 수준에서 파일을 섞습니다.
 또한 `class`, `package`, `global` 및 `none` 수준에서도 섞을 수 있습니다.
 자세한 내용은 해당 [문서](https://github.com/jbasko/pytest-random-order)를 참조하세요.

 또 다른 무작위화의 대안은 [`pytest-randomly`](https://github.com/pytest-dev/pytest-randomly)입니다.
-이 모듈은 매우 유사한 기능/인터페이스를 가지고 있지만, `pytest-random-order`에 있는 버킷 모드를 사용할 수는 없습니다. 
+이 모듈은 매우 유사한 기능/인터페이스를 가지고 있지만, `pytest-random-order`에 있는 버킷 모드를 사용할 수는 없습니다.
 설치 후에는 자동으로 적용되는 문제도 동일하게 가집니다.

 ### 외관과 느낌을 변경[[look-and-feel-variations]

 #### pytest-sugar 사용[[pytest-sugar]]

-[pytest-sugar](https://github.com/Frozenball/pytest-sugar)는 테스트가 보여지는 형태를 개선하고, 
+[pytest-sugar](https://github.com/Frozenball/pytest-sugar)는 테스트가 보여지는 형태를 개선하고,
 진행 상황 바를 추가하며, 실패한 테스트와 검증을 즉시 표시하는 플러그인입니다. 설치하면 자동으로 활성화됩니다.

 ```bash
@@ -416,7 +416,7 @@ pytest --pspec tests/test_optimization.py

 #### 실패한 테스트 즉시 표시[[instantly-shows-failed-tests]]

-[pytest-instafail](https://github.com/pytest-dev/pytest-instafail)은 테스트 세션의 끝까지 기다리지 않고 
+[pytest-instafail](https://github.com/pytest-dev/pytest-instafail)은 테스트 세션의 끝까지 기다리지 않고
 실패 및 오류를 즉시 표시합니다.

 ```bash
@@ -435,7 +435,7 @@ GPU가 활성화된 환경에서, CPU 전용 모드로 테스트하려면 `CUDA_
 CUDA_VISIBLE_DEVICES="" pytest tests/utils/test_logging.py
 ```

-또는 다중 GPU가 있는 경우 `pytest`에서 사용할 GPU를 지정할 수도 있습니다. 
+또는 다중 GPU가 있는 경우 `pytest`에서 사용할 GPU를 지정할 수도 있습니다.
 예를 들어, GPU `0` 및 `1`이 있는 경우 다음을 실행할 수 있습니다:

 ```bash
@@ -444,7 +444,7 @@ CUDA_VISIBLE_DEVICES="1" pytest tests/utils/test_logging.py

 이렇게 하면 다른 GPU에서 다른 작업을 실행하려는 경우 유용합니다.

-일부 테스트는 반드시 CPU 전용으로 실행해야 하며, 일부는 CPU 또는 GPU 또는 TPU에서 실행해야 하고, 일부는 여러 GPU에서 실행해야 합니다. 
+일부 테스트는 반드시 CPU 전용으로 실행해야 하며, 일부는 CPU 또는 GPU 또는 TPU에서 실행해야 하고, 일부는 여러 GPU에서 실행해야 합니다.
 다음 스킵 데코레이터는 테스트의 요구 사항을 CPU/GPU/TPU별로 설정하는 데 사용됩니다:

 - `require_torch` - 이 테스트는 torch에서만 실행됩니다.
@@ -480,7 +480,7 @@ def test_example_with_multi_gpu():
 def test_tf_thing_with_tensorflow():
 ```

-이러한 데코레이터는 중첩될 수 있습니다. 
+이러한 데코레이터는 중첩될 수 있습니다.
 예를 들어, 느린 테스트로 진행되고 pytorch에서 적어도 하나의 GPU가 필요한 경우 다음과 같이 설정할 수 있습니다:

 ```python no-style
@@ -489,7 +489,7 @@ def test_tf_thing_with_tensorflow():
 def test_example_slow_on_gpu():
 ```

-`@parametrized`와 같은 일부 데코레이터는 테스트 이름을 다시 작성하기 때문에 `@require_*` 스킵 데코레이터는 올바르게 작동하려면 항상 맨 마지막에 나열되어야 합니다. 
+`@parametrized`와 같은 일부 데코레이터는 테스트 이름을 다시 작성하기 때문에 `@require_*` 스킵 데코레이터는 올바르게 작동하려면 항상 맨 마지막에 나열되어야 합니다.
 다음은 올바른 사용 예입니다:

 ```python no-style
@@ -498,7 +498,7 @@ def test_example_slow_on_gpu():
 def test_integration_foo():
 ```

-`@pytest.mark.parametrize`에는 이러한 순서 문제는 없으므로 처음 혹은 마지막에 위치시킬 수 있고 이러한 경우에도 잘 작동할 것입니다. 
+`@pytest.mark.parametrize`에는 이러한 순서 문제는 없으므로 처음 혹은 마지막에 위치시킬 수 있고 이러한 경우에도 잘 작동할 것입니다.
 하지만 unittest가 아닌 경우에만 작동합니다.

 테스트 내부에서 다음을 사용할 수 있습니다:
@@ -513,7 +513,7 @@ n_gpu = get_gpu_count()  #torch와 tf와 함께 작동

 ### 분산 훈련[[distributed-training]]

-`pytest`는 분산 훈련을 직접적으로 다루지 못합니다. 
+`pytest`는 분산 훈련을 직접적으로 다루지 못합니다.
 이를 시도하면 하위 프로세스가 올바른 작업을 수행하지 않고 `pytest`라고 생각하기에 테스트 스위트를 반복해서 실행하게 됩니다.
 그러나 일반 프로세스를 생성한 다음 여러 워커를 생성하고 IO 파이프를 관리하도록 하면 동작합니다.

@@ -532,7 +532,7 @@ CUDA_VISIBLE_DEVICES=0,1 RUN_SLOW=1 pytest -sv tests/test_trainer_distributed.py

 ### 출력 캡처[[output-capture]]

-테스트 실행 중 `stdout` 및 `stderr`로 전송된 모든 출력이 캡처됩니다. 
+테스트 실행 중 `stdout` 및 `stderr`로 전송된 모든 출력이 캡처됩니다.
 테스트나 설정 메소드가 실패하면 캡처된 출력은 일반적으로 실패 추적 정보와 함께 표시됩니다.

 출력 캡처를 비활성화하고 `stdout` 및 `stderr`를 정상적으로 받으려면 `-s` 또는 `--capture=no`를 사용하세요:
@@ -563,7 +563,7 @@ pytest --color=no tests/utils/test_logging.py
 pytest --pastebin=failed tests/utils/test_logging.py
 ```

-이렇게 하면 각 실패에 대한 URL을 제공하는 remote Paste service에 테스트 실행 정보를 제출합니다. 
+이렇게 하면 각 실패에 대한 URL을 제공하는 remote Paste service에 테스트 실행 정보를 제출합니다.
 일반적인 테스트를 선택할 수도 있고 혹은 특정 실패만 보내려면 `-x`와 같이 추가할 수도 있습니다.

 전체 테스트 세션 로그에 대한 URL을 생성합니다:
@@ -574,17 +574,17 @@ pytest --pastebin=all tests/utils/test_logging.py

 ## 테스트 작성[[writing-tests]]

-🤗 transformers 테스트는 대부분 `unittest`를 기반으로 하지만, 
+🤗 transformers 테스트는 대부분 `unittest`를 기반으로 하지만,
 `pytest`에서 실행되므로 대부분의 경우 두 시스템의 기능을 사용할 수 있습니다.

-지원되는 기능에 대해 [여기](https://docs.pytest.org/en/stable/unittest.html)에서 확인할 수 있지만, 
+지원되는 기능에 대해 [여기](https://docs.pytest.org/en/stable/unittest.html)에서 확인할 수 있지만,
 기억해야 할 중요한 점은 대부분의 `pytest` fixture가 작동하지 않는다는 것입니다.
 파라미터화도 작동하지 않지만, 우리는 비슷한 방식으로 작동하는 `parameterized` 모듈을 사용합니다.


 ### 매개변수화[[parametrization]]

-동일한 테스트를 다른 인수로 여러 번 실행해야 하는 경우가 종종 있습니다. 
+동일한 테스트를 다른 인수로 여러 번 실행해야 하는 경우가 종종 있습니다.
 테스트 내에서 이 작업을 수행할 수 있지만, 그렇게 하면 하나의 인수 세트에 대해 테스트를 실행할 수 없습니다.

 ```python
@@ -605,7 +605,7 @@ class TestMathUnitTest(unittest.TestCase):
        assert_equal(math.floor(input), expected)
 ```

-이제 기본적으로 이 테스트는 `test_floor`의 마지막 3개 인수가 
+이제 기본적으로 이 테스트는 `test_floor`의 마지막 3개 인수가
 매개변수 목록의 해당 인수에 할당되는 것으로 3번 실행될 것입니다.

 그리고 `negative` 및 `integer` 매개변수 집합만 실행하려면 다음과 같이 실행할 수 있습니다:
@@ -620,7 +620,7 @@ pytest -k "negative and integer" tests/test_mytest.py
 pytest -k "not negative" tests/test_mytest.py
 ```

-앞에서 언급한 `-k` 필터를 사용하는 것 외에도, 
+앞에서 언급한 `-k` 필터를 사용하는 것 외에도,
 각 서브 테스트의 정확한 이름을 확인한 후에 일부 혹은 전체 서브 테스트를 실행할 수 있습니다.

 ```bash
@@ -641,10 +641,10 @@ test_this1.py::TestMathUnitTest::test_floor_2_large_fraction
 pytest test_this1.py::TestMathUnitTest::test_floor_0_negative  test_this1.py::TestMathUnitTest::test_floor_1_integer
 ```

-`transformers`의 개발자 종속성에 이미 있는 [parameterized](https://pypi.org/project/parameterized/) 모듈은 
+`transformers`의 개발자 종속성에 이미 있는 [parameterized](https://pypi.org/project/parameterized/) 모듈은
 `unittests`와 `pytest` 테스트 모두에서 작동합니다.

-그러나 테스트가 `unittest`가 아닌 경우 `pytest.mark.parametrize`를 사용할 수 있습니다(이미 있는 일부 테스트에서 사용되는 경우도 있습니다. 
+그러나 테스트가 `unittest`가 아닌 경우 `pytest.mark.parametrize`를 사용할 수 있습니다(이미 있는 일부 테스트에서 사용되는 경우도 있습니다.
 주로 `examples` 하위에 있습니다).

 다음은 `pytest`의 `parametrize` 마커를 사용한 동일한 예입니다:
@@ -666,8 +666,8 @@ def test_floor(name, input, expected):
    assert_equal(math.floor(input), expected)
 ```

-`parameterized`와 마찬가지로 `pytest.mark.parametrize`를 사용하면 
-`-k` 필터가 작동하지 않는 경우에도 실행할 서브 테스트를 정확하게 지정할 수 있습니다. 
+`parameterized`와 마찬가지로 `pytest.mark.parametrize`를 사용하면
+`-k` 필터가 작동하지 않는 경우에도 실행할 서브 테스트를 정확하게 지정할 수 있습니다.
 단, 이 매개변수화 함수는 서브 테스트의 이름 집합을 약간 다르게 생성합니다. 다음과 같은 모습입니다:

 ```bash
@@ -694,7 +694,7 @@ pytest test_this2.py::test_floor[negative--1.5--2.0] test_this2.py::test_floor[i

 ### 파일 및 디렉터리[[files-and-directories]]

-테스트에서 종종 현재 테스트 파일과 관련된 상대적인 위치를 알아야 하는 경우가 있습니다. 
+테스트에서 종종 현재 테스트 파일과 관련된 상대적인 위치를 알아야 하는 경우가 있습니다.
 테스트가 여러 디렉터리에서 호출되거나 깊이가 다른 하위 디렉터리에 있을 수 있기 때문에 그 위치를 아는 것은 간단하지 않습니다.
 `transformers.test_utils.TestCasePlus`라는 헬퍼 클래스는 모든 기본 경로를 처리하고 간단한 액세서를 제공하여 이 문제를 해결합니다:

@@ -717,7 +717,7 @@ pytest test_this2.py::test_floor[negative--1.5--2.0] test_this2.py::test_floor[i
  - `repo_root_dir_str`
  - `src_dir_str`

-위의 내용을 사용하려면 테스트가 'transformers.test_utils.TestCasePlus'의 서브클래스에 있는지 확인해야 합니다. 
+위의 내용을 사용하려면 테스트가 'transformers.test_utils.TestCasePlus'의 서브클래스에 있는지 확인해야 합니다.
 예를 들어 다음과 같습니다:

 ```python
@@ -729,7 +729,7 @@ class PathExampleTest(TestCasePlus):
        data_dir = self.tests_dir / "fixtures/tests_samples/wmt_en_ro"
 ```

-만약 `pathlib`를 통해 경로를 조작할 필요가 없거나 경로를 문자열로만 필요로 하는 경우에는 `pathlib` 객체에 `str()`을 호출하거나 `_str`로 끝나는 접근자를 사용할 수 있습니다. 
+만약 `pathlib`를 통해 경로를 조작할 필요가 없거나 경로를 문자열로만 필요로 하는 경우에는 `pathlib` 객체에 `str()`을 호출하거나 `_str`로 끝나는 접근자를 사용할 수 있습니다.
 예를 들어 다음과 같습니다:

 ```python
@@ -743,14 +743,14 @@ class PathExampleTest(TestCasePlus):

 ### 임시 파일 및 디렉터리[[temporary-files-and-directories]]

-고유한 임시 파일 및 디렉터리를 사용하는 것은 병렬 테스트 실행에 있어 필수적입니다. 
-이렇게 함으로써 테스트들이 서로의 데이터를 덮어쓰지 않게 할 수 있습니다. 또한 우리는 생성된 테스트의 종료 단계에서 이러한 임시 파일 및 디렉터리를 제거하고 싶습니다.  
+고유한 임시 파일 및 디렉터리를 사용하는 것은 병렬 테스트 실행에 있어 필수적입니다.
+이렇게 함으로써 테스트들이 서로의 데이터를 덮어쓰지 않게 할 수 있습니다. 또한 우리는 생성된 테스트의 종료 단계에서 이러한 임시 파일 및 디렉터리를 제거하고 싶습니다.
 따라서 이러한 요구 사항을 충족시켜주는 `tempfile`과 같은 패키지를 사용하는 것이 중요합니다.

-그러나 테스트를 디버깅할 때는 임시 파일이나 디렉터리에 들어가는 내용을 확인할 수 있어야 하며, 
+그러나 테스트를 디버깅할 때는 임시 파일이나 디렉터리에 들어가는 내용을 확인할 수 있어야 하며,
 재실행되는 각 테스트마다 임시 파일이나 디렉터리의 경로에 대해 무작위 값이 아닌 정확한 값을 알고 싶을 것입니다.

-`transformers.test_utils.TestCasePlus`라는 도우미 클래스는 이러한 목적에 가장 적합합니다. 
+`transformers.test_utils.TestCasePlus`라는 도우미 클래스는 이러한 목적에 가장 적합합니다.
 이 클래스는 `unittest.TestCase`의 하위 클래스이므로, 우리는 이것을 테스트 모듈에서 쉽게 상속할 수 있습니다.

 다음은 해당 클래스를 사용하는 예시입니다:
@@ -773,7 +773,7 @@ def test_whatever(self):
    tmp_dir = self.get_auto_remove_tmp_dir()
 ```

-`tmp_dir`에는 생성된 임시 디렉터리의 경로가 포함됩니다. 
+`tmp_dir`에는 생성된 임시 디렉터리의 경로가 포함됩니다.
 이는 테스트의 종료 단계에서 자동으로 제거됩니다.

 - 선택한 경로로 임시 디렉터리 생성 후에 테스트 시작 전에 비어 있는 상태인지 확인하고, 테스트 후에는 비우지 마세요.
@@ -783,10 +783,10 @@ def test_whatever(self):
    tmp_dir = self.get_auto_remove_tmp_dir("./xxx")
 ```

-이것은 디버깅할 때 특정 디렉터리를 모니터링하고, 
+이것은 디버깅할 때 특정 디렉터리를 모니터링하고,
 그 디렉터리에 이전에 실행된 테스트가 데이터를 남기지 않도록 하는 데에 유용합니다.

- `before` 및 `after` 인수를 직접 오버라이딩하여 기본 동작을 변경할 수 있으며 
+- `before` 및 `after` 인수를 직접 오버라이딩하여 기본 동작을 변경할 수 있으며
 다음 중 하나의 동작으로 이어집니다:

  - `before=True`: 테스트 시작 시 임시 디렉터리가 항상 지워집니다.
@@ -804,7 +804,7 @@ def test_whatever(self):

 <Tip>

-각 테스트는 여러 개의 임시 디렉터리를 등록할 수 있으며, 
+각 테스트는 여러 개의 임시 디렉터리를 등록할 수 있으며,
 별도로 요청하지 않는 한 모두 자동으로 제거됩니다.

 </Tip>
@@ -826,17 +826,17 @@ with ExtendSysPath(f"{bindir}/.."):

 ### 테스트 건너뛰기[[skipping-tests]]

-이것은 버그가 발견되어 새로운 테스트가 작성되었지만 아직 그 버그가 수정되지 않은 경우에 유용합니다. 
+이것은 버그가 발견되어 새로운 테스트가 작성되었지만 아직 그 버그가 수정되지 않은 경우에 유용합니다.
 이 테스트를 주 저장소에 커밋하려면 `make test` 중에 건너뛰도록 해야 합니다.

 방법:

- **skip**은 테스트가 일부 조건이 충족될 경우에만 통과될 것으로 예상되고, 그렇지 않으면 pytest가 전체 테스트를 건너뛰어야 함을 의미합니다. 
-일반적인 예로는 Windows가 아닌 플랫폼에서 Windows 전용 테스트를 건너뛰거나 
+- **skip**은 테스트가 일부 조건이 충족될 경우에만 통과될 것으로 예상되고, 그렇지 않으면 pytest가 전체 테스트를 건너뛰어야 함을 의미합니다.
+일반적인 예로는 Windows가 아닌 플랫폼에서 Windows 전용 테스트를 건너뛰거나
 외부 리소스(예를 들어 데이터베이스)에 의존하는 테스트를 건너뛰는 것이 있습니다.

- **xfail**은 테스트가 특정한 이유로 인해 실패할 것으로 예상하는 것을 의미합니다. 
-일반적인 예로는 아직 구현되지 않은 기능이나 아직 수정되지 않은 버그의 테스트가 있습니다. 
+- **xfail**은 테스트가 특정한 이유로 인해 실패할 것으로 예상하는 것을 의미합니다.
+일반적인 예로는 아직 구현되지 않은 기능이나 아직 수정되지 않은 버그의 테스트가 있습니다.
 `xfail`로 표시된 테스트가 예상대로 실패하지 않고 통과된 경우, 이것은 xpass이며 테스트 결과 요약에 기록됩니다.

 두 가지 중요한 차이점 중 하나는 `skip`은 테스트를 실행하지 않지만 `xfail`은 실행한다는 것입니다.
@@ -847,7 +847,7 @@ with ExtendSysPath(f"{bindir}/.."):
 - 전체 테스트를 무조건 건너뛰려면 다음과 같이 할 수 있습니다:

 ```python no-style
-@unittest.skip("this bug needs to be fixed")
+@unittest.skip(reason="this bug needs to be fixed")
 def test_feature_x():
 ```

@@ -920,7 +920,7 @@ class TestClass():

 ### 느린 테스트[[slow-tests]]

-테스트 라이브러리는 지속적으로 확장되고 있으며, 일부 테스트는 실행하는 데 몇 분이 걸립니다. 
+테스트 라이브러리는 지속적으로 확장되고 있으며, 일부 테스트는 실행하는 데 몇 분이 걸립니다.
 그리고 우리에게는 테스트 스위트가 CI를 통해 완료되기까지 한 시간을 기다릴 여유가 없습니다.
 따라서 필수 테스트를 위한 일부 예외를 제외하고 느린 테스트는 다음과 같이 표시해야 합니다.

@@ -936,7 +936,7 @@ def test_integration_foo():
 RUN_SLOW=1 pytest tests
 ```

-`@parameterized`와 같은 몇 가지 데코레이터는 테스트 이름을 다시 작성합니다. 
+`@parameterized`와 같은 몇 가지 데코레이터는 테스트 이름을 다시 작성합니다.
 그러므로 `@slow`와 나머지 건너뛰기 데코레이터 `@require_*`가 올바르게 작동되려면 마지막에 나열되어야 합니다. 다음은 올바른 사용 예입니다.

 ```python no-style
@@ -945,25 +945,25 @@ RUN_SLOW=1 pytest tests
 def test_integration_foo():
 ```

-이 문서의 초반부에 설명된 것처럼 느린 테스트는 PR의 CI 확인이 아닌 예약된 일정 기반으로 실행됩니다. 
+이 문서의 초반부에 설명된 것처럼 느린 테스트는 PR의 CI 확인이 아닌 예약된 일정 기반으로 실행됩니다.
 따라서 PR 제출 중에 일부 문제를 놓친 채로 병합될 수 있습니다.
-이러한 문제들은 다음번의 예정된 CI 작업 중에 감지됩니다. 
+이러한 문제들은 다음번의 예정된 CI 작업 중에 감지됩니다.
 하지만 PR을 제출하기 전에 자신의 컴퓨터에서 느린 테스트를 실행하는 것 또한 중요합니다.

 느린 테스트로 표시해야 하는지 여부를 결정하는 대략적인 결정 기준은 다음과 같습니다.

-만약 테스트가 라이브러리의 내부 구성 요소 중 하나에 집중되어 있다면(예: 모델링 파일, 토큰화 파일, 파이프라인), 
+만약 테스트가 라이브러리의 내부 구성 요소 중 하나에 집중되어 있다면(예: 모델링 파일, 토큰화 파일, 파이프라인),
 해당 테스트를 느린 테스트 스위트에서 실행해야 합니다.
-만약 라이브러리의 다른 측면(예: 문서 또는 예제)에 집중되어 있다면, 
+만약 라이브러리의 다른 측면(예: 문서 또는 예제)에 집중되어 있다면,
 해당 테스트를 느린 테스트 스위트에서 실행해야 합니다. 그리고 이 접근 방식을 보완하기 위해 예외를 만들어야 합니다.

- 무거운 가중치 세트나 50MB보다 큰 데이터셋을 다운로드해야 하는 모든 테스트(예: 모델 통합 테스트, 토크나이저 통합 테스트, 파이프라인 통합 테스트)를 
+- 무거운 가중치 세트나 50MB보다 큰 데이터셋을 다운로드해야 하는 모든 테스트(예: 모델 통합 테스트, 토크나이저 통합 테스트, 파이프라인 통합 테스트)를
  느린 테스트로 설정해야 합니다.
-  새로운 모델을 추가하는 경우 통합 테스트용으로 무작위 가중치로 작은 버전을 만들어 허브에 업로드해야 합니다. 
+  새로운 모델을 추가하는 경우 통합 테스트용으로 무작위 가중치로 작은 버전을 만들어 허브에 업로드해야 합니다.
  이 내용은 아래 단락에서 설명됩니다.
 - 특별히 빠르게 실행되도록 최적화되지 않은 학습을 수행해야 하는 테스트는 느린 테스트로 설정해야 합니다.
- 느리지 않아야 할 테스트 중 일부가 극도로 느린 경우 
-  예외를 도입하고 이를 `@slow`로 설정할 수 있습니다. 
+- 느리지 않아야 할 테스트 중 일부가 극도로 느린 경우
+  예외를 도입하고 이를 `@slow`로 설정할 수 있습니다.
  대용량 파일을 디스크에 저장하고 불러오는 자동 모델링 테스트는 `@slow`으로 표시된 테스트의 좋은 예입니다.
 - CI에서 1초 이내에 테스트가 완료되는 경우(다운로드 포함)에는 느린 테스트가 아니어야 합니다.

@@ -976,22 +976,22 @@ def test_integration_foo():
 grep tiny tests examples
 ```

-다음은 작은 모델[stas/tiny-wmt19-en-de](https://huggingface.co/stas/tiny-wmt19-en-de)을 만든 
-[script](https://github.com/huggingface/transformers/tree/main/scripts/fsmt/fsmt-make-tiny-model.py) 예시입니다. 
+다음은 작은 모델[stas/tiny-wmt19-en-de](https://huggingface.co/stas/tiny-wmt19-en-de)을 만든
+[script](https://github.com/huggingface/transformers/tree/main/scripts/fsmt/fsmt-make-tiny-model.py) 예시입니다.
 특정 모델의 아키텍처에 맞게 쉽게 조정할 수 있습니다.

-예를 들어 대용량 모델을 다운로드하는 경우 런타임을 잘못 측정하기 쉽지만, 
-로컬에서 테스트하면 다운로드한 파일이 캐시되어 다운로드 시간이 측정되지 않습니다. 
+예를 들어 대용량 모델을 다운로드하는 경우 런타임을 잘못 측정하기 쉽지만,
+로컬에서 테스트하면 다운로드한 파일이 캐시되어 다운로드 시간이 측정되지 않습니다.
 대신 CI 로그의 실행 속도 보고서를 확인하세요(`pytest --durations=0 tests`의 출력).

-이 보고서는 느린 이상값으로 표시되지 않거나 빠르게 다시 작성해야 하는 느린 이상값을 찾는 데도 유용합니다. 
+이 보고서는 느린 이상값으로 표시되지 않거나 빠르게 다시 작성해야 하는 느린 이상값을 찾는 데도 유용합니다.
 CI에서 테스트 스위트가 느려지기 시작하면 이 보고서의 맨 위 목록에 가장 느린 테스트가 표시됩니다.



 ### stdout/stderr 출력 테스트[[testing-the-stdout/stderr-output]]

-`stdout` 및/또는 `stderr`로 쓰는 함수를 테스트하려면 `pytest`의 [capsys 시스템](https://docs.pytest.org/en/latest/capture.html)을 사용하여 해당 스트림에 액세스할 수 있습니다. 
+`stdout` 및/또는 `stderr`로 쓰는 함수를 테스트하려면 `pytest`의 [capsys 시스템](https://docs.pytest.org/en/latest/capture.html)을 사용하여 해당 스트림에 액세스할 수 있습니다.
 다음과 같이 수행할 수 있습니다.

 ```python
@@ -1019,7 +1019,7 @@ def test_result_and_stdout(capsys):
    assert msg in err
 ```

-그리고, 물론 대부분의 경우에는 `stderr`는 예외의 일부로 제공됩니다. 
+그리고, 물론 대부분의 경우에는 `stderr`는 예외의 일부로 제공됩니다.
 그러므로 해당 경우에는 try/except를 사용해야 합니다.

 ```python
@@ -1061,11 +1061,11 @@ def test_result_and_stdout():
 ```

 `stdout` 캡처에 관련된 중요한 문제 중 하나는 보통 `print`에서 이전에 인쇄된 내용을 재설정하는 `\r` 문자가 포함될 수 있다는 것입니다.
-`pytest`에서는 문제가 없지만 `pytest -s`에서는 이러한 문자가 버퍼에 포함되므로 
+`pytest`에서는 문제가 없지만 `pytest -s`에서는 이러한 문자가 버퍼에 포함되므로
 `-s`가 있거나 없는 상태에서 태스트를 수행할 수 있으려면 캡처된 출력에 대해 추가적인 정리가 필요합니다.
 이 경우에는 `re.sub(r'~.*\r', '', buf, 0, re.M)`을 사용할 수 있습니다.

-하지만 도우미 컨텍스트 관리자 래퍼를 사용하면 
+하지만 도우미 컨텍스트 관리자 래퍼를 사용하면
 출력에 `\r`이 포함되어 있는지의 여부에 관계없이 모든 것을 자동으로 처리하므로 편리합니다.

 ```python
@@ -1108,7 +1108,7 @@ with CaptureStd() as cs:
 print(cs.err, cs.out)
 ```

-또한, 테스트의 디버깅을 지원하기 위해 
+또한, 테스트의 디버깅을 지원하기 위해
 이러한 컨텍스트 관리자는 기본적으로 컨텍스트에서 종료할 때 캡처된 스트림을 자동으로 다시 실행합니다.


@@ -1130,7 +1130,7 @@ assert cl.out, msg + "\n"

 ### 환경 변수를 이용하여 테스트[[testing-with-environment-variables]]

-특정 테스트의 환경 변수 영향을 검증하려면 
+특정 테스트의 환경 변수 영향을 검증하려면
 `transformers.testing_utils.mockenv`라는 도우미 데코레이터를 사용할 수 있습니다.

 ```python
@@ -1143,7 +1143,7 @@ class HfArgumentParserTest(unittest.TestCase):
        env_level_str = os.getenv("TRANSFORMERS_VERBOSITY", None)
 ```

-일부 경우에는 외부 프로그램을 호출해야할 수도 있는데, 이 때에는 여러 개의 로컬 경로를 포함하는 `os.environ`에서 `PYTHONPATH`의 설정이 필요합니다.  
+일부 경우에는 외부 프로그램을 호출해야할 수도 있는데, 이 때에는 여러 개의 로컬 경로를 포함하는 `os.environ`에서 `PYTHONPATH`의 설정이 필요합니다.
 헬퍼 클래스 `transformers.test_utils.TestCasePlus`가 도움이 됩니다:

 ```python
@@ -1156,8 +1156,8 @@ class EnvExampleTest(TestCasePlus):
        # 이제 `env`를 사용하여 외부 프로그램 호출
 ```

-테스트 파일이 `tests` 테스트 스위트 또는 `examples`에 있는지에 따라 
-`env[PYTHONPATH]`가 두 디렉터리 중 하나를 포함하도록 설정되며, 
+테스트 파일이 `tests` 테스트 스위트 또는 `examples`에 있는지에 따라
+`env[PYTHONPATH]`가 두 디렉터리 중 하나를 포함하도록 설정되며,
 현재 저장소에 대해 테스트가 수행되도록 `src` 디렉터리도 포함됩니다.
 테스트 호출 이전에 설정된 경우에는 `env[PYTHONPATH]`를 그대로 사용합니다.

@@ -1166,7 +1166,7 @@ class EnvExampleTest(TestCasePlus):

 ### 재현 가능한 결과 얻기[[getting-reproducible-results]]

-일부 상황에서 테스트에서 임의성을 제거하여 동일하게 재현 가능한 결과를 얻고 싶을 수 있습니다. 
+일부 상황에서 테스트에서 임의성을 제거하여 동일하게 재현 가능한 결과를 얻고 싶을 수 있습니다.
 이를 위해서는 다음과 같이 시드를 고정해야 합니다.

 ```python
@@ -1207,11 +1207,11 @@ pytest tests/utils/test_logging.py -W error::UserWarning --pdb
 셀프 푸시 워크플로우 CI 작업을 트리거하려면, 다음을 수행해야 합니다.

 1. `transformers` 원본에서 새 브랜치를 만듭니다(포크가 아닙니다!).
-2. 브랜치 이름은 `ci_` 또는 `ci-`로 시작해야 합니다(`main`도 트리거하지만 `main`에서는 PR을 할 수 없습니다). 
-   또한 특정 경로에 대해서만 트리거되므로 이 문서가 작성된 후에 변경된 내용은 
+2. 브랜치 이름은 `ci_` 또는 `ci-`로 시작해야 합니다(`main`도 트리거하지만 `main`에서는 PR을 할 수 없습니다).
+   또한 특정 경로에 대해서만 트리거되므로 이 문서가 작성된 후에 변경된 내용은
   [여기](https://github.com/huggingface/transformers/blob/main/.github/workflows/self-push.yml)의 *push:*에서 확인할 수 있습니다.
 3. 이 브랜치에서 PR을 생성합니다
-4. 그런 다음 [여기](https://github.com/huggingface/transformers/actions/workflows/self-push.yml)에서 작업이 나타나는지 확인할 수 있습니다. 
+4. 그런 다음 [여기](https://github.com/huggingface/transformers/actions/workflows/self-push.yml)에서 작업이 나타나는지 확인할 수 있습니다.
   백로그가 있는 경우, 바로 실행되지 않을 수도 있습니다.


@@ -1219,13 +1219,13 @@ pytest tests/utils/test_logging.py -W error::UserWarning --pdb

 ## 실험적인 CI 기능 테스트[[testing-Experimental-CI-Features]]

-CI 기능을 테스트하는 것은 일반 CI 작동에 방해가 될 수 있기 때문에 잠재적으로 문제가 발생할 수 있습니다. 
+CI 기능을 테스트하는 것은 일반 CI 작동에 방해가 될 수 있기 때문에 잠재적으로 문제가 발생할 수 있습니다.
 따라서 새로운 CI 기능을 추가하는 경우 다음과 같이 수행해야 합니다.

 1. 테스트해야 할 내용을 테스트하는 새로운 전용 작업을 생성합니다.
 2. 새로운 작업은 항상 성공해야만 녹색 ✓를 받을 수 있습니다(아래에 자세한 내용이 있습니다).
-3. 다양한 PR 유형에 대한 확인을  위해 
-   (사용자 포크 브랜치, 포크되지 않은 브랜치, github.com UI 직접 파일 편집에서 생성된 브랜치, 강제 푸시 등 PR의 유형은 아주 다양합니다.) 
+3. 다양한 PR 유형에 대한 확인을  위해
+   (사용자 포크 브랜치, 포크되지 않은 브랜치, github.com UI 직접 파일 편집에서 생성된 브랜치, 강제 푸시 등 PR의 유형은 아주 다양합니다.)
   며칠 동안 실험 작업의 로그를 모니터링하면서 실행해봅니다.
   (의도적으로 항상 녹색을 표시하므로 작업 전체가 녹색은 아니라는 점에 유의합니다.)
 4. 모든 것이 안정적인지 확인한 후, 새로운 변경 사항을 기존 작업에 병합합니다.
@@ -1234,7 +1234,7 @@ CI 기능을 테스트하는 것은 일반 CI 작동에 방해가 될 수 있기

 그러나 새로운 CI 기능이 개발 중인 동안, 항상 성공하도록 할 수 있는 방법은 무엇일까요?

-TravisCI와 같은 일부 CI는 `ignore-step-failure`를 지원하며 전체 작업을 성공한 것으로 보고하지만, 
+TravisCI와 같은 일부 CI는 `ignore-step-failure`를 지원하며 전체 작업을 성공한 것으로 보고하지만,
 현재 우리가 사용하는 CircleCI와 Github Actions는 이를 지원하지 않습니다.

 따라서 다음과 같은 해결책을 사용할 수 있습니다.
@@ -1264,12 +1264,12 @@ TravisCI와 같은 일부 CI는 `ignore-step-failure`를 지원하며 전체 작
 cmd_that_may_fail || true
 ```

-결과에 만족한 후에는 물론, 실험적인 단계 또는 작업을 일반 작업의 나머지 부분과 통합하면서 
-`set +euo pipefail` 또는 기타 추가한 요소를 제거하여 
+결과에 만족한 후에는 물론, 실험적인 단계 또는 작업을 일반 작업의 나머지 부분과 통합하면서
+`set +euo pipefail` 또는 기타 추가한 요소를 제거하여
 실험 작업이 일반 CI 작동에 방해되지 않도록 해야 합니다.

-이 전반적인 과정은 실험 단계가 PR의 전반적인 상태에 영향을 주지 않고 실패하도록 
-`allow-failure`와 같은 기능을 설정할 수 있다면 훨씬 더 쉬웠을 것입니다. 
+이 전반적인 과정은 실험 단계가 PR의 전반적인 상태에 영향을 주지 않고 실패하도록
+`allow-failure`와 같은 기능을 설정할 수 있다면 훨씬 더 쉬웠을 것입니다.
 그러나 앞에서 언급한 바와 같이 CircleCI와 Github Actions는 현재 이러한 기능들 지원하지 않습니다.

 이 기능의 지원을 위한 투표에 참여하고 CI 관련 스레드들에서 이러한 상황을 확인할 수도 있습니다.

--- a/src/transformers/testing_utils.py
+++ b/src/transformers/testing_utils.py
@@ -226,7 +226,7 @@ def is_pt_tf_cross_test(test_case):

    """
    if not _run_pt_tf_cross_tests or not is_torch_available() or not is_tf_available():
-        return unittest.skip("test is PT+TF test")(test_case)
+        return unittest.skip(reason="test is PT+TF test")(test_case)
    else:
        try:
            import pytest  # We don't need a hard dependency on pytest in the main library
@@ -245,7 +245,7 @@ def is_pt_flax_cross_test(test_case):

    """
    if not _run_pt_flax_cross_tests or not is_torch_available() or not is_flax_available():
-        return unittest.skip("test is PT+FLAX test")(test_case)
+        return unittest.skip(reason="test is PT+FLAX test")(test_case)
    else:
        try:
            import pytest  # We don't need a hard dependency on pytest in the main library
@@ -262,7 +262,7 @@ def is_staging_test(test_case):
    Those tests will run using the staging environment of huggingface.co instead of the real model hub.
    """
    if not _run_staging:
-        return unittest.skip("test is staging test")(test_case)
+        return unittest.skip(reason="test is staging test")(test_case)
    else:
        try:
            import pytest  # We don't need a hard dependency on pytest in the main library
@@ -278,7 +278,7 @@ def is_pipeline_test(test_case):
    skipped.
    """
    if not _run_pipeline_tests:
-        return unittest.skip("test is pipeline test")(test_case)
+        return unittest.skip(reason="test is pipeline test")(test_case)
    else:
        try:
            import pytest  # We don't need a hard dependency on pytest in the main library
@@ -293,7 +293,7 @@ def is_agent_test(test_case):
    Decorator marking a test as an agent test. If RUN_TOOL_TESTS is set to a falsy value, those tests will be skipped.
    """
    if not _run_agent_tests:
-        return unittest.skip("test is an agent test")(test_case)
+        return unittest.skip(reason="test is an agent test")(test_case)
    else:
        try:
            import pytest  # We don't need a hard dependency on pytest in the main library
@@ -321,7 +321,7 @@ def tooslow(test_case):
    these will not be tested by the CI.

    """
-    return unittest.skip("test is too slow")(test_case)
+    return unittest.skip(reason="test is too slow")(test_case)


 def custom_tokenizers(test_case):
@@ -709,7 +709,7 @@ def require_torch_multi_gpu(test_case):
    To run *only* the multi_gpu tests, assuming all test names contain multi_gpu: $ pytest -sv ./tests -k "multi_gpu"
    """
    if not is_torch_available():
-        return unittest.skip("test requires PyTorch")(test_case)
+        return unittest.skip(reason="test requires PyTorch")(test_case)

    import torch

@@ -723,7 +723,7 @@ def require_torch_multi_accelerator(test_case):
    multi_accelerator: $ pytest -sv ./tests -k "multi_accelerator"
    """
    if not is_torch_available():
-        return unittest.skip("test requires PyTorch")(test_case)
+        return unittest.skip(reason="test requires PyTorch")(test_case)

    return unittest.skipUnless(backend_device_count(torch_device) > 1, "test requires multiple accelerators")(
        test_case
@@ -735,7 +735,7 @@ def require_torch_non_multi_gpu(test_case):
    Decorator marking a test that requires 0 or 1 GPU setup (in PyTorch).
    """
    if not is_torch_available():
-        return unittest.skip("test requires PyTorch")(test_case)
+        return unittest.skip(reason="test requires PyTorch")(test_case)

    import torch

@@ -747,7 +747,7 @@ def require_torch_non_multi_accelerator(test_case):
    Decorator marking a test that requires 0 or 1 accelerator setup (in PyTorch).
    """
    if not is_torch_available():
-        return unittest.skip("test requires PyTorch")(test_case)
+        return unittest.skip(reason="test requires PyTorch")(test_case)

    return unittest.skipUnless(backend_device_count(torch_device) < 2, "test requires 0 or 1 accelerator")(test_case)

@@ -757,7 +757,7 @@ def require_torch_up_to_2_gpus(test_case):
    Decorator marking a test that requires 0 or 1 or 2 GPU setup (in PyTorch).
    """
    if not is_torch_available():
-        return unittest.skip("test requires PyTorch")(test_case)
+        return unittest.skip(reason="test requires PyTorch")(test_case)

    import torch

@@ -769,7 +769,7 @@ def require_torch_up_to_2_accelerators(test_case):
    Decorator marking a test that requires 0 or 1 or 2 accelerator setup (in PyTorch).
    """
    if not is_torch_available():
-        return unittest.skip("test requires PyTorch")(test_case)
+        return unittest.skip(reason="test requires PyTorch")(test_case)

    return unittest.skipUnless(backend_device_count(torch_device) < 3, "test requires 0 or 1 or 2 accelerators")
    (test_case)
@@ -806,7 +806,7 @@ def require_torch_multi_npu(test_case):
    To run *only* the multi_npu tests, assuming all test names contain multi_npu: $ pytest -sv ./tests -k "multi_npu"
    """
    if not is_torch_npu_available():
-        return unittest.skip("test requires PyTorch NPU")(test_case)
+        return unittest.skip(reason="test requires PyTorch NPU")(test_case)

    return unittest.skipUnless(torch.npu.device_count() > 1, "test requires multiple NPUs")(test_case)

@@ -830,7 +830,7 @@ def require_torch_multi_xpu(test_case):
    To run *only* the multi_xpu tests, assuming all test names contain multi_xpu: $ pytest -sv ./tests -k "multi_xpu"
    """
    if not is_torch_xpu_available():
-        return unittest.skip("test requires PyTorch XPU")(test_case)
+        return unittest.skip(reason="test requires PyTorch XPU")(test_case)

    return unittest.skipUnless(torch.xpu.device_count() > 1, "test requires multiple XPUs")(test_case)

@@ -1078,7 +1078,7 @@ def require_bitsandbytes(test_case):
        except ImportError:
            return test_case
    else:
-        return unittest.skip("test requires bitsandbytes and torch")(test_case)
+        return unittest.skip(reason="test requires bitsandbytes and torch")(test_case)


 def require_optimum(test_case):

--- a/tests/deepspeed/test_deepspeed.py
+++ b/tests/deepspeed/test_deepspeed.py
@@ -108,13 +108,13 @@ def require_deepspeed_aio(test_case):
    Decorator marking a test that requires deepspeed aio (nvme)
    """
    if not is_deepspeed_available():
-        return unittest.skip("test requires deepspeed")(test_case)
+        return unittest.skip(reason="test requires deepspeed")(test_case)

    import deepspeed
    from deepspeed.ops.aio import AsyncIOBuilder

    if not deepspeed.ops.__compatible_ops__[AsyncIOBuilder.NAME]:
-        return unittest.skip("test requires deepspeed async-io")(test_case)
+        return unittest.skip(reason="test requires deepspeed async-io")(test_case)
    else:
        return test_case

@@ -643,7 +643,7 @@ class TrainerIntegrationDeepSpeed(TrainerIntegrationDeepSpeedWithCustomConfig, T
            # print(trainer.model.b.item())
            # need to investigate at some point
            if (stage == ZERO3 and dtype == FP16) or (dtype == BF16):
-                return
+                self.skipTest(reason="When using zero3/fp16 or any/bf16 the optimizer seems run oddly")

            # it's enough that train didn't fail for this test, but we must check that
            # optimizer/scheduler didn't run (since if it did this test isn't testing the right thing)
@@ -795,7 +795,7 @@ class TrainerIntegrationDeepSpeed(TrainerIntegrationDeepSpeedWithCustomConfig, T
        # ToDo: Currently, hf_optim + hf_scheduler resumes with the correct states and
        # also has same losses for few steps but then slowly diverges. Need to figure it out.
        if optim == HF_OPTIM and scheduler == HF_SCHEDULER:
-            return
+            self.skipTest(reason="hf_optim + hf_scheduler resumes with the correct states but slowly diverges")

        output_dir = self.get_auto_remove_tmp_dir("./xxx", after=False)
        ds_config_dict = self.get_config_dict(stage)
@@ -1113,7 +1113,7 @@ class TestDeepSpeedWithLauncher(TestCasePlus):
    @require_torch_multi_accelerator
    def test_inference(self, dtype):
        if dtype == "bf16" and not is_torch_bf16_available_on_device(torch_device):
-            self.skipTest("test requires bfloat16 hardware support")
+            self.skipTest(reason="test requires bfloat16 hardware support")

        # this is just inference, so no optimizer should be loaded
        # it only works for z3 (makes no sense with z1-z2)

--- a/tests/extended/test_trainer_ext.py
+++ b/tests/extended/test_trainer_ext.py
@@ -80,7 +80,7 @@ class TestTrainerExt(TestCasePlus):
        logs = TrainerState.load_from_json(os.path.join(output_dir, "trainer_state.json")).log_history

        if not do_eval:
-            return
+            self.skipTest(reason="do_eval is False")

        eval_metrics = [log for log in logs if "eval_loss" in log.keys()]


--- a/tests/generation/test_utils.py
+++ b/tests/generation/test_utils.py
@@ -463,9 +463,9 @@ class GenerationTesterMixin:
            config, input_ids, attention_mask = self._get_input_ids_and_config()

            if not hasattr(config, "use_cache"):
-                self.skipTest("This model doesn't support caching")
+                self.skipTest(reason="This model doesn't support caching")
            if any(model_name in model_class.__name__.lower() for model_name in ["rwkv"]):
-                self.skipTest("Won't fix: model with non-standard dictionary output shapes")
+                self.skipTest(reason="Won't fix: model with non-standard dictionary output shapes")

            config.use_cache = True
            config.is_decoder = True
@@ -625,9 +625,9 @@ class GenerationTesterMixin:
            config, input_ids, attention_mask = self._get_input_ids_and_config()

            if not hasattr(config, "use_cache"):
-                self.skipTest("This model doesn't support caching")
+                self.skipTest(reason="This model doesn't support caching")
            if any(model_name in model_class.__name__.lower() for model_name in ["rwkv"]):
-                self.skipTest("Won't fix: model with non-standard dictionary output shapes")
+                self.skipTest(reason="Won't fix: model with non-standard dictionary output shapes")

            model = model_class(config).to(torch_device).eval()
            logits_process_kwargs, _ = self._get_logits_processor_and_warper_kwargs(
@@ -667,7 +667,7 @@ class GenerationTesterMixin:
    def test_model_parallel_beam_search(self):
        for model_class in self.all_generative_model_classes:
            if "xpu" in torch_device:
-                return unittest.skip("device_map='auto' does not work with XPU devices")
+                return unittest.skip(reason="device_map='auto' does not work with XPU devices")

            if model_class._no_split_modules is None:
                continue
@@ -765,7 +765,7 @@ class GenerationTesterMixin:

        # if no bos token id => cannot generate from None
        if config.bos_token_id is None:
-            return
+            self.skipTest(reason="bos_token_id is None")

        # hack in case they are equal, otherwise the attn mask will be [0]
        if config.bos_token_id == config.pad_token_id:
@@ -982,17 +982,17 @@ class GenerationTesterMixin:
    def test_contrastive_generate(self):
        for model_class in self.all_generative_model_classes:
            if model_class._is_stateful:
-                self.skipTest("Stateful models don't support contrastive search generation")
+                self.skipTest(reason="Stateful models don't support contrastive search generation")

            # won't fix: FSMT and Reformer have a different cache variable type (and format).
            if any(model_name in model_class.__name__.lower() for model_name in ["fsmt", "reformer"]):
-                self.skipTest("Won't fix: old model with different cache format")
+                self.skipTest(reason="Won't fix: old model with different cache format")

            config, input_ids, attention_mask = self._get_input_ids_and_config()

            # NOTE: contrastive search only works with cache on at the moment.
            if not hasattr(config, "use_cache"):
-                self.skipTest("This model doesn't support caching")
+                self.skipTest(reason="This model doesn't support caching")
            config.use_cache = True
            config.is_decoder = True

@@ -1009,17 +1009,17 @@ class GenerationTesterMixin:
    def test_contrastive_generate_dict_outputs_use_cache(self):
        for model_class in self.all_generative_model_classes:
            if model_class._is_stateful:
-                self.skipTest("Stateful models don't support contrastive search generation")
+                self.skipTest(reason="Stateful models don't support contrastive search generation")

            # won't fix: FSMT and Reformer have a different cache variable type (and format).
            if any(model_name in model_class.__name__.lower() for model_name in ["fsmt", "reformer"]):
-                self.skipTest("Won't fix: old model with different cache format")
+                self.skipTest(reason="Won't fix: old model with different cache format")

            config, input_ids, attention_mask = self._get_input_ids_and_config()

            # NOTE: contrastive search only works with cache on at the moment.
            if not hasattr(config, "use_cache"):
-                self.skipTest("This model doesn't support caching")
+                self.skipTest(reason="This model doesn't support caching")
            config.use_cache = True
            config.is_decoder = True

@@ -1045,18 +1045,18 @@ class GenerationTesterMixin:
        # Check that choosing 'low_memory' does not change the model output
        for model_class in self.all_generative_model_classes:
            if model_class._is_stateful:
-                self.skipTest("Stateful models don't support contrastive search generation")
+                self.skipTest(reason="Stateful models don't support contrastive search generation")

            if any(model_name in model_class.__name__.lower() for model_name in ["fsmt", "reformer", "speech2text"]):
-                self.skipTest("Won't fix: old model with different cache format")
+                self.skipTest(reason="Won't fix: old model with different cache format")
            if any(model_name in model_class.__name__.lower() for model_name in ["gptbigcode"]):
-                self.skipTest("TODO: fix me")
+                self.skipTest(reason="TODO: fix me")

            config, input_ids, attention_mask = self._get_input_ids_and_config(batch_size=1)

            # NOTE: contrastive search only works with cache on at the moment.
            if not hasattr(config, "use_cache"):
-                self.skipTest("This model doesn't support caching")
+                self.skipTest(reason="This model doesn't support caching")

            config.use_cache = True
            config.is_decoder = True
@@ -1087,9 +1087,9 @@ class GenerationTesterMixin:
        # Check that choosing 'low_memory' does not change the model output
        for model_class in self.all_generative_model_classes:
            if model_class._is_stateful:
-                self.skipTest("May fix in the future: need custom cache handling")
+                self.skipTest(reason="May fix in the future: need custom cache handling")
            if any(model_name in model_class.__name__.lower() for model_name in ["fsmt", "reformer"]):
-                self.skipTest("Won't fix: old model with different cache format")
+                self.skipTest(reason="Won't fix: old model with different cache format")
            if any(
                model_name in model_class.__name__.lower()
                for model_name in [
@@ -1102,7 +1102,7 @@ class GenerationTesterMixin:
                    "jamba",
                ]
            ):
-                self.skipTest("May fix in the future: need model-specific fixes")
+                self.skipTest(reason="May fix in the future: need model-specific fixes")
            config, input_ids, _ = self._get_input_ids_and_config(batch_size=2)
            # batch_size=1 is ok, but batch_size>1 will cause non-identical output

@@ -1135,9 +1135,9 @@ class GenerationTesterMixin:

        for model_class in self.all_generative_model_classes:
            if model_class._is_stateful:
-                self.skipTest("Stateful models don't support assisted generation")
+                self.skipTest(reason="Stateful models don't support assisted generation")
            if any(model_name in model_class.__name__.lower() for model_name in ["fsmt", "reformer"]):
-                self.skipTest("Won't fix: old model with different cache format")
+                self.skipTest(reason="Won't fix: old model with different cache format")
            if any(
                model_name in model_class.__name__.lower()
                for model_name in [
@@ -1151,14 +1151,14 @@ class GenerationTesterMixin:
                    "clvp",
                ]
            ):
-                self.skipTest("May fix in the future: need model-specific fixes")
+                self.skipTest(reason="May fix in the future: need model-specific fixes")

            # enable cache
            config, input_ids, attention_mask = self._get_input_ids_and_config(batch_size=1)

            # NOTE: assisted generation only works with cache on at the moment.
            if not hasattr(config, "use_cache"):
-                self.skipTest("This model doesn't support caching")
+                self.skipTest(reason="This model doesn't support caching")

            config.use_cache = True
            config.is_decoder = True
@@ -1206,9 +1206,9 @@ class GenerationTesterMixin:

        for model_class in self.all_generative_model_classes:
            if model_class._is_stateful:
-                self.skipTest("Stateful models don't support assisted generation")
+                self.skipTest(reason="Stateful models don't support assisted generation")
            if any(model_name in model_class.__name__.lower() for model_name in ["fsmt", "reformer"]):
-                self.skipTest("Won't fix: old model with different cache format")
+                self.skipTest(reason="Won't fix: old model with different cache format")
            if any(
                model_name in model_class.__name__.lower()
                for model_name in [
@@ -1222,14 +1222,14 @@ class GenerationTesterMixin:
                    "clvp",
                ]
            ):
-                self.skipTest("May fix in the future: need model-specific fixes")
+                self.skipTest(reason="May fix in the future: need model-specific fixes")

            # enable cache
            config, input_ids, attention_mask = self._get_input_ids_and_config(batch_size=1)

            # NOTE: assisted generation only works with cache on at the moment.
            if not hasattr(config, "use_cache"):
-                self.skipTest("This model doesn't support caching")
+                self.skipTest(reason="This model doesn't support caching")

            config.use_cache = True
            config.is_decoder = True
@@ -1268,9 +1268,9 @@ class GenerationTesterMixin:
        # different shapes, see https://github.com/huggingface/transformers/issues/25420#issuecomment-1775317535).
        for model_class in self.all_generative_model_classes:
            if model_class._is_stateful:
-                self.skipTest("Stateful models don't support assisted generation")
+                self.skipTest(reason="Stateful models don't support assisted generation")
            if any(model_name in model_class.__name__.lower() for model_name in ["fsmt", "reformer"]):
-                self.skipTest("Won't fix: old model with different cache format")
+                self.skipTest(reason="Won't fix: old model with different cache format")
            if any(
                model_name in model_class.__name__.lower()
                for model_name in [
@@ -1284,14 +1284,14 @@ class GenerationTesterMixin:
                    "clvp",
                ]
            ):
-                self.skipTest("May fix in the future: need model-specific fixes")
+                self.skipTest(reason="May fix in the future: need model-specific fixes")

            # enable cache
            config, input_ids, attention_mask = self._get_input_ids_and_config(batch_size=1)

            # NOTE: assisted generation only works with cache on at the moment.
            if not hasattr(config, "use_cache"):
-                self.skipTest("This model doesn't support caching")
+                self.skipTest(reason="This model doesn't support caching")

            config.use_cache = True
            config.is_decoder = True
@@ -1436,7 +1436,7 @@ class GenerationTesterMixin:

            # If it doesn't support cache, pass the test
            if not hasattr(config, "use_cache"):
-                self.skipTest("This model doesn't support caching")
+                self.skipTest(reason="This model doesn't support caching")

            model = model_class(config).to(torch_device)
            if "use_cache" not in inputs:
@@ -1445,7 +1445,7 @@ class GenerationTesterMixin:

            # If "past_key_values" is not returned, pass the test (e.g. RWKV uses a different cache name and format)
            if "past_key_values" not in outputs:
-                self.skipTest("This model doesn't return `past_key_values`")
+                self.skipTest(reason="This model doesn't return `past_key_values`")

            num_hidden_layers = (
                getattr(config, "decoder_layers", None)
@@ -1553,14 +1553,14 @@ class GenerationTesterMixin:
        # Tests that we can continue generating from past key values, returned from a previous `generate` call
        for model_class in self.all_generative_model_classes:
            if any(model_name in model_class.__name__.lower() for model_name in ["imagegpt"]):
-                self.skipTest("Won't fix: old model with unique inputs/caches/other")
+                self.skipTest(reason="Won't fix: old model with unique inputs/caches/other")
            if any(model_name in model_class.__name__.lower() for model_name in ["umt5"]):
-                self.skipTest("TODO: needs modeling or test input preparation fixes for compatibility")
+                self.skipTest(reason="TODO: needs modeling or test input preparation fixes for compatibility")

            config, inputs = self.model_tester.prepare_config_and_inputs_for_common()

            if not hasattr(config, "use_cache"):
-                self.skipTest("This model doesn't support caching")
+                self.skipTest(reason="This model doesn't support caching")

            # Let's make it always:
            # 1. use cache (for obvious reasons)
@@ -1582,7 +1582,7 @@ class GenerationTesterMixin:
            # If "past_key_values" is not returned, skip the test (e.g. RWKV uses a different cache name and format)
            outputs = model(**inputs)
            if "past_key_values" not in outputs:
-                self.skipTest("This model doesn't return `past_key_values`")
+                self.skipTest(reason="This model doesn't return `past_key_values`")

            # Traditional way of generating text, with `return_dict_in_generate` to return the past key values
            outputs = model.generate(**inputs, do_sample=False, max_new_tokens=4, return_dict_in_generate=True)
@@ -1632,7 +1632,7 @@ class GenerationTesterMixin:
        # 👉 tests with and without sampling so we can cover the most common use cases.
        for model_class in self.all_generative_model_classes:
            if not model_class._supports_cache_class:
-                self.skipTest("This model does not support the new cache format")
+                self.skipTest(reason="This model does not support the new cache format")

            config, input_ids, attention_mask = self._get_input_ids_and_config()
            config.use_cache = True
@@ -1689,7 +1689,7 @@ class GenerationTesterMixin:
    def test_generate_with_quant_cache(self):
        for model_class in self.all_generative_model_classes:
            if not model_class._supports_quantized_cache:
-                self.skipTest("This model does not support the quantized cache format")
+                self.skipTest(reason="This model does not support the quantized cache format")

            config, input_ids, attention_mask = self._get_input_ids_and_config()
            config.use_cache = True

--- a/tests/models/albert/test_tokenization_albert.py
+++ b/tests/models/albert/test_tokenization_albert.py
@@ -67,7 +67,7 @@ class AlbertTokenizationTest(TokenizerTesterMixin, unittest.TestCase):

    def test_rust_and_python_full_tokenizers(self):
        if not self.test_rust_tokenizer:
-            return
+            self.skipTest(reason="test_rust_tokenizer is set to False")

        tokenizer = self.get_tokenizer()
        rust_tokenizer = self.get_rust_tokenizer()

--- a/tests/models/align/test_modeling_align.py
+++ b/tests/models/align/test_modeling_align.py
@@ -23,7 +23,6 @@ import requests

 from transformers import AlignConfig, AlignProcessor, AlignTextConfig, AlignVisionConfig
 from transformers.testing_utils import (
-    is_flax_available,
    require_torch,
    require_vision,
    slow,
@@ -56,10 +55,6 @@ if is_vision_available():
    from PIL import Image


-if is_flax_available():
-    pass
-
-
 class AlignVisionModelTester:
    def __init__(
        self,
@@ -215,9 +210,11 @@ class AlignVisionModelTest(ModelTesterMixin, unittest.TestCase):

            check_hidden_states_output(inputs_dict, config, model_class)

+    @unittest.skip
    def test_training(self):
        pass

+    @unittest.skip
    def test_training_gradient_checkpointing(self):
        pass

@@ -355,9 +352,11 @@ class AlignTextModelTest(ModelTesterMixin, unittest.TestCase):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_model(*config_and_inputs)

+    @unittest.skip
    def test_training(self):
        pass

+    @unittest.skip
    def test_training_gradient_checkpointing(self):
        pass

@@ -518,7 +517,7 @@ class AlignModelTest(ModelTesterMixin, PipelineTesterMixin, unittest.TestCase):

    def _create_and_check_torchscript(self, config, inputs_dict):
        if not self.test_torchscript:
-            return
+            self.skipTest(reason="test_torchscript is set to False")

        configs_no_init = _config_zero_init(config)  # To be sure we have no Nan
        configs_no_init.torchscript = True

--- a/tests/models/altclip/test_modeling_altclip.py
+++ b/tests/models/altclip/test_modeling_altclip.py
@@ -178,9 +178,11 @@ class AltCLIPVisionModelTest(ModelTesterMixin, unittest.TestCase):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_model(*config_and_inputs)

+    @unittest.skip
    def test_training(self):
        pass

+    @unittest.skip
    def test_training_gradient_checkpointing(self):
        pass

@@ -309,7 +311,7 @@ class AltCLIPTextModelTest(ModelTesterMixin, unittest.TestCase):
    test_head_masking = False

    # TODO (@SunMarc): Fix me
-    @unittest.skip("It's broken.")
+    @unittest.skip(reason="It's broken.")
    def test_resize_tokens_embeddings(self):
        super().test_resize_tokens_embeddings()

@@ -324,9 +326,11 @@ class AltCLIPTextModelTest(ModelTesterMixin, unittest.TestCase):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_model(*config_and_inputs)

+    @unittest.skip
    def test_training(self):
        pass

+    @unittest.skip
    def test_training_gradient_checkpointing(self):
        pass

@@ -487,7 +491,7 @@ class AltCLIPModelTest(ModelTesterMixin, PipelineTesterMixin, unittest.TestCase)

    def _create_and_check_torchscript(self, config, inputs_dict):
        if not self.test_torchscript:
-            return
+            self.skipTest(reason="test_torchscript is set to False")

        configs_no_init = _config_zero_init(config)  # To be sure we have no Nan
        configs_no_init.torchscript = True

--- a/tests/models/bark/test_modeling_bark.py
+++ b/tests/models/bark/test_modeling_bark.py
@@ -754,7 +754,7 @@ class BarkFineModelTest(ModelTesterMixin, unittest.TestCase):
            with torch.no_grad():
                model(**inputs)[0]

-    @unittest.skip("FineModel relies on codebook idx and does not return same logits")
+    @unittest.skip(reason="FineModel relies on codebook idx and does not return same logits")
    def test_inputs_embeds_matches_input_ids(self):
        pass

@@ -826,7 +826,7 @@ class BarkFineModelTest(ModelTesterMixin, unittest.TestCase):
        # resizing tokens_embeddings of a ModuleList
        original_config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
        if not self.test_resize_embeddings:
-            return
+            self.skipTest(reason="test_resize_embeddings is False")

        for model_class in self.all_model_classes:
            config = copy.deepcopy(original_config)
@@ -877,7 +877,7 @@ class BarkFineModelTest(ModelTesterMixin, unittest.TestCase):
        # resizing tokens_embeddings of a ModuleList
        original_config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
        if not self.test_resize_embeddings:
-            return
+            self.skipTest(reason="test_resize_embeddings is False")

        original_config.tie_word_embeddings = False

@@ -931,7 +931,7 @@ class BarkFineModelTest(ModelTesterMixin, unittest.TestCase):
    def test_flash_attn_2_inference_equivalence(self):
        for model_class in self.all_model_classes:
            if not model_class._supports_flash_attn_2:
-                return
+                self.skipTest(reason="Model does not support flash_attention_2")

            config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
            model = model_class(config)
@@ -988,7 +988,7 @@ class BarkFineModelTest(ModelTesterMixin, unittest.TestCase):
    def test_flash_attn_2_inference_equivalence_right_padding(self):
        for model_class in self.all_model_classes:
            if not model_class._supports_flash_attn_2:
-                return
+                self.skipTest(reason="Model does not support flash_attention_2")

            config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
            model = model_class(config)

--- a/tests/models/bart/test_modeling_bart.py
+++ b/tests/models/bart/test_modeling_bart.py
@@ -1515,9 +1515,10 @@ class BartStandaloneDecoderModelTest(ModelTesterMixin, GenerationTesterMixin, un
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_decoder_model_attention_mask_past(*config_and_inputs)

+    @unittest.skip(reason="Decoder cannot keep gradients")
    def test_retain_grad_hidden_states_attentions(self):
-        # decoder cannot keep gradients
        return

+    @unittest.skip
    def test_save_load_fast_init_from_base(self):
        pass
--- a/tests/models/bart/test_tokenization_bart.py
+++ b/tests/models/bart/test_tokenization_bart.py
@@ -147,6 +147,7 @@ class TestTokenizationBart(TokenizerTesterMixin, unittest.TestCase):
            self.assertTrue((input_ids[:, -1] == tokenizer.eos_token_id).all().item())
            self.assertTrue((labels[:, -1] == tokenizer.eos_token_id).all().item())

+    @unittest.skip
    def test_pretokenized_inputs(self):
        pass


--- a/tests/models/barthez/test_tokenization_barthez.py
+++ b/tests/models/barthez/test_tokenization_barthez.py
@@ -75,7 +75,7 @@ class BarthezTokenizationTest(TokenizerTesterMixin, unittest.TestCase):

    def test_rust_and_python_full_tokenizers(self):
        if not self.test_rust_tokenizer:
-            return
+            self.skipTest(reason="test_rust_tokenizer is set to False")

        tokenizer = self.get_tokenizer()
        rust_tokenizer = self.get_rust_tokenizer()

--- a/tests/models/beit/test_modeling_beit.py
+++ b/tests/models/beit/test_modeling_beit.py
@@ -301,7 +301,7 @@ class BeitModelTest(ModelTesterMixin, PipelineTesterMixin, unittest.TestCase):

    def test_training(self):
        if not self.model_tester.is_training:
-            return
+            self.skipTest(reason="model_tester.is_training is set to False")

        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
        config.return_dict = True
@@ -325,7 +325,7 @@ class BeitModelTest(ModelTesterMixin, PipelineTesterMixin, unittest.TestCase):
    def test_training_gradient_checkpointing(self):
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
        if not self.model_tester.is_training:
-            return
+            self.skipTest(reason="model_tester.is_training is set to False")

        config.use_cache = False
        config.return_dict = True

--- a/tests/models/bert/test_modeling_bert.py
+++ b/tests/models/bert/test_modeling_bert.py
@@ -614,7 +614,7 @@ class BertModelTest(ModelTesterMixin, GenerationTesterMixin, PipelineTesterMixin
        for model_class in self.all_model_classes:
            # BertForMultipleChoice behaves incorrectly in JIT environments.
            if model_class == BertForMultipleChoice:
-                return
+                self.skipTest(reason="BertForMultipleChoice behaves incorrectly in JIT environments.")

            config.torchscript = True
            model = model_class(config=config)

--- a/tests/models/bert/test_tokenization_bert.py
+++ b/tests/models/bert/test_tokenization_bert.py
@@ -79,7 +79,7 @@ class BertTokenizationTest(TokenizerTesterMixin, unittest.TestCase):

    def test_rust_and_python_full_tokenizers(self):
        if not self.test_rust_tokenizer:
-            return
+            self.skipTest(reason="test_rust_tokenizer is set to False")

        tokenizer = self.get_tokenizer()
        rust_tokenizer = self.get_rust_tokenizer()

--- a/tests/models/big_bird/test_modeling_big_bird.py
+++ b/tests/models/big_bird/test_modeling_big_bird.py
@@ -716,7 +716,7 @@ class BigBirdModelIntegrationTest(unittest.TestCase):
        """

        if not self.test_attention_probs:
-            return
+            self.skip("test_attention_probs is set to False")

        model = BigBirdModel.from_pretrained(
            "google/bigbird-roberta-base", attention_type="block_sparse", num_random_blocks=3, block_size=16

--- a/tests/models/big_bird/test_tokenization_big_bird.py
+++ b/tests/models/big_bird/test_tokenization_big_bird.py
@@ -63,7 +63,7 @@ class BigBirdTokenizationTest(TokenizerTesterMixin, unittest.TestCase):

    def test_rust_and_python_full_tokenizers(self):
        if not self.test_rust_tokenizer:
-            return
+            self.skipTest(reason="test_rust_tokenizer is set to False")

        tokenizer = self.get_tokenizer()
        rust_tokenizer = self.get_rust_tokenizer()