"examples/flax/vscode:/vscode.git/clone" did not exist on "ea5567502441135fc1cdab4abdf77fa710461ec3"
testing.rst 44.4 KB
Newer Older
1
..
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Testing
=======================================================================================================================


Let's take a look at how 🤗 Transformer models are tested and how you can write new tests and improve the existing ones.

There are 2 test suites in the repository:

1. ``tests`` -- tests for the general API
2. ``examples`` -- tests primarily for various applications that aren't part of the API

How transformers are tested
-----------------------------------------------------------------------------------------------------------------------

1. Once a PR is submitted it gets tested with 9 CircleCI jobs. Every new commit to that PR gets retested. These jobs
   are defined in this :prefix_link:`config file <.circleci/config.yml>`, so that if needed you can reproduce the same
   environment on your machine.

   These CI jobs don't run ``@slow`` tests.

2. There are 3 jobs run by `github actions <https://github.com/huggingface/transformers/actions>`__:

   * :prefix_link:`torch hub integration <.github/workflows/github-torch-hub.yml>`: checks whether torch hub
     integration works.

   * :prefix_link:`self-hosted (push) <.github/workflows/self-push.yml>`: runs fast tests on GPU only on commits on
     ``master``. It only runs if a commit on ``master`` has updated the code in one of the following folders: ``src``,
     ``tests``, ``.github`` (to prevent running on added model cards, notebooks, etc.)

   * :prefix_link:`self-hosted runner <.github/workflows/self-scheduled.yml>`: runs normal and slow tests on GPU in
     ``tests`` and ``examples``:

   .. code-block:: bash

    RUN_SLOW=1 pytest tests/
    RUN_SLOW=1 pytest examples/

   The results can be observed `here <https://github.com/huggingface/transformers/actions>`__.



Running tests
-----------------------------------------------------------------------------------------------------------------------





Choosing which tests to run
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This document goes into many details of how tests can be run. If after reading everything, you need even more details
you will find them `here <https://docs.pytest.org/en/latest/usage.html>`__.

Here are some of the most useful ways of running tests.

Run all:

.. code-block:: console

    pytest

or:

.. code-block:: bash

    make test

Note that the latter is defined as:

.. code-block:: bash

    python -m pytest -n auto --dist=loadfile -s -v ./tests/

which tells pytest to:

* run as many test processes as there are CPU cores (which could be too many if you don't have a ton of RAM!)
* ensure that all tests from the same file will be run by the same test process
* do not capture output
* run in verbose mode



Getting the list of all tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All tests of the test suite:

.. code-block:: bash

    pytest --collect-only -q

All tests of a given test file:

.. code-block:: bash

    pytest tests/test_optimization.py --collect-only -q


Run a specific test module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To run an individual test module:

.. code-block:: bash

    pytest tests/test_logging.py


Run specific tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since unittest is used inside most of the tests, to run specific subtests you need to know the name of the unittest
class containing those tests. For example, it could be:

.. code-block:: bash

    pytest tests/test_optimization.py::OptimizationTest::test_adam_w

Here:

* ``tests/test_optimization.py`` - the file with tests
* ``OptimizationTest`` - the name of the class
* ``test_adam_w`` - the name of the specific test function

If the file contains multiple classes, you can choose to run only tests of a given class. For example:

.. code-block:: bash

    pytest tests/test_optimization.py::OptimizationTest


will run all the tests inside that class.

As mentioned earlier you can see what tests are contained inside the ``OptimizationTest`` class by running:

.. code-block:: bash

    pytest tests/test_optimization.py::OptimizationTest --collect-only -q

You can run tests by keyword expressions.

To run only tests whose name contains ``adam``:

.. code-block:: bash

    pytest -k adam tests/test_optimization.py

Logical ``and`` and ``or`` can be used to require that all keywords match or that either one does. ``not`` can be used
to negate.

To run all tests except those whose name contains ``adam``:

.. code-block:: bash

    pytest -k "not adam" tests/test_optimization.py

And you can combine the two patterns in one:

.. code-block:: bash

175
    pytest -k "ada and not adam" tests/test_optimization.py

For example, to run both ``test_adafactor`` and ``test_adam_w`` you can use:

.. code-block:: bash

    pytest -k "test_adafactor or test_adam_w" tests/test_optimization.py

Note that we use ``or`` here, since we want either of the keywords to match to include both.

If you want to include only tests that include both patterns, ``and`` is to be used:

.. code-block:: bash

    pytest -k "test and ada" tests/test_optimization.py



Run only modified tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can run the tests related to the unstaged files or the current branch (according to Git) by using `pytest-picked
<https://github.com/anapaulagomes/pytest-picked>`__. This is a great way of quickly testing your changes didn't break
anything, since it won't run the tests related to files you didn't touch.

.. code-block:: bash

    pip install pytest-picked

.. code-block:: bash

    pytest --picked

All tests will be run from files and folders which are modified, but not yet committed.

Automatically rerun failed tests on source modification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`pytest-xdist <https://github.com/pytest-dev/pytest-xdist>`__ provides a very useful feature of detecting all failed
tests, and then waiting for you to modify files and continuously re-running those failing tests until they pass while
you fix them, so that you don't need to restart pytest after you make the fix. This is repeated until all tests pass,
after which a full run is performed again.

.. code-block:: bash

    pip install pytest-xdist

To enter the mode: ``pytest -f`` or ``pytest --looponfail``

File changes are detected by looking at ``looponfailroots`` root directories and all of their contents (recursively).
If the default for this value does not work for you, you can change it in your project by setting a configuration
option in ``setup.cfg``:

.. code-block:: ini

    [tool:pytest]
    looponfailroots = transformers tests

or ``pytest.ini``/``tox.ini`` files:

.. code-block:: ini

    [pytest]
    looponfailroots = transformers tests

This would lead to only looking for file changes in the respective directories, specified relative to the ini-file's
directory.

`pytest-watch <https://github.com/joeyespo/pytest-watch>`__ is an alternative implementation of this functionality.
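
For example, a minimal invocation could look like this (``ptw`` is the console script that ``pytest-watch`` installs;
it re-runs pytest whenever a watched file changes):

.. code-block:: bash

    pip install pytest-watch
    ptw tests/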


Skip a test module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want to run all test modules except a few, you can exclude them by giving an explicit list of tests to run. For
example, to run all except ``test_modeling_*.py`` tests:

.. code-block:: bash

    pytest `ls -1 tests/*py | grep -v test_modeling`


Clearing state
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On CI builds, and when isolation is important (at the expense of speed), the cache should be cleared:

.. code-block:: bash

    pytest --cache-clear tests

Running tests in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As mentioned earlier ``make test`` runs tests in parallel via the ``pytest-xdist`` plugin (``-n X`` argument, e.g.
``-n 2`` to run 2 parallel jobs).

``pytest-xdist``'s ``--dist=`` option allows one to control how the tests are grouped. ``--dist=loadfile`` puts the
tests located in one file onto the same process.
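
For example, this is how you would run the ``tests`` suite on 2 parallel workers while keeping all tests from a given
file in the same worker:

.. code-block:: bash

    pytest -n 2 --dist=loadfile tests/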

Since the order of executed tests is different and unpredictable, if running the test suite with ``pytest-xdist``
produces failures (meaning we have some undetected coupled tests), use `pytest-replay
<https://github.com/ESSS/pytest-replay>`__ to replay the tests in the same order, which should then help reduce that
failing sequence to a minimum.
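
A minimal sketch of that workflow, assuming ``pytest-replay``'s ``--replay-record-dir``/``--replay`` options and its
default per-worker file naming:

.. code-block:: bash

    pip install pytest-replay
    # record the execution order of each parallel worker into ``.replay/``
    pytest -n 2 --replay-record-dir=.replay tests/
    # then replay the recorded order of the failing worker serially, e.g.:
    pytest --replay=.replay/.pytest-replay-gw0.txt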

Test order and repetition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It's good to repeat the tests several times, in sequence, randomly, or in sets, to detect any potential
inter-dependency and state-related bugs (tear down). Straightforward multiple repetition is also good for detecting
problems that get uncovered by the randomness of deep learning.


Repeat tests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* `pytest-flakefinder <https://github.com/dropbox/pytest-flakefinder>`__:

.. code-block:: bash

    pip install pytest-flakefinder

And then run every test multiple times (50 by default):

.. code-block:: bash

    pytest --flake-finder --flake-runs=5 tests/test_failing_test.py

.. note::
   This plugin doesn't work with ``-n`` flag from ``pytest-xdist``.

.. note::
   There is another plugin ``pytest-repeat``, but it doesn't work with ``unittest``.


Run tests in a random order
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    pip install pytest-random-order

Important: the presence of ``pytest-random-order`` will automatically randomize tests, no configuration change or
command line option is required.

As explained earlier this allows detection of coupled tests - where one test's state affects the state of another. When
``pytest-random-order`` is installed it will print the random seed it used for that session, e.g.:

.. code-block:: bash

    pytest tests
    [...]
    Using --random-order-bucket=module
    Using --random-order-seed=573663

So if a given particular sequence fails, you can reproduce it by adding that exact seed, e.g.:

.. code-block:: bash

    pytest --random-order-seed=573663
    [...]
    Using --random-order-bucket=module
    Using --random-order-seed=573663

It will only reproduce the exact order if you use the exact same list of tests (or no list at all). Once you start
manually narrowing down the list, you can no longer rely on the seed; instead, list the tests manually in the exact
order they failed and tell pytest not to randomize them, using ``--random-order-bucket=none``, e.g.:

.. code-block:: bash

    pytest --random-order-bucket=none tests/test_a.py tests/test_c.py tests/test_b.py

To disable the shuffling for all tests:

.. code-block:: bash

    pytest --random-order-bucket=none

By default ``--random-order-bucket=module`` is implied, which will shuffle the files on the module levels. It can also
shuffle on ``class``, ``package``, ``global`` and ``none`` levels. For the complete details please see its
`documentation <https://github.com/jbasko/pytest-random-order>`__.

Another randomization alternative is `pytest-randomly <https://github.com/pytest-dev/pytest-randomly>`__. This module
has very similar functionality/interface, but it doesn't have the bucket modes available in ``pytest-random-order``.
It has the same problem of imposing itself once installed.
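
If it's installed but you want a non-randomized run, you can disable it for a single invocation with pytest's standard
plugin-disabling syntax (assuming the plugin registers itself under the name ``randomly``):

.. code-block:: bash

    pytest -p no:randomly tests/test_logging.py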

Look and feel variations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

pytest-sugar
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`pytest-sugar <https://github.com/Frozenball/pytest-sugar>`__ is a plugin that improves the look-and-feel, adds a
progress bar, and shows the tests that fail and the assert instantly. It gets activated automatically upon installation.

.. code-block:: bash

    pip install pytest-sugar

To run tests without it, run:

.. code-block:: bash

    pytest -p no:sugar

or uninstall it.



Report each sub-test name and its progress
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For a single or a group of tests via ``pytest`` (after ``pip install pytest-pspec``):

.. code-block:: bash

    pytest --pspec tests/test_optimization.py



Instantly shows failed tests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`pytest-instafail <https://github.com/pytest-dev/pytest-instafail>`__ shows failures and errors instantly instead of
waiting until the end of the test session.

.. code-block:: bash

    pip install pytest-instafail

.. code-block:: bash

    pytest --instafail

To GPU or not to GPU
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On a GPU-enabled setup, to test in CPU-only mode add ``CUDA_VISIBLE_DEVICES=""``:

.. code-block:: bash

    CUDA_VISIBLE_DEVICES="" pytest tests/test_logging.py

or if you have multiple GPUs, you can specify which one is to be used by ``pytest``. For example, to use only the
second GPU if you have GPUs ``0`` and ``1``, you can run:

.. code-block:: bash

    CUDA_VISIBLE_DEVICES="1" pytest tests/test_logging.py

This is handy when you want to run different tasks on different GPUs.
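
For example, something along these lines would run two independent test selections at the same time, each pinned to
its own GPU:

.. code-block:: bash

    CUDA_VISIBLE_DEVICES="0" pytest tests/test_logging.py &
    CUDA_VISIBLE_DEVICES="1" pytest tests/test_optimization.py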

Some tests must be run on CPU only, others on either CPU or GPU or TPU, yet others on multiple GPUs. The following skip
decorators are used to set the requirements of tests CPU/GPU/TPU-wise:

* ``require_torch`` - this test will run only under torch
* ``require_torch_gpu`` - as ``require_torch`` plus requires at least 1 GPU
* ``require_torch_multi_gpu`` - as ``require_torch`` plus requires at least 2 GPUs
* ``require_torch_non_multi_gpu`` - as ``require_torch`` plus requires 0 or 1 GPUs
* ``require_torch_up_to_2_gpus`` - as ``require_torch`` plus requires 0 or 1 or 2 GPUs
* ``require_torch_tpu`` - as ``require_torch`` plus requires at least 1 TPU

Let's depict the GPU requirements in the following table:


+----------+----------------------------------+
| n gpus   |  decorator                       |
+==========+==================================+
| ``>= 0`` | ``@require_torch``               |
+----------+----------------------------------+
| ``>= 1`` | ``@require_torch_gpu``           |
+----------+----------------------------------+
| ``>= 2`` | ``@require_torch_multi_gpu``     |
+----------+----------------------------------+
| ``< 2``  | ``@require_torch_non_multi_gpu`` |
+----------+----------------------------------+
| ``< 3``  | ``@require_torch_up_to_2_gpus``  |
+----------+----------------------------------+


For example, here is a test that must be run only when there are 2 or more GPUs available and pytorch is installed:

.. code-block:: python

    @require_torch_multi_gpu
    def test_example_with_multi_gpu():

If a test requires ``tensorflow`` use the ``require_tf`` decorator. For example:

.. code-block:: python

    @require_tf
    def test_tf_thing_with_tensorflow():

These decorators can be stacked. For example, if a test is slow and requires at least one GPU under pytorch, here is
how to set it up:

.. code-block:: python

    @require_torch_gpu
    @slow
    def test_example_slow_on_gpu():

Some decorators like ``@parameterized`` rewrite test names, therefore ``@require_*`` skip decorators have to be listed
last for them to work correctly. Here is an example of the correct usage:

.. code-block:: python

    @parameterized.expand(...)
    @require_torch_multi_gpu
    def test_integration_foo():

This order problem doesn't exist with ``@pytest.mark.parametrize``: you can put it first or last and it will still
work. But it only works with non-unittest tests.

Inside tests:

* How many GPUs are available:

.. code-block:: python

    from transformers.testing_utils import get_gpu_count
    n_gpu = get_gpu_count() # works with torch and tf



Distributed training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``pytest`` can't deal with distributed training directly. If this is attempted - the sub-processes don't do the right
thing and end up thinking they are ``pytest`` and start running the test suite in loops. It works, however, if one
spawns a normal process that then spawns off multiple workers and manages the IO pipes.

Here are some tests that use it:

* :prefix_link:`test_trainer_distributed.py <tests/test_trainer_distributed.py>`
* :prefix_link:`test_deepspeed.py <tests/deepspeed/test_deepspeed.py>`

To jump right into the execution point, search for the ``execute_subprocess_async`` call in those tests.
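
Stripped down to its core, such a test looks roughly like the following sketch (modeled on
``test_trainer_distributed.py``; the class name and the launched script path are placeholders):

.. code-block:: python

    import sys

    from transformers.testing_utils import (
        TestCasePlus,
        execute_subprocess_async,
        get_gpu_count,
        require_torch_multi_gpu,
    )


    class TestSomethingDistributed(TestCasePlus):
        @require_torch_multi_gpu
        def test_distributed_run(self):
            distributed_args = f"-m torch.distributed.launch --nproc_per_node={get_gpu_count()}".split()
            cmd = [sys.executable] + distributed_args + ["path/to/the_actual_test_script.py"]
            # spawn a normal process which in turn spawns the distributed workers and manages the IO pipes
            execute_subprocess_async(cmd, env=self.get_env())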

You will need at least 2 GPUs to see these tests in action:

.. code-block:: bash

    CUDA_VISIBLE_DEVICES=0,1 RUN_SLOW=1 pytest -sv tests/test_trainer_distributed.py


Output capture
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

During test execution any output sent to ``stdout`` and ``stderr`` is captured. If a test or a setup method fails, its
corresponding captured output will usually be shown along with the failure traceback.

To disable output capturing and to get the ``stdout`` and ``stderr`` normally, use ``-s`` or ``--capture=no``:

.. code-block:: bash

    pytest -s tests/test_logging.py

To send test results to JUnit format output:

.. code-block:: bash

    py.test tests --junitxml=result.xml


Color control
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To have no color (e.g., yellow on white background is not readable):

.. code-block:: bash

    pytest --color=no tests/test_logging.py



Sending test report to online pastebin service
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Creating a URL for each test failure:

.. code-block:: bash

    pytest --pastebin=failed tests/test_logging.py

This will submit test run information to a remote Paste service and provide a URL for each failure. You may select
tests as usual or add for example ``-x`` if you only want to send one particular failure.

Creating a URL for a whole test session log:

.. code-block:: bash

    pytest --pastebin=all tests/test_logging.py



Writing tests
-----------------------------------------------------------------------------------------------------------------------

🤗 transformers tests are based on ``unittest``, but run by ``pytest``, so most of the time features from both systems
can be used.

You can read `here <https://docs.pytest.org/en/stable/unittest.html>`__ which features are supported, but the important
thing to remember is that most ``pytest`` fixtures don't work. Neither does parametrization, but we use the module
``parameterized``, which works in a similar way.


Parametrization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Often, there is a need to run the same test multiple times, but with different arguments. It could be done from within
the test, but then there is no way of running that test for just one set of arguments.

.. code-block:: python

    # test_this1.py
    import math
    import unittest

    from parameterized import parameterized


    class TestMathUnitTest(unittest.TestCase):
        @parameterized.expand([
            ("negative", -1.5, -2.0),
            ("integer", 1, 1.0),
            ("large fraction", 1.6, 1),
        ])
        def test_floor(self, name, input, expected):
            self.assertEqual(math.floor(input), expected)

Now, by default this test will be run 3 times, each time with the last 3 arguments of ``test_floor`` being assigned the
corresponding arguments in the parameter list.

And you could run just the ``negative`` and ``integer`` sets of params with:

.. code-block:: bash

    pytest -k "negative and integer" tests/test_mytest.py

or all but ``negative`` sub-tests, with:

.. code-block:: bash

    pytest -k "not negative" tests/test_mytest.py

Besides using the ``-k`` filter that was just mentioned, you can find out the exact name of each sub-test and run any
or all of them using their exact names.

.. code-block:: bash

    pytest test_this1.py --collect-only -q

and it will list:

.. code-block:: bash

    test_this1.py::TestMathUnitTest::test_floor_0_negative
    test_this1.py::TestMathUnitTest::test_floor_1_integer
    test_this1.py::TestMathUnitTest::test_floor_2_large_fraction

So now you can run just 2 specific sub-tests:

.. code-block:: bash

    pytest test_this1.py::TestMathUnitTest::test_floor_0_negative  test_this1.py::TestMathUnitTest::test_floor_1_integer

The module `parameterized <https://pypi.org/project/parameterized/>`__, which is already in the developer dependencies
of ``transformers``, works for both ``unittest`` and ``pytest`` tests.

If, however, the test is not a ``unittest``, you may use ``pytest.mark.parametrize`` (or you may see it being used in
some existing tests, mostly under ``examples``).

Here is the same example, this time using ``pytest``'s ``parametrize`` marker:

.. code-block:: python

    # test_this2.py
    import math

    import pytest


    @pytest.mark.parametrize(
        "name, input, expected",
        [
            ("negative", -1.5, -2.0),
            ("integer", 1, 1.0),
            ("large fraction", 1.6, 1),
        ],
    )
    def test_floor(name, input, expected):
        assert math.floor(input) == expected

Same as with ``parameterized``, with ``pytest.mark.parametrize`` you can have fine control over which sub-tests are
run, if the ``-k`` filter doesn't do the job. However, this parametrization function creates a slightly different set
of names for the sub-tests. Here is what they look like:

.. code-block:: bash

    pytest test_this2.py --collect-only -q

and it will list:

.. code-block:: bash

    test_this2.py::test_floor[integer-1-1.0]
    test_this2.py::test_floor[negative--1.5--2.0]
    test_this2.py::test_floor[large fraction-1.6-1]

So now you can run just the specific test:

.. code-block:: bash

    pytest test_this2.py::test_floor[negative--1.5--2.0] test_this2.py::test_floor[integer-1-1.0]

as in the previous example.

Files and directories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Often in tests we need to know where things are relative to the current test file, and it's not trivial since the test
could be invoked from more than one directory or could reside in sub-directories with different depths. A helper class
:obj:`transformers.testing_utils.TestCasePlus` solves this problem by sorting out all the basic paths and provides easy
accessors to them:

* ``pathlib`` objects (all fully resolved):

   - ``test_file_path`` - the current test file path, i.e. ``__file__``
   - ``test_file_dir`` - the directory containing the current test file
   - ``tests_dir`` - the directory of the ``tests`` test suite
   - ``examples_dir`` - the directory of the ``examples`` test suite
   - ``repo_root_dir`` - the directory of the repository
   - ``src_dir`` - the directory of ``src`` (i.e. where the ``transformers`` sub-dir resides)

* stringified paths---same as above but these return paths as strings, rather than ``pathlib`` objects:

   - ``test_file_path_str``
   - ``test_file_dir_str``
   - ``tests_dir_str``
   - ``examples_dir_str``
   - ``repo_root_dir_str``
   - ``src_dir_str``

To start using those all you need is to make sure that the test resides in a subclass of
:obj:`transformers.testing_utils.TestCasePlus`. For example:

.. code-block:: python

    from transformers.testing_utils import TestCasePlus
    class PathExampleTest(TestCasePlus):
        def test_something_involving_local_locations(self):
            data_dir = self.tests_dir / "fixtures/tests_samples/wmt_en_ro"

If you don't need to manipulate paths via ``pathlib`` or you just need a path as a string, you can always invoke
``str()`` on the ``pathlib`` object or use the accessors ending with ``_str``. For example:

.. code-block:: python

    from transformers.testing_utils import TestCasePlus
    class PathExampleTest(TestCasePlus):
        def test_something_involving_stringified_locations(self):
            examples_dir = self.examples_dir_str




Temporary files and directories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Using unique temporary files and directories is essential for parallel test running, so that the tests won't overwrite
each other's data. Also we want to get the temporary files and directories removed at the end of each test that created
them. Therefore, using packages like ``tempfile``, which address these needs, is important.

However, when debugging tests, you need to be able to see what goes into the temporary file or directory and you want
to know its exact path and not have it randomized on every test re-run.

A helper class :obj:`transformers.testing_utils.TestCasePlus` is best used for such purposes. It's a sub-class of
:obj:`unittest.TestCase`, so we can easily inherit from it in the test modules.

Here is an example of its usage:

.. code-block:: python

    from transformers.testing_utils import TestCasePlus
    class ExamplesTests(TestCasePlus):
        def test_whatever(self):
            tmp_dir = self.get_auto_remove_tmp_dir()

This code creates a unique temporary directory, and sets :obj:`tmp_dir` to its location.

* Create a unique temporary dir:

.. code-block:: python

    def test_whatever(self):
        tmp_dir = self.get_auto_remove_tmp_dir()

``tmp_dir`` will contain the path to the created temporary dir. It will be automatically removed at the end of the
test.

* Create a temporary dir of my choice, ensure it's empty before the test starts and don't empty it after the test.

.. code-block:: python

    def test_whatever(self):
        tmp_dir = self.get_auto_remove_tmp_dir("./xxx")

This is useful for debugging when you want to monitor a specific directory and want to make sure the previous tests
didn't leave any data in there.

* You can override the default behavior by directly overriding the ``before`` and ``after`` args, leading to one of the
  following behaviors (see the sketch right after this list):

    - ``before=True``: the temporary dir will always be cleared at the beginning of the test.
    - ``before=False``: if the temporary dir already existed, any existing files will remain there.
    - ``after=True``: the temporary dir will always be deleted at the end of the test.
    - ``after=False``: the temporary dir will always be left intact at the end of the test.
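
For example, a hypothetical combination of the above (the ``./xxx`` path is just for illustration, as before):

.. code-block:: python

    def test_whatever(self):
        # use a fixed path, wipe it before the test starts, but keep its contents afterwards for inspection
        tmp_dir = self.get_auto_remove_tmp_dir("./xxx", before=True, after=False)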

.. note::
   In order to run the equivalent of ``rm -r`` safely, only subdirs of the project repository checkout are allowed if
   an explicit :obj:`tmp_dir` is used, so that by mistake no ``/tmp`` or similar important part of the filesystem will
   get nuked. I.e. please always pass paths that start with ``./``.

.. note::
   Each test can register multiple temporary directories and they all will get auto-removed, unless requested
   otherwise.


Temporary sys.path override
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you need to temporarily override ``sys.path`` to import from another test for example, you can use the
``ExtendSysPath`` context manager. Example:


.. code-block:: python

    import os
    from transformers.testing_utils import ExtendSysPath
    bindir = os.path.abspath(os.path.dirname(__file__))
    with ExtendSysPath(f"{bindir}/.."):
        from test_trainer import TrainerIntegrationCommon  # noqa



Skipping tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is useful when a bug is found and a new test is written, yet the bug is not fixed yet. In order to be able to
commit it to the main repository we need to make sure it's skipped during ``make test``.

Methods:

-  A **skip** means that you expect your test to pass only if some conditions are met, otherwise pytest should skip
   running the test altogether. Common examples are skipping windows-only tests on non-windows platforms, or skipping
   tests that depend on an external resource which is not available at the moment (for example a database).

-  An **xfail** means that you expect a test to fail for some reason. A common example is a test for a feature not yet
   implemented, or a bug not yet fixed. When a test passes despite being expected to fail (marked with
   ``pytest.mark.xfail``), it's an xpass and will be reported in the test summary.

One of the important differences between the two is that ``skip`` doesn't run the test, and ``xfail`` does. So if the
code that's buggy causes some bad state that will affect other tests, do not use ``xfail``.

Implementation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Here is how to skip a whole test unconditionally:

.. code-block:: python

    @unittest.skip("this bug needs to be fixed")
    def test_feature_x():

or via pytest:

.. code-block:: python

    @pytest.mark.skip(reason="this bug needs to be fixed")

or the ``xfail`` way:

.. code-block:: python

    @pytest.mark.xfail
    def test_feature_x():

- Here is how to skip a test based on some internal check inside the test:

.. code-block:: python

    def test_feature_x():
        if not has_something():
            pytest.skip("unsupported configuration")

or the whole module:

.. code-block:: python

    import pytest
    if not pytest.config.getoption("--custom-flag"):
        pytest.skip("--custom-flag is missing, skipping tests", allow_module_level=True)

or the ``xfail`` way:

.. code-block:: python

    def test_feature_x():
        pytest.xfail("expected to fail until bug XYZ is fixed")

- Here is how to skip all tests in a module if some import is missing:

.. code-block:: python

    docutils = pytest.importorskip("docutils", minversion="0.3")

-  Skip a test based on a condition:

.. code-block:: python

    @pytest.mark.skipif(sys.version_info < (3,6), reason="requires python3.6 or higher")
    def test_feature_x():

or:

.. code-block:: python

    @unittest.skipIf(torch_device == "cpu", "Can't do half precision")
    def test_feature_x():

or skip the whole module:

.. code-block:: python

    @pytest.mark.skipif(sys.platform == 'win32', reason="does not run on windows")
    class TestClass():
        def test_feature_x(self):

More details, examples and ways are `here <https://docs.pytest.org/en/latest/skipping.html>`__.

Slow tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The library of tests is ever-growing, and some of the tests take minutes to run, so we can't afford waiting an hour
for the test suite to complete on CI. Therefore, with some exceptions for essential tests, slow tests should be marked
as in the example below:

.. code-block:: python

    from transformers.testing_utils import slow
    @slow
    def test_integration_foo():

Once a test is marked as ``@slow``, to run such tests set ``RUN_SLOW=1`` env var, e.g.:

.. code-block:: bash

    RUN_SLOW=1 pytest tests

Some decorators like ``@parameterized`` rewrite test names, therefore ``@slow`` and the rest of the skip decorators
``@require_*`` have to be listed last for them to work correctly. Here is an example of the correct usage:

.. code-block:: python

    @parameterized.expand(...)
    @slow
    def test_integration_foo():

As explained at the beginning of this document, slow tests get to run on a scheduled basis, rather than in PR CI
checks. So it's possible that some problems will be missed during a PR submission and get merged. Such problems will
get caught during the next scheduled CI job. But it also means that it's important to run the slow tests on your
machine before submitting the PR.

Here is a rough decision making mechanism for choosing which tests should be marked as slow:

If the test is focused on one of the library's internal components (e.g., modeling files, tokenization files,
pipelines), then we should run that test in the non-slow test suite. If it's focused on another aspect of the library,
such as the documentation or the examples, then we should run these tests in the slow test suite. And then, to refine
this approach, we should have exceptions:

* All tests that need to download a heavy set of weights or a dataset that is larger than ~50MB (e.g., model or
  tokenizer integration tests, pipeline integration tests) should be set to slow. If you're adding a new model, you
  should create and upload to the hub a tiny version of it (with random weights) for integration tests. This is
  discussed in the following paragraphs.
* All tests that need to do a training not specifically optimized to be fast should be set to slow.
* We can introduce exceptions if some of these should-be-non-slow tests are excruciatingly slow, and set them to
  ``@slow``. Auto-modeling tests, which save and load large files to disk, are a good example of tests that are marked
  as ``@slow``.
* If a test completes under 1 second on CI (including downloads if any) then it should be a normal test regardless.

Collectively, all the non-slow tests need to cover entirely the different internals, while remaining fast. For example,
a significant coverage can be achieved by testing with specially created tiny models with random weights. Such models
have the very minimal number of layers (e.g., 2), vocab size (e.g., 1000), etc. Then the ``@slow`` tests can use large
slow models to do qualitative testing. To see the use of these simply look for *tiny* models with:

.. code-block:: bash

    grep tiny tests examples

Here is an example of a :prefix_link:`script <scripts/fsmt/fsmt-make-tiny-model.py>` that created the tiny model
`stas/tiny-wmt19-en-de <https://huggingface.co/stas/tiny-wmt19-en-de>`__. You can easily adjust it to your specific
model's architecture.

It's easy to measure the run-time incorrectly if for example there is an overhead of downloading a huge model, but if
you test it locally the downloaded files would be cached and thus the download time not measured. Hence check the
execution speed report in CI logs instead (the output of ``pytest --durations=0 tests``).

That report is also useful to find slow outliers that aren't marked as such, or which need to be re-written to be fast.
If you notice that the test suite starts getting slow on CI, the top listing of this report will show the slowest
tests.
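
For example, to list just the 50 slowest durations of a run:

.. code-block:: bash

    pytest --durations=50 tests/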


Testing the stdout/stderr output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to test functions that write to ``stdout`` and/or ``stderr``, the test can access those streams using
``pytest``'s `capsys system <https://docs.pytest.org/en/latest/capture.html>`__. Here is how this is accomplished:

.. code-block:: python

    import sys
    def print_to_stdout(s): print(s)
    def print_to_stderr(s): sys.stderr.write(s)
    def test_result_and_stdout(capsys):
        msg = "Hello"
        print_to_stdout(msg)
        print_to_stderr(msg)
        out, err = capsys.readouterr() # consume the captured output streams
        # optional: if you want to replay the consumed streams:
        sys.stdout.write(out)
        sys.stderr.write(err)
        # test:
        assert msg in out
        assert msg in err

And, of course, most of the time, ``stderr`` will come as a part of an exception, so try/except has to be used in such
a case:

.. code-block:: python

    def raise_exception(msg): raise ValueError(msg)
    def test_something_exception():
        msg = "Not a good value"
        error = ''
        try:
            raise_exception(msg)
        except Exception as e:
            error = str(e)
            assert msg in error, f"{msg} is not in the exception:\n{error}"

Another approach to capturing stdout is via ``contextlib.redirect_stdout``:

.. code-block:: python

    import sys

    from io import StringIO
    from contextlib import redirect_stdout
    def print_to_stdout(s): print(s)
    def test_result_and_stdout():
        msg = "Hello"
        buffer = StringIO()
        with redirect_stdout(buffer):
            print_to_stdout(msg)
        out = buffer.getvalue()
        # optional: if you want to replay the consumed streams:
        sys.stdout.write(out)
        # test:
        assert msg in out

An important potential issue with capturing stdout is that it may contain ``\r`` characters that in normal ``print``
reset everything that has been printed so far. There is no problem with ``pytest``, but with ``pytest -s`` these
characters get included in the buffer, so to be able to have the test run with and without ``-s``, you have to make an
extra cleanup to the captured output, using ``re.sub(r'^.*\r', '', buf, 0, re.M)``.
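
Here is a small sketch of such a cleanup helper (the name ``apply_print_resets`` is just illustrative):

.. code-block:: python

    import re

    def apply_print_resets(buf):
        # on each line, drop everything up to and including the last ``\r``, emulating what a terminal would show
        return re.sub(r"^.*\r", "", buf, 0, re.M)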

But, then we have a helper context manager wrapper to automatically take care of it all, regardless of whether it has
some ``\r``'s in it or not, so it's as simple as:

.. code-block:: python

    from transformers.testing_utils import CaptureStdout
    with CaptureStdout() as cs:
        function_that_writes_to_stdout()
    print(cs.out)

Here is a full test example:

.. code-block:: python

    from transformers.testing_utils import CaptureStdout
    msg = "Secret message\r"
    final = "Hello World"
    with CaptureStdout() as cs:
        print(msg + final)
    assert cs.out == final+"\n", f"captured: {cs.out}, expecting {final}"

If you'd like to capture ``stderr`` use the :obj:`CaptureStderr` class instead:

.. code-block:: python

    from transformers.testing_utils import CaptureStderr
    with CaptureStderr() as cs:
        function_that_writes_to_stderr()
    print(cs.err)

If you need to capture both streams at once, use the parent :obj:`CaptureStd` class:

.. code-block:: python

    from transformers.testing_utils import CaptureStd
    with CaptureStd() as cs:
        function_that_writes_to_stdout_and_stderr()
    print(cs.err, cs.out)



Capturing logger stream
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you need to validate the output of a logger, you can use :obj:`CaptureLogger`:

.. code-block:: python

    from transformers import logging
    from transformers.testing_utils import CaptureLogger

    msg = "Testing 1, 2, 3"
    logging.set_verbosity_info()
    logger = logging.get_logger("transformers.models.bart.tokenization_bart")
    with CaptureLogger(logger) as cl:
        logger.info(msg)
    assert cl.out == msg + "\n"


Testing with environment variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want to test the impact of environment variables for a specific test, you can use the helper decorator
``transformers.testing_utils.mockenv``:

.. code-block:: python

    from transformers.testing_utils import mockenv
    class HfArgumentParserTest(unittest.TestCase):
        @mockenv(TRANSFORMERS_VERBOSITY="error")
        def test_env_override(self):
            env_level_str = os.getenv("TRANSFORMERS_VERBOSITY", None)

At times an external program needs to be called, which requires setting ``PYTHONPATH`` in ``os.environ`` to include
multiple local paths. A helper class :obj:`transformers.testing_utils.TestCasePlus` comes to help:

.. code-block:: python

    from transformers.testing_utils import TestCasePlus
    class EnvExampleTest(TestCasePlus):
        def test_external_prog(self):
            env = self.get_env()
            # now call the external program, passing ``env`` to it

Depending on whether the test file is under the ``tests`` test suite or ``examples``, it'll correctly set up
``env[PYTHONPATH]`` to include one of these two directories, as well as the ``src`` directory to ensure the testing is
done against the current repo, plus whatever ``env[PYTHONPATH]`` was already set to before the test was called, if
anything.

This helper method creates a copy of the ``os.environ`` object, so the original remains intact.
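
A minimal sketch of actually launching the external program with that environment (the script path is a placeholder):

.. code-block:: python

    import subprocess
    import sys

    from transformers.testing_utils import TestCasePlus


    class EnvExampleTest(TestCasePlus):
        def test_external_prog(self):
            env = self.get_env()
            # any script that needs the repo's ``src`` and test dirs on ``PYTHONPATH`` would do here
            cmd = [sys.executable, "examples/path/to/some_script.py", "--help"]
            subprocess.run(cmd, env=env, check=True)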


Getting reproducible results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some situations you may want to remove randomness for your tests. To get identical reproducible results, you will
need to fix the seed:

.. code-block:: python

    seed = 42

    # python RNG
    import random
    random.seed(seed)

    # pytorch RNGs
    import torch
    torch.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    if torch.cuda.is_available(): torch.cuda.manual_seed_all(seed)

    # numpy RNG
    import numpy as np
    np.random.seed(seed)

    # tf RNG
    import tensorflow as tf
    tf.random.set_seed(seed)

Debugging tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To start a debugger at the point of the warning, do this:

.. code-block:: bash

    pytest tests/test_logging.py -W error::UserWarning --pdb



Testing Experimental CI Features
-----------------------------------------------------------------------------------------------------------------------

Testing CI features can be potentially problematic as it can interfere with the normal CI functioning. Therefore if a
new CI feature is to be added, it should be done as follows.

1. Create a new dedicated job that tests what needs to be tested
2. The new job must always succeed so that it gives us a green ✓ (details below).
3. Let it run for some days to see that a variety of different PR types get to run on it (user fork branches,
   non-forked branches, branches originating from github.com UI direct file edit, various forced pushes, etc. - there
   are so many) while monitoring the experimental job's logs (not the overall job green as it's purposefully always
   green)
4. When it's clear that everything is solid, then merge the new changes into existing jobs.

That way experiments on CI functionality itself won't interfere with the normal workflow.

Now how can we make the job always succeed while the new CI feature is being developed?

Some CIs, like TravisCI, support ignore-step-failure and will report the overall job as successful, but CircleCI and
GitHub Actions as of this writing don't support that.

So the following workaround can be used:

1. ``set +euo pipefail`` at the beginning of the run command to suppress most potential failures in the bash script.
2. the last command must be a success: ``echo "done"`` or just ``true`` will do

Here is an example:

.. code-block:: yaml

    - run:
        name: run CI experiment
        command: |
            set +euo pipefail
            echo "setting run-all-despite-any-errors-mode"
            this_command_will_fail
            echo "but bash continues to run"
            # emulate another failure
            false
            # but the last command must be a success
            echo "during experiment do not remove: reporting success to CI, even if there were failures"

For simple commands you could also do:

.. code-block:: bash

    cmd_that_may_fail || true

Of course, once satisfied with the results, integrate the experimental step or job with the rest of the normal jobs,
while removing ``set +euo pipefail`` or any other things you may have added to ensure that the experimental job doesn't
interfere with the normal CI functioning.

This whole process would have been much easier if only we could set something like ``allow-failure`` for the
experimental step, and let it fail without impacting the overall status of PRs. But as mentioned earlier CircleCI and
GitHub Actions don't support it at the moment.

You can vote for this feature and see where it stands at these CI-specific threads:

* `Github Actions: <https://github.com/actions/toolkit/issues/399>`__
* `CircleCI: <https://ideas.circleci.com/ideas/CCI-I-344>`__