Unverified Commit e21340da authored by Stas Bekman's avatar Stas Bekman Committed by GitHub
Browse files

[testing utils] get_auto_remove_tmp_dir more intuitive behavior (#8401)



* [testing utils] get_auto_remove_tmp_dir default change

Now that I have been using `get_auto_remove_tmp_dir default change` for a while, I realized that the defaults aren't most optimal.

99% of the time we want the tmp dir to be empty at the beginning of the test - so changing the default to `before=True` - this shouldn't impact any tests since this feature is used only during debug.

* simplify things

* update docs

* fix doc layout

* style

* Update src/transformers/testing_utils.py
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* better 3-state doc

* style

* Apply suggestions from code review
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* s/tmp/temporary/ + style

* correct the statement
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
parent e7e15498
...@@ -716,11 +716,11 @@ Temporary files and directories ...@@ -716,11 +716,11 @@ Temporary files and directories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Using unique temporary files and directories are essential for parallel test running, so that the tests won't overwrite Using unique temporary files and directories are essential for parallel test running, so that the tests won't overwrite
each other's data. Also we want to get the temp files and directories removed at the end of each test that created each other's data. Also we want to get the temporary files and directories removed at the end of each test that created
them. Therefore, using packages like ``tempfile``, which address these needs is essential. them. Therefore, using packages like ``tempfile``, which address these needs is essential.
However, when debugging tests, you need to be able to see what goes into the temp file or directory and you want to However, when debugging tests, you need to be able to see what goes into the temporary file or directory and you want
know it's exact path and not having it randomized on every test re-run. to know it's exact path and not having it randomized on every test re-run.
A helper class :obj:`transformers.test_utils.TestCasePlus` is best used for such purposes. It's a sub-class of A helper class :obj:`transformers.test_utils.TestCasePlus` is best used for such purposes. It's a sub-class of
:obj:`unittest.TestCase`, so we can easily inherit from it in the test modules. :obj:`unittest.TestCase`, so we can easily inherit from it in the test modules.
...@@ -736,32 +736,33 @@ Here is an example of its usage: ...@@ -736,32 +736,33 @@ Here is an example of its usage:
This code creates a unique temporary directory, and sets :obj:`tmp_dir` to its location. This code creates a unique temporary directory, and sets :obj:`tmp_dir` to its location.
In this and all the following scenarios the temporary directory will be auto-removed at the end of test, unless * Create a unique temporary dir:
``after=False`` is passed to the helper function.
* Create a temporary directory of my choice and delete it at the end - useful for debugging when you want to monitor a
specific directory:
.. code-block:: python .. code-block:: python
def test_whatever(self): def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test") tmp_dir = self.get_auto_remove_tmp_dir()
``tmp_dir`` will contain the path to the created temporary dir. It will be automatically removed at the end of the
test.
* Create a temporary directory of my choice and do not delete it at the end---useful for when you want to look at the * Create a temporary dir of my choice, ensure it's empty before the test starts and don't empty it after the test.
temp results:
.. code-block:: python .. code-block:: python
def test_whatever(self): def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", after=False) tmp_dir = self.get_auto_remove_tmp_dir("./xxx")
* Create a temporary directory of my choice and ensure to delete it right away---useful for when you disabled deletion This is useful for debug when you want to monitor a specific directory and want to make sure the previous tests didn't
in the previous test run and want to make sure the that temporary directory is empty before the new test is run: leave any data in there.
.. code-block:: python * You can override the default behavior by directly overriding the ``before`` and ``after`` args, leading to one of the
following behaviors:
def test_whatever(self): - ``before=True``: the temporary dir will always be cleared at the beginning of the test.
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", before=True) - ``before=False``: if the temporary dir already existed, any existing files will remain there.
- ``after=True``: the temporary dir will always be deleted at the end of the test.
- ``after=False``: the temporary dir will always be left intact at the end of the test.
.. note:: .. note::
In order to run the equivalent of ``rm -r`` safely, only subdirs of the project repository checkout are allowed if In order to run the equivalent of ``rm -r`` safely, only subdirs of the project repository checkout are allowed if
...@@ -815,7 +816,7 @@ or the ``xfail`` way: ...@@ -815,7 +816,7 @@ or the ``xfail`` way:
@pytest.mark.xfail @pytest.mark.xfail
def test_feature_x(): def test_feature_x():
Here is how to skip a test based on some internal check inside the test: - Here is how to skip a test based on some internal check inside the test:
.. code-block:: python .. code-block:: python
...@@ -838,7 +839,7 @@ or the ``xfail`` way: ...@@ -838,7 +839,7 @@ or the ``xfail`` way:
def test_feature_x(): def test_feature_x():
pytest.xfail("expected to fail until bug XYZ is fixed") pytest.xfail("expected to fail until bug XYZ is fixed")
Here is how to skip all tests in a module if some import is missing: - Here is how to skip all tests in a module if some import is missing:
.. code-block:: python .. code-block:: python
......
...@@ -522,45 +522,47 @@ class TestCasePlus(unittest.TestCase): ...@@ -522,45 +522,47 @@ class TestCasePlus(unittest.TestCase):
- ``repo_root_dir_str`` - ``repo_root_dir_str``
- ``src_dir_str`` - ``src_dir_str``
Feature 2: Flexible auto-removable temp dirs which are guaranteed to get removed at the end of test. Feature 2: Flexible auto-removable temporary dirs which are guaranteed to get removed at the end of test.
In all the following scenarios the temp dir will be auto-removed at the end of test, unless `after=False`. 1. Create a unique temporary dir:
# 1. create a unique temp dir, `tmp_dir` will contain the path to the created temp dir
:: ::
def test_whatever(self): def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir() tmp_dir = self.get_auto_remove_tmp_dir()
# 2. create a temp dir of my choice and delete it at the end - useful for debug when you want to # monitor a ``tmp_dir`` will contain the path to the created temporary dir. It will be automatically removed at the end of the
specific directory test.
2. Create a temporary dir of my choice, ensure it's empty before the test starts and don't
empty it after the test.
:: ::
def test_whatever(self): def test_whatever(self):
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test") tmp_dir = self.get_auto_remove_tmp_dir("./xxx")
# 3. create a temp dir of my choice and do not delete it at the end - useful for when you want # to look at the This is useful for debug when you want to monitor a specific directory and want to make sure the previous tests
temp results didn't leave any data in there.
:: 3. You can override the first two options by directly overriding the ``before`` and ``after`` args, leading to the
def test_whatever(self): following behavior:
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", after=False)
# 4. create a temp dir of my choice and ensure to delete it right away - useful for when you # disabled deletion in ``before=True``: the temporary dir will always be cleared at the beginning of the test.
the previous test run and want to make sure the that tmp dir is empty # before the new test is run
:: ``before=False``: if the temporary dir already existed, any existing files will remain there.
def test_whatever(self): ``after=True``: the temporary dir will always be deleted at the end of the test.
tmp_dir = self.get_auto_remove_tmp_dir(tmp_dir="./tmp/run/test", before=True)
``after=False``: the temporary dir will always be left intact at the end of the test.
Note 1: In order to run the equivalent of `rm -r` safely, only subdirs of the project repository checkout are Note 1: In order to run the equivalent of ``rm -r`` safely, only subdirs of the project repository checkout are
allowed if an explicit `tmp_dir` is used, so that by mistake no `/tmp` or similar important part of the filesystem allowed if an explicit ``tmp_dir`` is used, so that by mistake no ``/tmp`` or similar important part of the
will get nuked. i.e. please always pass paths that start with `./` filesystem will get nuked. i.e. please always pass paths that start with ``./``
Note 2: Each test can register multiple temp dirs and they all will get auto-removed, unless requested otherwise. Note 2: Each test can register multiple temporary dirs and they all will get auto-removed, unless requested
otherwise.
Feature 3: Get a copy of the ``os.environ`` object that sets up ``PYTHONPATH`` specific to the current test suite. Feature 3: Get a copy of the ``os.environ`` object that sets up ``PYTHONPATH`` specific to the current test suite.
This is useful for invoking external programs from the test suite - e.g. distributed training. This is useful for invoking external programs from the test suite - e.g. distributed training.
...@@ -573,6 +575,7 @@ class TestCasePlus(unittest.TestCase): ...@@ -573,6 +575,7 @@ class TestCasePlus(unittest.TestCase):
""" """
def setUp(self): def setUp(self):
# get_auto_remove_tmp_dir feature:
self.teardown_tmp_dirs = [] self.teardown_tmp_dirs = []
# figure out the resolved paths for repo_root, tests, examples, etc. # figure out the resolved paths for repo_root, tests, examples, etc.
...@@ -660,21 +663,42 @@ class TestCasePlus(unittest.TestCase): ...@@ -660,21 +663,42 @@ class TestCasePlus(unittest.TestCase):
env["PYTHONPATH"] = ":".join(paths) env["PYTHONPATH"] = ":".join(paths)
return env return env
def get_auto_remove_tmp_dir(self, tmp_dir=None, after=True, before=False): def get_auto_remove_tmp_dir(self, tmp_dir=None, before=None, after=None):
""" """
Args: Args:
tmp_dir (:obj:`string`, `optional`): tmp_dir (:obj:`string`, `optional`):
use this path, if None a unique path will be assigned if :obj:`None`:
before (:obj:`bool`, `optional`, defaults to :obj:`False`):
if `True` and tmp dir already exists make sure to empty it right away - a unique temporary path will be created
after (:obj:`bool`, `optional`, defaults to :obj:`True`): - sets ``before=True`` if ``before`` is :obj:`None`
delete the tmp dir at the end of the test - sets ``after=True`` if ``after`` is :obj:`None`
else:
- :obj:`tmp_dir` will be created
- sets ``before=True`` if ``before`` is :obj:`None`
- sets ``after=False`` if ``after`` is :obj:`None`
before (:obj:`bool`, `optional`):
If :obj:`True` and the :obj:`tmp_dir` already exists, make sure to empty it right away if :obj:`False`
and the :obj:`tmp_dir` already exists, any existing files will remain there.
after (:obj:`bool`, `optional`):
If :obj:`True`, delete the :obj:`tmp_dir` at the end of the test if :obj:`False`, leave the
:obj:`tmp_dir` and its contents intact at the end of the test.
Returns: Returns:
tmp_dir(:obj:`string`): either the same value as passed via `tmp_dir` or the path to the auto-created tmp tmp_dir(:obj:`string`): either the same value as passed via `tmp_dir` or the path to the auto-selected tmp
dir dir
""" """
if tmp_dir is not None: if tmp_dir is not None:
# defining the most likely desired behavior for when a custom path is provided.
# this most likely indicates the debug mode where we want an easily locatable dir that:
# 1. gets cleared out before the test (if it already exists)
# 2. is left intact after the test
if before is None:
before = True
if after is None:
after = False
# using provided path # using provided path
path = Path(tmp_dir).resolve() path = Path(tmp_dir).resolve()
...@@ -691,6 +715,15 @@ class TestCasePlus(unittest.TestCase): ...@@ -691,6 +715,15 @@ class TestCasePlus(unittest.TestCase):
path.mkdir(parents=True, exist_ok=True) path.mkdir(parents=True, exist_ok=True)
else: else:
# defining the most likely desired behavior for when a unique tmp path is auto generated
# (not a debug mode), here we require a unique tmp dir that:
# 1. is empty before the test (it will be empty in this situation anyway)
# 2. gets fully removed after the test
if before is None:
before = True
if after is None:
after = True
# using unique tmp dir (always empty, regardless of `before`) # using unique tmp dir (always empty, regardless of `before`)
tmp_dir = tempfile.mkdtemp() tmp_dir = tempfile.mkdtemp()
...@@ -701,7 +734,8 @@ class TestCasePlus(unittest.TestCase): ...@@ -701,7 +734,8 @@ class TestCasePlus(unittest.TestCase):
return tmp_dir return tmp_dir
def tearDown(self): def tearDown(self):
# remove registered temp dirs
# get_auto_remove_tmp_dir feature: remove registered temp dirs
for path in self.teardown_tmp_dirs: for path in self.teardown_tmp_dirs:
shutil.rmtree(path, ignore_errors=True) shutil.rmtree(path, ignore_errors=True)
self.teardown_tmp_dirs = [] self.teardown_tmp_dirs = []
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment