Fix: Fixed the previous tracking URI setting logic to prevent clashes with...

Fix: Fixed the previous tracking URI setting logic to prevent clashes with original MLflow code. (#29096)

* Changed logic for setting the tracking URI. The previous code was calling the `mlflow.set_tracking_uri` function regardless of whether or not the environment variable `MLFLOW_TRACKING_URI` is even set. This led to clashes with the original MLflow implementation and therefore the logic was changed to only calling the function when the environment variable is explicitly set.
* Check if tracking URI has already been set. The previous code did not consider the possibility that the tracking URI may already be set elsewhere and was therefore (erroneously) overriding previously set tracking URIs using the environment variable.
* Removed redundant parentheses.
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix docstring to reflect library convention properly.
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix docstring to reflect library convention properly. "Unset by default" is the correct expression rather than "Default to `None`."
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
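
The decision order described in this commit message can be sketched in isolation. This is a minimal illustration, not the code from the commit: `FakeTracker` and `resolve_tracking_uri` are hypothetical names, and `FakeTracker` only mimics the three MLflow calls that appear in the diff (`is_tracking_uri_set`, `set_tracking_uri`, `get_tracking_uri`) so the example runs without MLflow installed. Its fallback string is a placeholder and does not reflect how MLflow resolves its own default.

```python
import os


class FakeTracker:
    """Hypothetical stand-in for the `mlflow` module (assumption, for illustration only)."""

    def __init__(self, uri=None):
        self._uri = uri

    def is_tracking_uri_set(self):
        return self._uri is not None

    def set_tracking_uri(self, uri):
        self._uri = uri

    def get_tracking_uri(self):
        # Placeholder fallback; real MLflow picks its own default location.
        return self._uri if self._uri is not None else "<mlflow default>"


def resolve_tracking_uri(tracker, env=None):
    """Apply MLFLOW_TRACKING_URI only if no URI was set earlier and the variable is non-empty."""
    env = os.environ if env is None else env
    env_uri = env.get("MLFLOW_TRACKING_URI")
    if not tracker.is_tracking_uri_set() and env_uri:
        # Nothing was configured elsewhere, so the environment variable wins.
        tracker.set_tracking_uri(env_uri)
    # Otherwise leave the tracker untouched and report whatever it already uses.
    return tracker.get_tracking_uri()
```

With an unset or empty variable the tracker is never touched, and a URI configured before the callback runs takes precedence over the environment variable.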

81220cba · Sean (Seok-Won) Yi · GitHub · 5e4b69dc · 81220cba
Commit 81220cba authored Mar 04, 2024 by Sean (Seok-Won) Yi, committed by GitHub Mar 04, 2024

Showing with 14 additions and 9 deletions

src/transformers/integrations/integration_utils.py +14 -9

--- a/src/transformers/integrations/integration_utils.py
+++ b/src/transformers/integrations/integration_utils.py
@@ -960,9 +960,9 @@ class MLflowCallback(TrainerCallback):
            remote server, e.g. s3 or GCS. If set to `True` or *1*, will copy each saved checkpoint on each save in
            [`TrainingArguments`]'s `output_dir` to the local or remote artifact storage. Using it without a remote
            storage will just copy the files to your artifact location.
-        - **MLFLOW_TRACKING_URI** (`str`, *optional*, defaults to `""`):
-            Whether to store runs at a specific path or remote server. Default to an empty string which will store runs
-            at `./mlruns` locally.
+        - **MLFLOW_TRACKING_URI** (`str`, *optional*):
+            Whether to store runs at a specific path or remote server. Unset by default, which skips setting the
+            tracking URI entirely.
        - **MLFLOW_EXPERIMENT_NAME** (`str`, *optional*, defaults to `None`):
            Whether to use an MLflow experiment_name under which to launch the run. Default to `None` which will point
            to the `Default` experiment in MLflow. Otherwise, it is a case sensitive name of the experiment to be
@@ -982,7 +982,7 @@ class MLflowCallback(TrainerCallback):
        """
        self._log_artifacts = os.getenv("HF_MLFLOW_LOG_ARTIFACTS", "FALSE").upper() in ENV_VARS_TRUE_VALUES
        self._nested_run = os.getenv("MLFLOW_NESTED_RUN", "FALSE").upper() in ENV_VARS_TRUE_VALUES
-        self._tracking_uri = os.getenv("MLFLOW_TRACKING_URI", "")
+        self._tracking_uri = os.getenv("MLFLOW_TRACKING_URI", None)
        self._experiment_name = os.getenv("MLFLOW_EXPERIMENT_NAME", None)
        self._flatten_params = os.getenv("MLFLOW_FLATTEN_PARAMS", "FALSE").upper() in ENV_VARS_TRUE_VALUES
        self._run_id = os.getenv("MLFLOW_RUN_ID", None)
@@ -997,12 +997,17 @@ class MLflowCallback(TrainerCallback):
            f" tags={self._nested_run}, tracking_uri={self._tracking_uri}"
        )
        if state.is_world_process_zero:
-            self._ml_flow.set_tracking_uri(self._tracking_uri)
-
-            if self._tracking_uri == "":
-                logger.debug(f"MLflow tracking URI is not set. Runs will be stored at {os.path.realpath('./mlruns')}")
+            if not self._ml_flow.is_tracking_uri_set():
+                if self._tracking_uri:
+                    self._ml_flow.set_tracking_uri(self._tracking_uri)
+                    logger.debug(f"MLflow tracking URI is set to {self._tracking_uri}")
+                else:
+                    logger.debug(
+                        "Environment variable `MLFLOW_TRACKING_URI` is not provided and therefore will not be"
+                        " explicitly set."
+                    )
            else:
-                logger.debug(f"MLflow tracking URI is set to {self._tracking_uri}")
+                logger.debug(f"MLflow tracking URI is set to {self._ml_flow.get_tracking_uri()}")

            if self._ml_flow.active_run() is None or self._nested_run or self._run_id:
                if self._experiment_name: