"tests/vscode:/vscode.git/clone" did not exist on "c33f6046c3dab8f41bedf893404e6469dea3bce8"
Unverified commit 777b1bfe authored by Muhammad Sakib Khan Inan, committed by GitHub

New logging support to "Trainer" Class (ClearML Logger) (#20184)



* Init Update

* ClearML Callbacks integration

* update corrections

* args reporting updated

* {'tensorboard': False, 'pytorch': False}

* ClearML Tests added

* add clearml

* output_uri=True in Task.init

* reformatted integrations.py

* reformatted and fixed

* IF-ELSE statement issue on "has_clearml" resolved

* Add clearml in main callback docs

* Add additional clearml documentation

* Update src/transformers/integrations.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Accept suggestion
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Accept suggestion
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Small change in comments

* Make style clearml

* Accept suggestion
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Victor Sonck <victor.sonck@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
parent b4997382
@@ -37,6 +37,7 @@ By default a [`Trainer`] will use the following callbacks:
  installed.
- [`~integrations.CodeCarbonCallback`] if [codecarbon](https://pypi.org/project/codecarbon/) is
  installed.
- [`~integrations.ClearMLCallback`] if [clearml](https://github.com/allegroai/clearml) is installed.

The main class that implements callbacks is [`TrainerCallback`]. It gets the
[`TrainingArguments`] used to instantiate the [`Trainer`], can access that
@@ -73,6 +74,8 @@ Here is the list of the available [`TrainerCallback`] in the library:
[[autodoc]] integrations.NeptuneCallback

[[autodoc]] integrations.ClearMLCallback

## TrainerCallback

[[autodoc]] TrainerCallback
@@ -199,6 +199,7 @@ You can easily log and monitor your runs code. The following are currently suppo
* [Weights & Biases](https://docs.wandb.ai/integrations/huggingface)
* [Comet ML](https://www.comet.ml/docs/python-sdk/huggingface/)
* [Neptune](https://docs.neptune.ai/integrations-and-supported-tools/model-training/hugging-face)
* [ClearML](https://clear.ml/docs/latest/docs/getting_started/ds/ds_first_steps)

### Weights & Biases

@@ -335,3 +336,40 @@ Now, when you start the training with `trainer.train()`, your metadata will be l
| `NEPTUNE_PROJECT` | The full name of your Neptune project (`workspace-name/project-name`). To find and copy it, head to **project settings** &rarr; **Properties**. |

For detailed instructions and examples, see the [Neptune docs](https://docs.neptune.ai/integrations-and-supported-tools/model-training/hugging-face).
### ClearML
To use ClearML, install the clearml package with:
```bash
pip install clearml
```
Then [create new credentials]() from the ClearML Server. You can get a free hosted server [here]() or [self-host your own]()!
After creating your new credentials, you can either copy the local configuration snippet and paste it after running:
```bash
clearml-init
```
Or copy the Jupyter snippet if you are working in Jupyter or Colab:
```python
%env CLEARML_WEB_HOST=https://app.clear.ml
%env CLEARML_API_HOST=https://api.clear.ml
%env CLEARML_FILES_HOST=https://files.clear.ml
%env CLEARML_API_ACCESS_KEY=***
%env CLEARML_API_SECRET_KEY=***
```
To enable logging to ClearML, include `"clearml"` in the `report_to` argument of your `TrainingArguments` or training script, or simply pass `--report_to all` if `clearml` is already installed. A minimal sketch follows the table below.
Advanced configuration is possible by setting environment variables:
| Environment Variable | Description |
|---|---|
| CLEARML_PROJECT | Name of the project in ClearML. (default: `"HuggingFace Transformers"`) |
| CLEARML_TASK | Name of the task in ClearML. (default: `"Trainer"`) |
Additional configuration options are available through generic [clearml environment variables](https://clear.ml/docs/latest/docs/configs/env_vars).
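
As a minimal sketch of the above (the project and task names, `output_dir`, and `logging_steps` are illustrative choices, not requirements), enabling ClearML reporting from Python looks like this:

```python
import os

from transformers import TrainingArguments

# Illustrative project/task names; omit these to fall back to the defaults
# listed in the table above.
os.environ["CLEARML_PROJECT"] = "my-transformers-experiments"
os.environ["CLEARML_TASK"] = "bert-finetuning"

# Report only to ClearML; use report_to="all" to log to every installed integration.
training_args = TrainingArguments(
    output_dir="output",
    report_to=["clearml"],
    logging_steps=50,
)
# Pass `training_args` to your `Trainer` as usual; the ClearML callback is
# attached automatically because clearml is installed.
```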
@@ -175,7 +175,7 @@ def parse_args():
        default="all",
        help=(
            'The integration to report the results and logs to. Supported platforms are `"tensorboard"`,'
-            ' `"wandb"` and `"comet_ml"`. Use `"all"` (default) to report to all integrations.'
+            ' `"wandb"`, `"comet_ml"` and `"clearml"`. Use `"all"` (default) to report to all integrations.'
            "Only applicable when `--with_tracking` is passed."
        ),
    )
@@ -216,7 +216,7 @@ def parse_args():
        default="all",
        help=(
            'The integration to report the results and logs to. Supported platforms are `"tensorboard"`,'
-            ' `"wandb"` and `"comet_ml"`. Use `"all"` (default) to report to all integrations.'
+            ' `"wandb"`, `"comet_ml"` and `"clearml"`. Use `"all"` (default) to report to all integrations.'
            "Only applicable when `--with_tracking` is passed."
        ),
    )
@@ -223,7 +223,7 @@ def parse_args():
        default="all",
        help=(
            'The integration to report the results and logs to. Supported platforms are `"tensorboard"`,'
-            ' `"wandb"` and `"comet_ml"`. Use `"all"` (default) to report to all integrations.'
+            ' `"wandb"`, `"comet_ml"` and `"clearml"`. Use `"all"` (default) to report to all integrations.'
            "Only applicable when `--with_tracking` is passed."
        ),
    )
@@ -205,7 +205,7 @@ def parse_args():
        default="all",
        help=(
            'The integration to report the results and logs to. Supported platforms are `"tensorboard"`,'
-            ' `"wandb"` and `"comet_ml"`. Use `"all"` (default) to report to all integrations.'
+            ' `"wandb"`, `"comet_ml"` and `"clearml"`. Use `"all"` (default) to report to all integrations.'
            "Only applicable when `--with_tracking` is passed."
        ),
    )
@@ -296,7 +296,7 @@ def parse_args():
        default="all",
        help=(
            'The integration to report the results and logs to. Supported platforms are `"tensorboard"`,'
-            ' `"wandb"` and `"comet_ml"`. Use `"all"` (default) to report to all integrations.'
+            ' `"wandb"`, `"comet_ml"` and `"clearml"`. Use `"all"` (default) to report to all integrations.'
            "Only applicable when `--with_tracking` is passed."
        ),
    )
@@ -297,7 +297,7 @@ def parse_args():
        default="all",
        help=(
            'The integration to report the results and logs to. Supported platforms are `"tensorboard"`,'
-            ' `"wandb"` and `"comet_ml"`. Use `"all"` (default) to report to all integrations.'
+            ' `"wandb"`, `"comet_ml"` and `"clearml"`. Use `"all"` (default) to report to all integrations.'
            "Only applicable when `--with_tracking` is passed."
        ),
    )
@@ -298,7 +298,7 @@ def parse_args():
        default="all",
        help=(
            'The integration to report the results and logs to. Supported platforms are `"tensorboard"`,'
-            ' `"wandb"` and `"comet_ml"`. Use `"all"` (default) to report to all integrations.'
+            ' `"wandb"`, `"comet_ml"` and `"clearml"`. Use `"all"` (default) to report to all integrations.'
            "Only applicable when `--with_tracking` is passed."
        ),
    )
@@ -179,7 +179,7 @@ def parse_args():
        default="all",
        help=(
            'The integration to report the results and logs to. Supported platforms are `"tensorboard"`,'
-            ' `"wandb"` and `"comet_ml"`. Use `"all"` (default) to report to all integrations.'
+            ' `"wandb"`, `"comet_ml"` and `"clearml"`. Use `"all"` (default) to report to all integrations.'
            "Only applicable when `--with_tracking` is passed."
        ),
    )
@@ -232,7 +232,7 @@ def parse_args():
        default="all",
        help=(
            'The integration to report the results and logs to. Supported platforms are `"tensorboard"`,'
-            ' `"wandb"` and `"comet_ml"`. Use `"all"` (default) to report to all integrations.'
+            ' `"wandb"`, `"comet_ml"` and `"clearml"`. Use `"all"` (default) to report to all integrations.'
            "Only applicable when `--with_tracking` is passed."
        ),
    )
@@ -281,7 +281,7 @@ def parse_args():
        default="all",
        help=(
            'The integration to report the results and logs to. Supported platforms are `"tensorboard"`,'
-            ' `"wandb"` and `"comet_ml"`. Use `"all"` (default) to report to all integrations.'
+            ' `"wandb"`, `"comet_ml"` and `"clearml"`. Use `"all"` (default) to report to all integrations.'
            "Only applicable when `--with_tracking` is passed."
        ),
    )
@@ -99,6 +99,7 @@ _import_structure = {
    "generation": [],
    "hf_argparser": ["HfArgumentParser"],
    "integrations": [
        "is_clearml_available",
        "is_comet_available",
        "is_neptune_available",
        "is_optuna_available",
@@ -3239,6 +3240,7 @@ if TYPE_CHECKING:
    # Integrations
    from .integrations import (
        is_clearml_available,
        is_comet_available,
        is_neptune_available,
        is_optuna_available,
@@ -75,6 +75,10 @@ def is_wandb_available():
    return importlib.util.find_spec("wandb") is not None


def is_clearml_available():
    return importlib.util.find_spec("clearml") is not None


def is_comet_available():
    return _has_comet


@@ -528,6 +532,8 @@ def get_available_reporting_integrations():
        integrations.append("wandb")
    if is_codecarbon_available():
        integrations.append("codecarbon")
    if is_clearml_available():
        integrations.append("clearml")
    return integrations
@@ -1299,6 +1305,112 @@ class CodeCarbonCallback(TrainerCallback):
        self.tracker.stop()

class ClearMLCallback(TrainerCallback):
    """
    A [`TrainerCallback`] that sends the logs to [ClearML](https://clear.ml/).

    Environment:
        CLEARML_PROJECT (`str`, *optional*, defaults to `"HuggingFace Transformers"`):
            ClearML project name.
        CLEARML_TASK (`str`, *optional*, defaults to `"Trainer"`):
            ClearML task name.
    """

    def __init__(self):
        if is_clearml_available():
            import clearml

            self._clearml = clearml
        else:
            raise RuntimeError("ClearMLCallback requires 'clearml' to be installed. Run `pip install clearml`.")

        self._initialized = False
        self._clearml_task = None

    def setup(self, args, state, model, tokenizer, **kwargs):
        if self._clearml is None:
            return
        if state.is_world_process_zero:
            logger.info("Automatic ClearML logging enabled.")
            if self._clearml_task is None:
                # Create the ClearML Task; framework auto-logging is disabled because
                # this callback reports metrics explicitly in on_log().
                self._clearml_task = self._clearml.Task.init(
                    project_name=os.getenv("CLEARML_PROJECT", "HuggingFace Transformers"),
                    task_name=os.getenv("CLEARML_TASK", "Trainer"),
                    auto_connect_frameworks={"tensorboard": False, "pytorch": False},
                    output_uri=True,
                )
                self._initialized = True
                logger.info("ClearML Task has been initialized.")

            self._clearml_task.connect(args, "Args")
            if hasattr(model, "config") and model.config is not None:
                self._clearml_task.connect(model.config, "Model Configuration")

    def on_train_begin(self, args, state, control, model=None, tokenizer=None, **kwargs):
        if self._clearml is None:
            return
        if state.is_hyper_param_search:
            self._initialized = False
        if not self._initialized:
            self.setup(args, state, model, tokenizer, **kwargs)

    def on_train_end(self, args, state, control, model=None, tokenizer=None, metrics=None, logs=None, **kwargs):
        if self._clearml is None:
            return
        if self._clearml_task and state.is_world_process_zero:
            # Close the ClearML Task at the end of training
            self._clearml_task.close()

    def on_log(self, args, state, control, model=None, tokenizer=None, logs=None, **kwargs):
        if self._clearml is None:
            return
        if not self._initialized:
            self.setup(args, state, model, tokenizer, **kwargs)
        if state.is_world_process_zero:
            eval_prefix = "eval_"
            eval_prefix_len = len(eval_prefix)
            test_prefix = "test_"
            test_prefix_len = len(test_prefix)
            # End-of-run summaries are reported as single values; everything else
            # is routed to a train/eval/test scalar series by key prefix.
            single_value_scalars = [
                "train_runtime",
                "train_samples_per_second",
                "train_steps_per_second",
                "train_loss",
                "total_flos",
                "epoch",
            ]
            for k, v in logs.items():
                if isinstance(v, (int, float)):
                    if k in single_value_scalars:
                        self._clearml_task.get_logger().report_single_value(name=k, value=v)
                    elif k.startswith(eval_prefix):
                        self._clearml_task.get_logger().report_scalar(
                            title=k[eval_prefix_len:], series="eval", value=v, iteration=state.global_step
                        )
                    elif k.startswith(test_prefix):
                        self._clearml_task.get_logger().report_scalar(
                            title=k[test_prefix_len:], series="test", value=v, iteration=state.global_step
                        )
                    else:
                        self._clearml_task.get_logger().report_scalar(
                            title=k, series="train", value=v, iteration=state.global_step
                        )
                else:
                    logger.warning(
                        "Trainer is attempting to log a value of "
                        f'"{v}" of type {type(v)} for key "{k}" as a scalar. '
                        "This invocation of ClearML logger's report_scalar() "
                        "is incorrect so we dropped this attribute."
                    )

    def on_save(self, args, state, control, **kwargs):
        if self._clearml_task and state.is_world_process_zero:
            ckpt_dir = f"checkpoint-{state.global_step}"
            artifact_path = os.path.join(args.output_dir, ckpt_dir)
            logger.info(f"Logging checkpoint artifacts in {ckpt_dir}. This may take time.")
            self._clearml_task.update_output_model(artifact_path, iteration=state.global_step, auto_delete_file=False)
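
For orientation, here is a hedged, standalone sketch of the clearml calls the callback relies on, assuming clearml credentials are already configured; the scalar names and values are illustrative:

```python
from clearml import Task

# Mirrors what ClearMLCallback.setup() does: framework auto-logging is disabled
# because metrics are reported explicitly through the Logger API.
task = Task.init(
    project_name="HuggingFace Transformers",
    task_name="Trainer",
    auto_connect_frameworks={"tensorboard": False, "pytorch": False},
    output_uri=True,
)
clearml_logger = task.get_logger()

# Per-step metrics go through report_scalar(); end-of-run summaries such as
# train_runtime go through report_single_value().
clearml_logger.report_scalar(title="loss", series="train", value=0.42, iteration=10)
clearml_logger.report_single_value(name="train_runtime", value=123.4)
task.close()
```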
INTEGRATION_TO_CALLBACK = {
    "azure_ml": AzureMLCallback,
    "comet_ml": CometCallback,
@@ -1307,6 +1419,7 @@ INTEGRATION_TO_CALLBACK = {
    "tensorboard": TensorBoardCallback,
    "wandb": WandbCallback,
    "codecarbon": CodeCarbonCallback,
    "clearml": ClearMLCallback,
}
@@ -1316,4 +1429,5 @@ def get_reporting_integration_callbacks(report_to):
            raise ValueError(
                f"{integration} is not supported, only {', '.join(INTEGRATION_TO_CALLBACK.keys())} are supported."
            )
    return [INTEGRATION_TO_CALLBACK[integration] for integration in report_to]
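
A small sketch of how a `report_to` entry resolves through this mapping, assuming `clearml` is installed:

```python
from transformers.integrations import ClearMLCallback, get_reporting_integration_callbacks

# The "clearml" string from report_to is looked up in INTEGRATION_TO_CALLBACK
# and resolved to the callback class that the Trainer will instantiate.
callbacks = get_reporting_integration_callbacks(["clearml"])
assert callbacks == [ClearMLCallback]
```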
@@ -39,6 +39,7 @@ from transformers import logging as transformers_logging
from .deepspeed import is_deepspeed_available
from .integrations import (
    is_clearml_available,
    is_fairscale_available,
    is_optuna_available,
    is_ray_available,
@@ -579,6 +580,16 @@ def require_wandb(test_case):
    return unittest.skipUnless(is_wandb_available(), "test requires wandb")(test_case)

def require_clearml(test_case):
    """
    Decorator marking a test that requires clearml.

    These tests are skipped when clearml isn't installed.
    """
    return unittest.skipUnless(is_clearml_available(), "test requires clearml")(test_case)


def require_soundfile(test_case):
    """
    Decorator marking a test that requires soundfile
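
As a hedged sketch of how such a decorator is typically applied (the test class and assertion are illustrative, not the tests added in this PR):

```python
import unittest

from transformers.integrations import ClearMLCallback, get_reporting_integration_callbacks
from transformers.testing_utils import require_clearml


class ClearMLIntegrationSmokeTest(unittest.TestCase):
    @require_clearml  # skipped automatically when clearml is not installed
    def test_clearml_callback_is_registered(self):
        self.assertIn(ClearMLCallback, get_reporting_integration_callbacks(["clearml"]))
```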
@@ -413,8 +413,8 @@ class TrainingArguments:
            instance of `Dataset`.
        report_to (`str` or `List[str]`, *optional*, defaults to `"all"`):
            The list of integrations to report the results and logs to. Supported platforms are `"azure_ml"`,
-            `"comet_ml"`, `"mlflow"`, `"neptune"`, `"tensorboard"` and `"wandb"`. Use `"all"` to report to all
-            integrations installed, `"none"` for no integrations.
+            `"comet_ml"`, `"mlflow"`, `"neptune"`, `"tensorboard"`, `"clearml"` and `"wandb"`. Use `"all"` to report to
+            all integrations installed, `"none"` for no integrations.
        ddp_find_unused_parameters (`bool`, *optional*):
            When using distributed training, the value of the flag `find_unused_parameters` passed to
            `DistributedDataParallel`. Will default to `False` if gradient checkpointing is used, `True` otherwise.