# hytop - monitoring tools

## Quick start

Use `uv` to run hytop:

```bash
uv run hytop --help
```

Run hytop directly:

```bash
source .venv/Scripts/activate
hytop --help
```

## Prerequisites

- Python >= 3.10
- Python packages: `rich`, `typer`
- Passwordless SSH for remote

## `hytop`

```bash
# Show the version number
hytop --version

# Specify a timeout for the subcommand
hytop --timeout 300 [COMMAND]

# 0.5-second interval and 5-second rolling window for the subcommand
hytop -n 0.5 --window 5 [COMMAND]

# Specify a list of nodes for the subcommand
hytop -H node01,node02 [COMMAND]
```

## `hytop gpu`

A lightweight script for live `hy-smi` polling with rolling averages across multiple hosts.
It features a modern terminal UI and can be used as a blocking scheduler for GPU jobs.

### Usage

Simple examples:

```bash
# Local node, all GPUs
hytop gpu

# Two nodes, 0.5-second interval
hytop -H node01,node02 -n 0.5 gpu

# Exit with code 0 when all monitored GPUs are available
hytop gpu --devices 0,1 --wait-idle

# Wait for GPUs to be idle for 30 seconds before exiting
hytop gpu --devices 0,1 --wait-idle --wait-idle-seconds 30

# Wait at most 300s for availability (exit 124 on timeout)
hytop gpu --devices 0,1 --wait-idle --timeout 300

# Fine-grained columns (output order follows show-flag order)
hytop gpu --showtemp --showpower
hytop gpu --showpower --showtemp
```

Queue jobs in shared environments:

```bash
if hytop -H node01,node02 gpu --timeout 300 --wait-idle; then
    echo "GPUs available, starting workload..."
    # YOUR COMMAND HERE (e.g., python train.py)
else
    echo "Error: GPUs not available in time, aborting pipeline."
fi
```

### Exit Codes

Designed to be script-friendly:

* `0`: Availability condition met (GPUs are idle).
* `124`: Timeout reached before the availability condition was met.
* `130`: Interrupted by the user (Ctrl+C).
* `2`: Argument or input error.

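
A script that needs to react differently to each outcome can branch on the exit codes listed above. The following is a minimal sketch, not part of hytop: the helper name `describe_hytop_exit` and its labels are hypothetical, and only the numeric codes come from the list above.

```bash
# Hypothetical helper (not shipped with hytop): map an exit code
# from the list above to a short label a pipeline can branch on.
describe_hytop_exit() {
  case "$1" in
    0)   echo "idle" ;;         # availability condition met
    124) echo "timeout" ;;      # timeout reached first
    130) echo "interrupted" ;;  # Ctrl+C
    2)   echo "usage-error" ;;  # argument or input error
    *)   echo "unknown" ;;      # any code not documented above
  esac
}

# Usage sketch (assumes hytop is on PATH):
#   hytop gpu --devices 0,1 --wait-idle --timeout 300
#   outcome=$(describe_hytop_exit "$?")
#   [ "$outcome" = "idle" ] && python train.py
```

Keeping the mapping in one function means a pipeline can, for example, retry on `timeout` but abort on `usage-error` without repeating the numeric codes.
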
### Fine-grained metric flags

`hytop gpu` uses formatted `hy-smi --json` output and supports a subset of `hy-smi` `--show*` flags:

- `--showtemp`: GPU core temperature (`Temp`)
- `--showpower`: average package power (`AvgPwr`, plus `AvgPwr@window`)
- `--showsclk`: sclk frequency (`sclk`)
- `--showmemuse`: VRAM usage (`VRAM%`)
- `--showuse`: GPU utilization (`GPU%`, plus `GPU%@window`)

If no `--show*` flags are specified, hytop defaults to:
`--showtemp --showpower --showsclk --showmemuse --showuse`.

## Development

Clone the repo and run `make setup` to create the virtual environment, install all dependencies (including dev), and configure pre-commit hooks:

```bash
make setup
```

Common development commands:

```bash
make format           # Auto-fix and format code (ruff)
make lint             # Check code style and errors without modifying files
make test             # Run all unit tests (pytest)
make bump part=patch  # Bump version (patch/minor/major or X.Y.Z)
make clean            # Remove build caches and the virtual environment
```

### Version bump

Version is sourced from `src/hytop/__init__.py` (`__version__`).

```bash
make bump part=patch        # 0.1.0 -> 0.1.1
make bump part=minor        # 0.1.1 -> 0.2.0
make bump part=major        # 0.2.0 -> 1.0.0
make bump part="set 1.2.3"  # set an explicit version
```
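
The bump examples above follow the usual `MAJOR.MINOR.PATCH` arithmetic: a minor bump resets the patch number, and a major bump resets both minor and patch. The sketch below illustrates only that arithmetic. `bump_version` is a hypothetical helper, not the logic behind `make bump`, and it assumes a plain three-part numeric version.

```bash
# Hypothetical illustration of the bump arithmetic; not the Makefile's implementation.
bump_version() {
  local major minor patch rest
  major="${1%%.*}"      # text before the first dot
  rest="${1#*.}"        # text after the first dot
  minor="${rest%%.*}"
  patch="${rest#*.}"
  case "$2" in
    patch) patch=$((patch + 1)) ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    major) major=$((major + 1)); minor=0; patch=0 ;;
    *)     echo "unknown part: $2" >&2; return 2 ;;
  esac
  echo "${major}.${minor}.${patch}"
}

bump_version 0.1.0 patch   # prints 0.1.1
bump_version 0.1.1 minor   # prints 0.2.0
bump_version 0.2.0 major   # prints 1.0.0
```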