# hytop - monitoring tools

## Quick start

### Install from PyPI

Using `pipx` (recommended):

```bash
pipx install hytop
hytop --help
```

Using `uv`:

```bash
uv tool install hytop
hytop --help
```

### Install from source

Run the following commands from the repository root.

Using `uv`:

```bash
uv run hytop --help
```

Using `pip`:

```bash
pip install .
hytop --help
```

Using `pipx`:

```bash
pipx install .
hytop --help
```

## Prerequisites

- Python >= 3.10
- Python packages: `rich`, `typer`
- Passwordless SSH for remote hosts
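
A quick way to confirm the SSH prerequisite is to attempt a non-interactive login to each host. The helper below is a minimal sketch and not part of `hytop`; the function name `check_passwordless_ssh` and the host names are placeholders.

```bash
# Hypothetical helper: verify that each host accepts key-based SSH logins.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_passwordless_ssh() {
  local host failed=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      echo "ok: $host"
    else
      echo "FAILED: $host" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example (node01 and node02 are placeholder host names):
#   check_passwordless_ssh node01 node02
```

The function returns a non-zero status if any host refuses a non-interactive login, so it can guard a script before `hytop -H ...` is called.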

## `hytop`

```bash
# Show the version number
hytop --version

# Specify a timeout for the subcommand
hytop --timeout 300 [COMMAND]

# 0.5-second interval and 5-second rolling window for the subcommand
hytop -n 0.5 --window 5 [COMMAND]

# Specify a list of nodes for the subcommand
hytop -H node01,node02 [COMMAND]
```

### SSH transport

`hytop` pulls metrics over SSH and reuses connections by default. The following options are set in the core layer, so they apply to every subcommand that collects data over SSH:

- `ControlMaster=auto`
- `ControlPersist=30s`
- `ControlPath=~/.ssh/hytop-%C`
- `ServerAliveInterval=5`
- `ServerAliveCountMax=1`
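
To reproduce or debug the transport by hand, the options above can be passed to `ssh` as `-o` arguments. The wrapper below is a sketch under that assumption, not `hytop`'s internal code; the function name `reuse_ssh` and the host `node01` are placeholders.

```bash
# Hypothetical wrapper: run ssh with the connection-reuse options listed above.
# The ControlPath is single-quoted so that ssh itself expands ~ and %C.
reuse_ssh() {
  ssh \
    -o ControlMaster=auto \
    -o ControlPersist=30s \
    -o 'ControlPath=~/.ssh/hytop-%C' \
    -o ServerAliveInterval=5 \
    -o ServerAliveCountMax=1 \
    "$@"
}

# Examples (node01 is a placeholder host name):
#   reuse_ssh node01 hostname      # first call opens the master connection
#   reuse_ssh -O check node01      # report whether a master is still running
#   reuse_ssh -O exit node01       # close the master connection early
```

With `ControlPersist=30s`, the master connection stays open for 30 seconds after the last session closes, so repeated polls within that window skip the SSH handshake.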

## `hytop gpu`

Polls `hy-smi` live on one or more hosts and shows rolling averages in a terminal UI. With `--wait-idle`, it blocks until the monitored GPUs are idle, so it can also gate GPU jobs in scripts.

### Usage

Simple examples:

```bash
# Local node, all GPUs
hytop gpu

# Two nodes, 0.5-second interval
hytop -H node01,node02 -n 0.5 gpu

# Exit with code 0 when all monitored GPUs are available
hytop gpu --devices 0,1 --wait-idle

# Wait for GPUs to be idle for 30 seconds before exiting
hytop gpu --devices 0,1 --wait-idle --wait-idle-seconds 30

# Wait at most 300s for availability (exit 124 on timeout)
hytop gpu --devices 0,1 --wait-idle --timeout 300

# Fine-grained columns (output order follows show-flag order)
hytop gpu --showtemp --showpower
hytop gpu --showpower --showtemp
```

Queue jobs in shared environments:

```bash
if hytop -H node01,node02 gpu --timeout 300 --wait-idle; then
  echo "GPUs available, starting workload..."
  # YOUR COMMAND HERE (e.g., python train.py)
else
  echo "Error: GPUs not available in time, aborting pipeline." >&2
  exit 1
fi
```

### Exit codes

The exit codes are designed to be script-friendly:

- `0`: Availability condition met (GPUs are idle).
- `2`: Argument or input error.
- `124`: Timeout reached before the availability condition was met.
- `130`: Interrupted by the user (Ctrl+C).
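
A script can branch on these codes with a `case` statement. The sketch below uses only the codes listed above; the function name `describe_hytop_exit` and the messages are illustrative.

```bash
# Map a hytop exit status to a short message (codes taken from the list above).
describe_hytop_exit() {
  case "$1" in
    0)   echo "condition met" ;;
    2)   echo "argument or input error" ;;
    124) echo "timed out" ;;
    130) echo "interrupted" ;;
    *)   echo "unexpected exit code $1" ;;
  esac
}

# Usage in a script:
#   hytop gpu --wait-idle --timeout 300
#   describe_hytop_exit "$?"
describe_hytop_exit 124   # prints "timed out"
```

Handling `124` separately from other failures lets a pipeline retry on timeout while still aborting on argument errors.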

### Fine-grained metric flags

`hytop gpu` formats the output of `hy-smi --json` and supports a subset of the `hy-smi` `--show*` flags:

- `--showtemp`: GPU core temperature (`Temp`)
- `--showpower`: average package power (`AvgPwr`, plus `AvgPwr@window`)
- `--showsclk`: sclk frequency (`sclk`)
- `--showmemuse`: VRAM usage (`VRAM%`)
- `--showuse`: GPU utilization (`GPU%`, plus `GPU%@window`)

If no `--show*` flags are specified, hytop defaults to:
`--showtemp --showpower --showsclk --showmemuse --showuse`.

## `hytop net`

Lightweight pull-based network monitor for Ethernet and InfiniBand across one or more hosts.

### Usage

```bash
# Local host, auto-discover eth+ib interfaces
hytop net

# Two hosts, 0.5-second interval
hytop -H node01,node02 -n 0.5 net

# IB-only monitoring
hytop net --kind ib

# Include only selected interfaces
hytop net --ifaces eth0,mlx5_0/p1

# Stop after 60 seconds (returns 124 on timeout)
hytop --timeout 60 net
```

## Development

Clone the repo and run `make setup` to create the virtual environment, install all dependencies (including dev), and configure pre-commit hooks:

```bash
make setup
```

Common development commands:

```bash
make format     # Auto-fix and format code (ruff)
make lint       # Check code style and errors without modifying files
make test       # Run all unit tests (pytest)
make bump part=patch  # Bump version (patch/minor/major or X.Y.Z)
make clean      # Remove build caches and the virtual environment
```

### Version bump

The version is managed with `bump-my-version`. Running `make bump` will:

1. Update `__version__` in `src/hytop/__init__.py`
2. Update `current_version` in `pyproject.toml`
3. Create a commit (e.g., `[hytop] Bump version: 0.1.1 → 0.1.2`)
4. Create a tag (e.g., `hytop-0.1.2`)

```bash
make bump part=patch          # 0.1.1 -> 0.1.2
make bump part=minor          # 0.1.2 -> 0.2.0
make bump part=major          # 0.2.0 -> 1.0.0
make bump part=1.2.3          # set an explicit version
```
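
The arithmetic behind the `part` values can be sketched as a small shell function. This only illustrates the resulting version numbers shown above; it is not how `bump-my-version` is implemented, and the name `next_version` is hypothetical.

```bash
# Illustrative only: compute the version a patch/minor/major bump would produce.
next_version() {
  local current="$1" part="$2" major minor patch rest
  major=${current%%.*}   # text before the first dot
  rest=${current#*.}     # text after the first dot
  minor=${rest%%.*}
  patch=${rest#*.}
  case "$part" in
    patch) patch=$((patch + 1)) ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    major) major=$((major + 1)); minor=0; patch=0 ;;
    *)     echo "$part"; return 0 ;;   # an explicit X.Y.Z is used as-is
  esac
  echo "$major.$minor.$patch"
}

next_version 0.1.1 patch   # prints 0.1.2
next_version 0.1.2 minor   # prints 0.2.0
next_version 0.2.0 major   # prints 1.0.0
```

Bumping a higher part resets the lower parts to zero, which is why `minor` turns `0.1.2` into `0.2.0` rather than `0.2.2`.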

### Publish

GitHub Actions publishes a release to PyPI automatically when a version tag is pushed.

```bash
# 1. Bump version (auto-commits and auto-tags)
make bump part=patch

# 2. Push commits and tags to trigger GitHub Actions release
git push --follow-tags
```
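
Before pushing, it can help to confirm that `HEAD` carries a tag in the expected shape. The check below is a sketch that assumes release tags follow the `hytop-X.Y.Z` pattern from the bump example; the function name `is_release_tag` is hypothetical, and the exact pattern the workflow triggers on is not stated here.

```bash
# Hypothetical pre-push check: is this tag shaped like hytop-X.Y.Z?
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^hytop-[0-9]+\.[0-9]+\.[0-9]+$'
}

# Usage inside the repository, before `git push --follow-tags`:
#   for tag in $(git tag --points-at HEAD); do
#     is_release_tag "$tag" && echo "release tag on HEAD: $tag"
#   done
```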

To test building distributions locally:

```bash
make build
```