# hytop - monitoring tools

## Quick start

uv:

```bash
uv run hytop --help
```

pip:

```bash
pip install .
hytop --help
```

pipx:

```bash
pipx install .
hytop --help
```

## Prerequisites

- Python >= 3.10
- Python packages: `rich`, `typer`
- Passwordless SSH for remote nodes

## `hytop`

```bash
# Show the version number
hytop --version

# Specify a timeout for the subcommand
hytop --timeout 300 [COMMAND]

# 0.5-second interval and 5-second rolling window for the subcommand
hytop -n 0.5 --window 5 [COMMAND]

# Specify a list of nodes for the subcommand
hytop -H node01,node02 [COMMAND]
```

## `hytop gpu`

`hytop gpu` polls `hy-smi` live on one or more hosts and shows current values and rolling averages in a terminal UI. With `--wait-idle`, it blocks until the monitored GPUs are idle, so it can gate GPU jobs in scripts.

### Usage

Simple examples:

```bash
# Local node, all GPUs
hytop gpu

# Two nodes, 0.5-second interval
hytop -H node01,node02 -n 0.5 gpu

# Exit with code 0 when all monitored GPUs are available
hytop gpu --devices 0,1 --wait-idle

# Wait for GPUs to be idle for 30 seconds before exiting
hytop gpu --devices 0,1 --wait-idle --wait-idle-seconds 30

# Wait at most 300s for availability (exit 124 on timeout)
hytop gpu --devices 0,1 --wait-idle --timeout 300

# Fine-grained columns (output order follows show-flag order)
hytop gpu --showtemp --showpower
hytop gpu --showpower --showtemp
```

Queue jobs in shared environments:

```bash
if hytop -H node01,node02 gpu --timeout 300 --wait-idle; then
  echo "GPUs available, starting workload..."
  # YOUR COMMAND HERE (e.g., python train.py)
else
  echo "Error: GPUs not available in time, aborting pipeline." >&2
  exit 1
fi
```
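
The same pattern can be wrapped in a small helper so that several jobs share one gate. The sketch below is only an illustration: `run_when_idle`, `HYTOP_NODES`, and `HYTOP_TIMEOUT` are hypothetical names introduced here, not part of hytop. It waits for idle GPUs, runs the given command on success, and otherwise returns hytop's exit code unchanged.

```bash
# Hypothetical helper (not shipped with hytop): gate a command on idle GPUs.
# HYTOP_NODES and HYTOP_TIMEOUT are illustrative variables with example defaults.
run_when_idle() {
  local nodes="${HYTOP_NODES:-node01,node02}"
  local timeout="${HYTOP_TIMEOUT:-300}"
  local rc=0

  # Block until the GPUs are idle or the timeout expires; keep the exit code.
  hytop -H "$nodes" gpu --timeout "$timeout" --wait-idle || rc=$?

  if [ "$rc" -ne 0 ]; then
    echo "hytop exited with code $rc; not starting: $*" >&2
    return "$rc"
  fi

  # GPUs are idle: run the command passed as arguments.
  "$@"
}

# Usage: run_when_idle python train.py
```

Returning the original exit code lets the caller distinguish a timeout from an interruption or an argument error.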

### Exit Codes

Designed to be script-friendly:

* `0`: Availability condition met (GPUs are idle).
* `124`: Timeout reached before the availability condition was met.
* `130`: Interrupted by the user (Ctrl+C).
* `2`: Argument or input error.
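
A script can branch on these codes with a `case` statement. The following is a minimal sketch; `describe_hytop_exit` is a hypothetical helper name, and the messages are only examples.

```bash
# Hypothetical helper: map a hytop exit code to a short message.
# The codes are the ones listed above; anything else is reported as unexpected.
describe_hytop_exit() {
  case "$1" in
    0)   echo "idle" ;;
    124) echo "timeout" ;;
    130) echo "interrupted" ;;
    2)   echo "usage error" ;;
    *)   echo "unexpected exit code $1" ;;
  esac
}

# Example use in a script:
#   hytop gpu --wait-idle --timeout 300
#   status=$?
#   echo "hytop: $(describe_hytop_exit "$status")"
```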

### Fine-grained metric flags

`hytop gpu` reads the output of `hy-smi --json` and supports a subset of the `hy-smi` `--show*` flags:

- `--showtemp`: GPU core temperature (`Temp`)
- `--showpower`: average package power (`AvgPwr`, plus `AvgPwr@window`)
- `--showsclk`: sclk frequency (`sclk`)
- `--showmemuse`: VRAM usage (`VRAM%`)
- `--showuse`: GPU utilization (`GPU%`, plus `GPU%@window`)

If no `--show*` flags are specified, hytop defaults to:
`--showtemp --showpower --showsclk --showmemuse --showuse`.
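
To illustrate what an `@window` column represents, here is a minimal rolling-mean sketch. It is not hytop's implementation: it assumes a window holding a fixed number of samples (for example, `-n 0.5 --window 5` would correspond to roughly 10 samples), and `rolling_mean` is a hypothetical name.

```bash
# Hypothetical sketch of a rolling mean over the last N samples.
# $1 = samples per window; one numeric sample per line is read from stdin.
rolling_mean() {
  awk -v n="$1" '
    {
      buf[NR % n] = $1                  # overwrite the oldest slot
      count = (NR < n) ? NR : n         # window is not full at the start
      sum = 0
      for (i in buf) sum += buf[i]
      printf "%.1f\n", sum / count      # mean of the samples in the window
    }'
}

# Usage: printf "10\n20\n30\n40\n" | rolling_mean 2
# prints 10.0, 15.0, 25.0, 35.0 (one value per line)
```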

## Development

Clone the repo and run `make setup` to create the virtual environment, install all dependencies (including dev), and configure pre-commit hooks:

```bash
make setup
```

Common development commands:

```bash
make format     # Auto-fix and format code (ruff)
make lint       # Check code style and errors without modifying files
make test       # Run all unit tests (pytest)
make bump part=patch  # Bump version (patch/minor/major or X.Y.Z)
make clean      # Remove build caches and the virtual environment
```

### Version bump

Version is sourced from `src/hytop/__init__.py` (`__version__`).

```bash
make bump part=patch          # 0.1.0 -> 0.1.1
make bump part=minor          # 0.1.1 -> 0.2.0
make bump part=major          # 0.2.0 -> 1.0.0
make bump part="set 1.2.3"   # set an explicit version
```