# hytop - monitoring tools

## Quick start

### Install from PyPI

Using `pipx` (recommended):

```bash
pipx install hytop
hytop --help
```

Using `uv`:

```bash
uv tool install hytop
hytop --help
```

### Install from source

uv:

```bash
uv run hytop --help
```

pip:

```bash
pip install .
hytop --help
```

pipx:

```bash
pipx install .
hytop --help
```

## Prerequisites

- Python >= 3.10
- Python packages: `rich`, `typer`
- Passwordless SSH for remote hosts

## `hytop`

```bash
# Show the version number
hytop --version

# Specify a timeout for the subcommand
hytop --timeout 300 [COMMAND]

# 0.5-second interval and 5-second rolling window for the subcommand
hytop -n 0.5 --window 5 [COMMAND]

# Specify a list of nodes for the subcommand
hytop -H node01,node02 [COMMAND]

# Specify a list of nodes with non-standard ssh ports for the subcommand
hytop -H node01:3333,node02:3333 [COMMAND]
```
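
As a rough illustration of how the interval and window options relate: with `-n 0.5 --window 5`, a 5-second window covers about ten samples. The sketch below computes a rolling mean over a fixed number of samples with `awk`. `rolling_mean` is a hypothetical helper written for this example, not part of hytop, and hytop's own windowing may be implemented differently (for example, by timestamp rather than by sample count).

```bash
# Hypothetical helper (not part of hytop): rolling mean over the last N samples.
# $1 = window size in samples; reads one numeric value per line on stdin.
rolling_mean() {
  awk -v w="$1" '{
    buf[NR % w] = $1                 # ring buffer holding the last w samples
    n = (NR < w ? NR : w)            # number of samples seen so far, capped at w
    s = 0
    for (i in buf) s += buf[i]
    printf "%.1f\n", s / n
  }'
}

# Four samples, window of two samples.
printf '%s\n' 10 20 30 40 | rolling_mean 2
# prints 10.0, 15.0, 25.0, 35.0 (one per line)
```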

### SSH transport

`hytop` collects data with a lightweight SSH pull model. The core layer enables SSH connection reuse by default, so the following options apply to every subcommand that collects data over SSH:

- `ControlMaster=auto`
- `ControlPersist=30s`
- `ControlPath=~/.ssh/hytop-%C`
- `ServerAliveInterval=5`
- `ServerAliveCountMax=1`
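
For reference, the same options can be passed to a plain `ssh` command, which is handy when checking that a node is reachable the way `hytop` would reach it. This is a sketch only: `hytop_style_ssh_cmd` is a hypothetical helper that prints a command line instead of running it, and the exact invocation `hytop` builds internally is an assumption.

```bash
# Hypothetical helper (not part of hytop): print an ssh command line that
# carries the connection-reuse and keepalive options listed above.
# $1 = host; remaining arguments = remote command.
hytop_style_ssh_cmd() {
  host=$1
  shift
  # %%C becomes a literal %C, which ssh expands to a hash of the connection parameters.
  printf 'ssh -o ControlMaster=auto -o ControlPersist=30s -o ControlPath=~/.ssh/hytop-%%C -o ServerAliveInterval=5 -o ServerAliveCountMax=1 %s %s\n' "$host" "$*"
}

hytop_style_ssh_cmd node01 hostname
```

`ControlPersist=30s` keeps the master connection open for 30 seconds after the last session closes, so repeated polls avoid a full handshake. The two `ServerAlive*` settings make `ssh` give up on an unresponsive host within a few seconds.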

## `hytop gpu`

A lightweight script for live `hy-smi` polling with rolling averages across multiple hosts. It features a modern terminal UI and can be used as a blocking scheduler for GPU jobs.

### Usage

Simple examples:

```bash
# Local node, all GPUs
hytop gpu

# Two nodes, 0.5-second interval
hytop -H node01,node02 -n 0.5 gpu

# Exit with code 0 when all monitored GPUs are available
hytop gpu --devices 0,1 --wait-idle

# Wait for GPUs to be idle for 30 seconds before exiting
hytop gpu --devices 0,1 --wait-idle --wait-idle-seconds 30

# Wait at most 300s for availability (exit 124 on timeout)
hytop gpu --devices 0,1 --wait-idle --timeout 300

# Fine-grained columns (output order follows show-flag order)
hytop gpu --showtemp --showpower
hytop gpu --showpower --showtemp
```

Queue jobs in shared environments:

```bash
if hytop -H node01,node02 gpu --timeout 300 --wait-idle; then
  echo "GPUs available, starting workload..."
  # YOUR COMMAND HERE (e.g., python train.py)
else
  echo "Error: GPUs not available in time, aborting pipeline."
fi
```

### Exit codes

Designed to be script-friendly:

* `0`: Availability condition met (GPUs are idle).
* `124`: Timeout reached before the availability condition was met.
* `130`: Interrupted by the user (Ctrl+C).
* `2`: Argument or input error.
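
A wrapper script can branch on these codes to tell a timeout apart from a usage error. The sketch below is one way to do that; `describe_hytop_exit` is a hypothetical helper name, and codes other than the four listed above are treated as unknown.

```bash
# Hypothetical helper (not part of hytop): map an exit code to a short label.
describe_hytop_exit() {
  case "$1" in
    0)   echo "idle" ;;          # availability condition met
    124) echo "timeout" ;;       # timeout reached first
    130) echo "interrupted" ;;   # Ctrl+C
    2)   echo "usage-error" ;;   # argument or input error
    *)   echo "unknown" ;;       # not documented above
  esac
}

# Usage with hytop installed:
#   hytop gpu --devices 0,1 --wait-idle --timeout 300
#   describe_hytop_exit "$?"
describe_hytop_exit 124   # prints "timeout"
```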

### Fine-grained metric flags

`hytop gpu` uses formatted `hy-smi --json` output and supports a subset of `hy-smi` `--show*` flags:

- `--showtemp`: GPU core temperature (`Temp`)
- `--showpower`: average package power (`AvgPwr`, plus `AvgPwr@window`)
- `--showsclk`: sclk frequency (`sclk`)
- `--showmemuse`: VRAM usage (`VRAM%`)
- `--showuse`: GPU utilization (`GPU%`, plus `GPU%@window`)

If no `--show*` flags are specified, hytop defaults to:
`--showtemp --showpower --showsclk --showmemuse --showuse`.

## `hytop net`

Lightweight pull-based network monitor for Ethernet and InfiniBand across one or more hosts.

### Usage

```bash
# Local host, auto-discover eth+ib interfaces
hytop net

# Two hosts, 0.5-second interval
hytop -H node01,node02 -n 0.5 net

# IB-only monitoring
hytop net --kind ib

# Include only selected interfaces
hytop net --ifaces eth0,mlx5_0/p1

# Stop after 60 seconds (returns 124 on timeout)
hytop --timeout 60 net
```

## Development

Clone the repo and run `make setup` to create the virtual environment, install all dependencies (including dev), and configure pre-commit hooks:

```bash
make setup
```

Common development commands:

```bash
make format     # Auto-fix and format code (ruff)
make lint       # Check code style and errors without modifying files
make test       # Run all unit tests (pytest)
make bump part=patch  # Bump version (patch/minor/major or X.Y.Z)
make clean      # Remove build caches and the virtual environment
```

### Version bump

The version is managed with `bump-my-version`. Running `make bump` will:
1. Update `__version__` in `src/hytop/__init__.py`
2. Update `current_version` in `pyproject.toml`
3. Create a commit (e.g., `[hytop] Bump version: 0.1.1 → 0.1.2`)
4. Create a tag (e.g., `hytop-0.1.2`)

```bash
make bump part=patch          # 0.1.1 -> 0.1.2
make bump part=minor          # 0.1.2 -> 0.2.0
make bump part=major          # 0.2.0 -> 1.0.0
make bump part=1.2.3          # set an explicit version
```
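
The `patch`/`minor`/`major` arithmetic in the comments above follows the usual rule: bump one component and reset the components to its right. The sketch below only illustrates that rule. `next_version` is a hypothetical helper, it does not handle the explicit `X.Y.Z` form, and the real work is done by `bump-my-version`.

```bash
# Hypothetical helper (not part of hytop): compute the next X.Y.Z version.
# $1 = current version, $2 = patch | minor | major
next_version() {
  part=$2
  major=${1%%.*}        # text before the first dot
  rest=${1#*.}          # text after the first dot
  minor=${rest%%.*}
  patch=${rest#*.}
  case "$part" in
    patch) patch=$((patch + 1)) ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    major) major=$((major + 1)); minor=0; patch=0 ;;
    *) echo "unknown part: $part" >&2; return 2 ;;
  esac
  echo "$major.$minor.$patch"
}

next_version 0.1.1 patch   # prints 0.1.2
next_version 0.1.2 minor   # prints 0.2.0
next_version 0.2.0 major   # prints 1.0.0
```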

### Publish

Releases are published to PyPI automatically by GitHub Actions when a version tag is pushed.

```bash
# 1. Bump version (auto-commits and auto-tags)
make bump part=patch

# 2. Push commits and tags to trigger GitHub Actions release
git push --follow-tags
```

To test building distributions locally:

```bash
make build
```