# hytop - monitoring tools

## Quick start

### Install from PyPI

Using `pipx` (recommended):

```bash
pipx install hytop
hytop --help
```

Using `uv`:

```bash
uv tool install hytop
hytop --help
```

### Install from source

uv:

```bash
uv run hytop --help
```

pip:

```bash
pip install .
hytop --help
```

pipx:

```bash
pipx install .
hytop --help
```

## Prerequisites

- Python >= 3.10
- Python packages: `rich`, `typer`
- Passwordless SSH for remote hosts

## `hytop`

```bash
# Show the version number
hytop --version

# Specify a timeout for the subcommand
hytop --timeout 300 [COMMAND]

# 0.5-second interval and 5-second rolling window for the subcommand
hytop -n 0.5 --window 5 [COMMAND]

# Specify a list of nodes for the subcommand
hytop -H node01,node02 [COMMAND]
```

### SSH transport

`hytop` collects metrics with a lightweight SSH pull model. SSH connection reuse is enabled by default in the core layer, so it applies to every subcommand that collects over SSH. The following options are set:

- `ControlMaster=auto`
- `ControlPersist=30s`
- `ControlPath=~/.ssh/hytop-%C`
- `ServerAliveInterval=5`
- `ServerAliveCountMax=1`
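
For reference, the options above can be reproduced with a plain `ssh` call. The sketch below only prints such a command line so it can be inspected; it is not hytop's own code. The function name `ssh_reuse_cmd` is hypothetical, and the host and remote command in the usage note are placeholders.

```bash
# Hypothetical helper (not part of hytop): print an ssh command line that
# carries the same connection-reuse options. Nothing is executed.
ssh_reuse_cmd() {
  local host="$1"
  shift
  printf 'ssh'
  # printf repeats the format once per argument, giving one "-o" per option.
  printf ' -o %s' \
    'ControlMaster=auto' \
    'ControlPersist=30s' \
    'ControlPath=~/.ssh/hytop-%C' \
    'ServerAliveInterval=5' \
    'ServerAliveCountMax=1'
  # Append the host and any remote command words.
  printf ' %s' "$host" "$@"
  printf '\n'
}
```

For example, `ssh_reuse_cmd node01 hy-smi --json` prints the full command. With `ControlPersist=30s` the master connection stays open for 30 seconds after the last use, so polls within that time skip a new handshake. With `ServerAliveInterval=5` and `ServerAliveCountMax=1`, an unresponsive host is dropped after roughly five seconds.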

## `hytop gpu`

A lightweight tool for live `hy-smi` polling with rolling averages across multiple hosts. It has a modern terminal UI and can block until GPUs become idle, so it can gate GPU jobs in scripts.

### Usage

Simple examples:

```bash
# Local node, all GPUs
hytop gpu

# Two nodes, 0.5-second interval
hytop -H node01,node02 -n 0.5 gpu

# Exit with code 0 when all monitored GPUs are available
hytop gpu --devices 0,1 --wait-idle

# Wait for GPUs to be idle for 30 seconds before exiting
hytop gpu --devices 0,1 --wait-idle --wait-idle-seconds 30

# Wait at most 300s for availability (exit 124 on timeout)
hytop gpu --devices 0,1 --wait-idle --timeout 300

# Fine-grained columns (output order follows show-flag order)
hytop gpu --showtemp --showpower
hytop gpu --showpower --showtemp
```

Queue jobs in shared environments:

```bash
if hytop -H node01,node02 gpu --timeout 300 --wait-idle; then
  echo "GPUs available, starting workload..."
  # YOUR COMMAND HERE (e.g., python train.py)
else
  echo "Error: GPUs not available in time, aborting pipeline."
fi
```

### Exit codes

Designed to be script-friendly:

* `0`: Availability condition met (GPUs are idle).
* `124`: Timeout reached before the availability condition was met.
* `130`: Interrupted by the user (Ctrl+C).
* `2`: Argument or input error.
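
A wrapper script can branch on these codes instead of treating every non-zero status the same way. The following is a minimal sketch; the function name `describe_hytop_exit` and the labels it prints are hypothetical, and only the numeric codes come from the list above.

```bash
# Hypothetical helper: translate a hytop exit status into a short label.
describe_hytop_exit() {
  case "$1" in
    0)   echo "idle" ;;          # availability condition met
    2)   echo "usage-error" ;;   # argument or input error
    124) echo "timeout" ;;       # --timeout expired first
    130) echo "interrupted" ;;   # Ctrl+C
    *)   echo "unexpected" ;;    # any code not listed above
  esac
}
```

A typical use is to run `hytop gpu --wait-idle --timeout 300`, capture `$?` immediately afterwards, and pass it to `describe_hytop_exit` to decide whether to start the job, retry later, or abort.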

### Fine-grained metric flags

`hytop gpu` uses formatted `hy-smi --json` output and supports a subset of `hy-smi` `--show*` flags:

- `--showtemp`: GPU core temperature (`Temp`)
- `--showpower`: average package power (`AvgPwr`, plus `AvgPwr@window`)
- `--showsclk`: sclk frequency (`sclk`)
- `--showmemuse`: VRAM usage (`VRAM%`)
- `--showuse`: GPU utilization (`GPU%`, plus `GPU%@window`)

If no `--show*` flags are specified, hytop defaults to:
`--showtemp --showpower --showsclk --showmemuse --showuse`.
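
To make the `@window` columns concrete, here is a rough sketch of a rolling mean over the last N samples. It assumes the window is counted in samples, so `-n 0.5 --window 5` would cover about 5 / 0.5 = 10 of them. hytop's actual computation is not documented here and may differ, for example by using timestamps. The function name `rolling_mean` is hypothetical.

```bash
# Hypothetical illustration: read one number per line on stdin and print the
# mean of the most recent N values after each sample (N must be >= 1).
rolling_mean() {
  awk -v n="$1" '{
    buf[NR % n] = $1                 # ring buffer holding at most n samples
    cnt = (NR < n) ? NR : n          # number of samples currently stored
    s = 0
    for (i in buf) s += buf[i]
    printf "%.2f\n", s / cnt
  }'
}
```

Feeding it a column of utilization readings, for example `printf '10\n20\n30\n40\n' | rolling_mean 2`, shows how short spikes are smoothed out as the window moves forward.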

## `hytop net`

Lightweight pull-based network monitor for Ethernet and InfiniBand across one or more hosts.

### Usage

```bash
# Local host, auto-discover eth+ib interfaces
hytop net

# Two hosts, 0.5-second interval
hytop -H node01,node02 -n 0.5 net

# IB-only monitoring
hytop net --kind ib

# Include only selected interfaces
hytop net --ifaces eth0,mlx5_0/p1

# Stop after 60 seconds (returns 124 on timeout)
hytop --timeout 60 net
```

## Development

Clone the repo and run `make setup` to create the virtual environment, install all dependencies (including dev), and configure pre-commit hooks:

```bash
make setup
```

Common development commands:

```bash
make format     # Auto-fix and format code (ruff)
make lint       # Check code style and errors without modifying files
make test       # Run all unit tests (pytest)
make bump part=patch  # Bump version (patch/minor/major or X.Y.Z)
make clean      # Remove build caches and the virtual environment
```

### Version bump

Version is managed automatically via `bump-my-version`. Running the bump command will:
1. Update `__version__` in `src/hytop/__init__.py`
2. Update `current_version` in `pyproject.toml`
3. Create a commit (e.g., `[hytop] Bump version: 0.1.1 → 0.1.2`)
4. Create a tag (e.g., `hytop-0.1.2`)

```bash
make bump part=patch          # 0.1.1 -> 0.1.2
make bump part=minor          # 0.1.2 -> 0.2.0
make bump part=major          # 0.2.0 -> 1.0.0
make bump part=1.2.3          # set an explicit version
```
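
The arithmetic behind each `part` value can be sketched in plain shell. This only illustrates the version transitions shown above; the real work is done by `bump-my-version`, and the function name `next_version` is hypothetical.

```bash
# Hypothetical illustration of the bump arithmetic for an X.Y.Z version.
next_version() {
  cur="$1"
  part="$2"
  major="${cur%%.*}"      # text before the first dot
  rest="${cur#*.}"        # text after the first dot (Y.Z)
  minor="${rest%%.*}"
  patch="${rest#*.}"
  case "$part" in
    patch) echo "$major.$minor.$((patch + 1))" ;;
    minor) echo "$major.$((minor + 1)).0" ;;    # patch resets to 0
    major) echo "$((major + 1)).0.0" ;;         # minor and patch reset to 0
    *)     echo "$part" ;;                      # treated as an explicit X.Y.Z
  esac
}
```

For instance, `next_version 0.1.1 patch` mirrors the first example above, and the resulting tag would then follow the `hytop-X.Y.Z` pattern.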

### Publish

Releases are automatically published to PyPI via GitHub Actions when pushing a version tag.

```bash
# 1. Bump version (auto-commits and auto-tags)
make bump part=patch

# 2. Push commits and tags to trigger GitHub Actions release
git push --follow-tags
```

To test building distributions locally:

```bash
make build
```