# hytop - monitoring tools

## Quick start

### Install from PyPI

Using `pipx` (recommended):

```bash
pipx install hytop
hytop --help
```

Using `uv`:

```bash
uv tool install hytop
hytop --help
```

### Install from source

uv:

```bash
uv run hytop --help
```

pip:

```bash
pip install .
hytop --help
```

pipx:

```bash
pipx install .
hytop --help
```

## Prerequisites

- Python >= 3.10
- Python packages: `rich`, `typer`
- Passwordless SSH for remote nodes

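One way to confirm the SSH prerequisite before pointing `hytop -H` at a node list is sketched below. It is not part of hytop: the function name `check_passwordless_ssh` is hypothetical, and the snippet assumes the standard OpenSSH client, whose `BatchMode=yes` option makes `ssh` fail instead of prompting for a password.

```bash
# Sketch (not part of hytop): check that each node in a comma-separated list
# accepts non-interactive, key-based SSH logins.
check_passwordless_ssh() {
  local nodes_csv="$1" node failed=0
  local -a nodes
  # Split "node01,node02" into an array on commas.
  IFS=',' read -r -a nodes <<< "$nodes_csv"
  for node in "${nodes[@]}"; do
    # BatchMode=yes disables password prompts, so a node without key-based
    # login fails immediately; ConnectTimeout bounds unreachable hosts.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true 2>/dev/null; then
      echo "ok: $node"
    else
      echo "FAILED: $node"
      failed=1
    fi
  done
  # Return non-zero if any node failed the check.
  return "$failed"
}

# Usage (commented out so sourcing this snippet has no side effects):
# check_passwordless_ssh node01,node02 && hytop -H node01,node02 gpu
```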
## `hytop`

```bash
# Show the version number
hytop --version

# Specify a timeout for the subcommand
hytop --timeout 300 [COMMAND]

# 0.5-second interval and 5-second rolling window for the subcommand
hytop -n 0.5 --window 5 [COMMAND]

# Specify a list of nodes for the subcommand
hytop -H node01,node02 [COMMAND]
```

## `hytop gpu`

A lightweight tool that polls `hy-smi` live on one or more hosts and reports rolling averages. It provides a terminal UI and can block until GPUs are idle, so it can serve as a simple gate for GPU jobs.

### Usage

Simple examples:

```bash
# Local node, all GPUs
hytop gpu

# Two nodes, 0.5-second interval
hytop -H node01,node02 -n 0.5 gpu

# Block until all monitored GPUs are idle, then exit with code 0
hytop gpu --devices 0,1 --wait-idle

# Wait for GPUs to be idle for 30 seconds before exiting
hytop gpu --devices 0,1 --wait-idle --wait-idle-seconds 30

# Wait at most 300s for availability (exit 124 on timeout)
hytop gpu --devices 0,1 --wait-idle --timeout 300

# Fine-grained columns (output order follows show-flag order)
hytop gpu --showtemp --showpower
hytop gpu --showpower --showtemp
```

Queue jobs in shared environments:

```bash
if hytop -H node01,node02 gpu --timeout 300 --wait-idle; then
  echo "GPUs available, starting workload..."
  # YOUR COMMAND HERE (e.g., python train.py)
else
  echo "Error: GPUs not available in time, aborting pipeline." >&2
  exit 1
fi
```

### Exit codes

The exit codes are designed to be script-friendly:

- `0`: Availability condition met (GPUs are idle).
- `124`: Timeout reached before the availability condition was met.
- `130`: Interrupted by the user (Ctrl+C).
- `2`: Argument or input error.

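A script that needs to tell these outcomes apart, rather than treat every non-zero code as a timeout, could map the code to a label first. The sketch below is only an illustration: the helper name `hytop_exit_reason` and its labels are hypothetical, and only the numeric codes come from the list above.

```bash
# Sketch: translate a hytop exit code (as listed above) into a short label.
hytop_exit_reason() {
  case "$1" in
    0)   echo "idle" ;;         # availability condition met
    2)   echo "usage-error" ;;  # argument or input error
    124) echo "timeout" ;;      # timeout reached first
    130) echo "interrupted" ;;  # Ctrl+C
    *)   echo "unknown" ;;      # any code not documented above
  esac
}

# Usage (commented out so sourcing this snippet has no side effects):
# rc=0
# hytop -H node01,node02 gpu --wait-idle --timeout 300 || rc=$?
# case "$(hytop_exit_reason "$rc")" in
#   idle)    python train.py ;;
#   timeout) echo "GPUs still busy after 300s" >&2; exit 1 ;;
#   *)       echo "hytop exited with code $rc" >&2; exit 1 ;;
# esac
```

Capturing the code with `|| rc=$?` keeps the pattern usable in scripts that run under `set -e`.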
### Fine-grained metric flags

`hytop gpu` uses formatted `hy-smi --json` output and supports a subset of `hy-smi` `--show*` flags:

- `--showtemp`: GPU core temperature (`Temp`)
- `--showpower`: average package power (`AvgPwr`, plus `AvgPwr@window`)
- `--showsclk`: sclk frequency (`sclk`)
- `--showmemuse`: VRAM usage (`VRAM%`)
- `--showuse`: GPU utilization (`GPU%`, plus `GPU%@window`)

If no `--show*` flags are specified, hytop defaults to:
`--showtemp --showpower --showsclk --showmemuse --showuse`.

## Development

Clone the repo and run `make setup` to create the virtual environment, install all dependencies (including dev), and configure pre-commit hooks:

```bash
make setup
```

Common development commands:

```bash
make format     # Auto-fix and format code (ruff)
make lint       # Check code style and errors without modifying files
make test       # Run all unit tests (pytest)
make bump part=patch  # Bump version (patch/minor/major or X.Y.Z)
make clean      # Remove build caches and the virtual environment
```

### Version bump

The version is managed with `bump-my-version`. Running the bump command will:

1. Update `__version__` in `src/hytop/__init__.py`
2. Update `current_version` in `pyproject.toml`
3. Create a commit (e.g., `[hytop] Bump version: 0.1.1 → 0.1.2`)
4. Create a tag (e.g., `hytop-0.1.2`)

```bash
make bump part=patch          # 0.1.1 -> 0.1.2
make bump part=minor          # 0.1.2 -> 0.2.0
make bump part=major          # 0.2.0 -> 1.0.0
make bump part=1.2.3          # set an explicit version
```

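The effect of each `part` value on the version number can be sketched as plain arithmetic. The hypothetical `next_version` function below only illustrates the rule (bumping a part resets the parts to its right to zero). It is not how `bump-my-version` is implemented, and it assumes a plain `X.Y.Z` version without leading zeros or suffixes.

```bash
# Sketch of the version arithmetic only; NOT the bump-my-version implementation.
next_version() {
  local current="$1" part="$2" major minor patch
  # Split "X.Y.Z" into its three numeric components.
  IFS='.' read -r major minor patch <<< "$current"
  case "$part" in
    major) echo "$((major + 1)).0.0" ;;
    minor) echo "${major}.$((minor + 1)).0" ;;
    patch) echo "${major}.${minor}.$((patch + 1))" ;;
    *)     echo "$part" ;;  # anything else is taken as an explicit X.Y.Z
  esac
}

# Usage (commented out so sourcing this snippet has no side effects):
# next_version 0.1.1 patch
```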
### Publish

Releases are published to PyPI automatically by GitHub Actions when a version tag is pushed.

```bash
# 1. Bump version (auto-commits and auto-tags)
make bump part=patch

# 2. Push commits and tags to trigger GitHub Actions release
git push --follow-tags
```

To test building distributions locally:

```bash
make build
```