gaoqiong / flash-attention

Commit 43798966, authored Dec 30, 2022 by Tri Dao
Parent 3c7cbfc1

[Docs] Fix formatting

Showing 1 changed file with 2 additions and 2 deletions.

training/README.md (+2 −2)
@@ -156,13 +156,13 @@ python run.py experiment=pile/gpt3-2.7B-flash-hdim128 trainer.devices=8 # 2.7B
 ```
 The default parameters are set for 8 x A100 80GB. We train with bf16 by default.
-To train with rotary embedding, run the experiments `pile/gpt3{s,m,l,xl**-flash-rotary**.
+To train with rotary embedding, run the experiments `pile/gpt3{s,m,l,xl}-flash-rotary`.
 ### Training options
 **Gradient accumulation**: to adjust device batch size to fit into GPU memory
 (the global batch size stays the same, and gradient accumulation is calculated
-automatically), set `datamodule.batch_size=blah**.
+automatically), set `datamodule.batch_size=blah`.
 **Multi-node**: to train on multiple nodes, add `trainer.num_nodes=blah`.
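For context, the options touched by this hunk are Hydra-style command-line overrides on `run.py`, in the same form as the `trainer.devices=8` example in the hunk header. A rough sketch of how they compose (the experiment name and the numeric values below are illustrative assumptions, not taken from this commit):

```
# Hypothetical invocation: GPT-3 XL with rotary embedding on 2 nodes of 8 GPUs,
# with a reduced per-device batch size (gradient accumulation is derived automatically
# so the global batch size stays the same).
python run.py experiment=pile/gpt3xl-flash-rotary trainer.devices=8 trainer.num_nodes=2 datamodule.batch_size=4
```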