gaoqiong/flash-attention · Commit 4a6eaa9f
Authored Nov 29, 2022 by Tri Dao
Update configs, add results
Parent: 0bf5e500
Changes: 30 · Showing 10 changed files with 80 additions and 6 deletions (+80 -6)
training/configs/experiment/pile/gpt3l-hf.yaml                 +16 -0
training/configs/experiment/pile/gpt3m-flash.yaml               +1 -1
training/configs/experiment/pile/gpt3m-hf.yaml                 +11 -0
training/configs/experiment/pile/gpt3s-hf.yaml                 +12 -0
training/configs/experiment/pile/gpt3xl-flash-8k.yaml           +1 -1
training/configs/experiment/pile/gpt3xl-flash-rotary-60B.yaml   +1 -1
training/configs/experiment/pile/gpt3xl-flash-rotary-8k.yaml    +1 -1
training/configs/experiment/pile/gpt3xl-flash-rotary.yaml       +1 -1
training/configs/experiment/pile/gpt3xl-flash.yaml              +1 -1
training/configs/experiment/pile/gpt3xl-hf.yaml                +35 -0
training/configs/experiment/pile/gpt3l-hf.yaml (new file, +16 -0)

# @package _global_
defaults:
  - /experiment/pile/gpt3s-hf.yaml

model:
  config:
    n_embd: 1536
    n_head: 16
    n_layer: 24

datamodule:
  batch_size: 2

train:
  optimizer:
    lr: 2.5e-4
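This config builds the GPT-3-Large-sized run on top of the gpt3s-hf experiment purely through Hydra's defaults-list merging: the parent supplies everything not listed here, and the child's keys win on conflict. A minimal sketch of that merge semantics using OmegaConf directly (the parent values are taken from gpt3s-hf.yaml below; nothing beyond stock OmegaConf is assumed):

from omegaconf import OmegaConf

# Parent: values from gpt3s-hf.yaml as shown later in this commit.
parent = OmegaConf.create({
    "datamodule": {"batch_size": 8},
    "train": {"loss_fn": None},
})
# Child: the overrides from gpt3l-hf.yaml above.
child = OmegaConf.create({
    "model": {"config": {"n_embd": 1536, "n_head": 16, "n_layer": 24}},
    "datamodule": {"batch_size": 2},
    "train": {"optimizer": {"lr": 2.5e-4}},
})
merged = OmegaConf.merge(parent, child)
print(merged.datamodule.batch_size)  # 2: the child's override wins
print(merged.train.loss_fn)          # None: inherited from the parent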
training/configs/experiment/pile/gpt3m-flash.yaml (+1 -1)

...
@@ -9,7 +9,7 @@ defaults:
   # mlp_checkpoint_lvl: 1
 datamodule:
-  batch_size: ${eval:"4 if ${train.gpu_mem} < 24 else (8 if ${train.gpu_mem} < 40 else 16)"}
+  batch_size: ${eval:"4 if ${train.gpu_mem} < 24 else (8 if ${train.gpu_mem} < 40 else (16 if ${train.gpu_mem} < 80 else 32))"}
 train:
   optimizer:
...
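The change extends the memory-based batch-size ladder with an 80 GB tier: roughly, GPUs under 24 GB get 4 per device, 24-40 GB get 8, 40-80 GB get 16, and 80 GB-class GPUs get 32. The ${eval:...} syntax implies a custom OmegaConf resolver that evaluates the interpolated string as a Python expression; a minimal sketch of registering such a resolver (the resolver name matches these configs, but its exact implementation in the repo is an assumption):

from omegaconf import OmegaConf

# Assumed resolver: evaluate the already-interpolated string with Python eval.
OmegaConf.register_new_resolver("eval", eval)

cfg = OmegaConf.create({
    "train": {"gpu_mem": 40},  # e.g. an A100-40GB
    "datamodule": {
        "batch_size": '${eval:"4 if ${train.gpu_mem} < 24 else '
                      '(8 if ${train.gpu_mem} < 40 else '
                      '(16 if ${train.gpu_mem} < 80 else 32))"}',
    },
})
# The nested ${train.gpu_mem} resolves first, then the expression is eval'd.
print(cfg.datamodule.batch_size)  # 16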
training/configs/experiment/pile/gpt3m-hf.yaml (new file, +11 -0)

# @package _global_
defaults:
  - /experiment/pile/gpt3s-hf.yaml
  - override /model/gpt2model: gpt2-medium

datamodule:
  batch_size: 4

train:
  optimizer:
    lr: 3.0e-4
training/configs/experiment/pile/gpt3s-hf.yaml (new file, +12 -0)

# @package _global_
defaults:
  - /experiment/pile/base.yaml
  - override /model: gpt2-hf
  - override /model/gpt2model: gpt2-small

datamodule:
  batch_size: 8

train:
  # Use the standard torch.nn.CrossEntropyLoss
  loss_fn: null
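Per the comment, setting loss_fn to null hands loss computation back to the default torch.nn.CrossEntropyLoss. For causal language modeling that means comparing each position's logits against the next token; a minimal sketch of that convention (the shapes and shift-by-one bookkeeping are illustrative, not taken from the repo's training loop):

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
logits = torch.randn(2, 16, 50257)           # (batch, seqlen, vocab)
labels = torch.randint(0, 50257, (2, 16))    # input token ids
# Shift so position t predicts token t+1, then flatten for CrossEntropyLoss.
loss = loss_fn(logits[:, :-1].reshape(-1, logits.size(-1)),
               labels[:, 1:].reshape(-1))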
training/configs/experiment/pile/gpt3xl-flash-8k.yaml (+1 -1)

 # @package _global_
 defaults:
-  - /experiment/pile/gpt2xl-flash.yaml
+  - /experiment/pile/gpt3xl-flash.yaml

 datamodule:
   max_length: 8192
...
training/configs/experiment/pile/gpt3xl-flash-rotary-60B.yaml (+1 -1)

 # @package _global_
 defaults:
-  - /experiment/pile/gpt2xl-flash-rotary.yaml
+  - /experiment/pile/gpt3xl-flash-rotary.yaml

 trainer:
   max_steps: 60000
...
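Like the 8k variant above and the two remaining rotary variants below, this is a one-line fix repointing the parent config from the old gpt2xl-* name to gpt3xl-*. Its 60000-step budget is consistent with the 60B in the file name, if one assumes the inherited global_batch_size of 512 (see gpt3xl-flash.yaml below) and the default 2048-token context; a back-of-envelope check:

# Assumptions: global_batch_size 512 (inherited from gpt3xl-flash.yaml) and a
# 2048-token context (the -8k variants override max_length to 8192 instead).
max_steps = 60_000
global_batch_size = 512
max_length = 2048
tokens = max_steps * global_batch_size * max_length
print(f"{tokens / 1e9:.1f}B tokens")  # ~62.9B, roughly the 60B in the name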
training/configs/experiment/pile/gpt3xl-flash-rotary-8k.yaml (+1 -1)

 # @package _global_
 defaults:
-  - /experiment/pile/gpt2xl-flash-8k.yaml
+  - /experiment/pile/gpt3xl-flash-8k.yaml

 model:
   config:
...
training/configs/experiment/pile/gpt3xl-flash-rotary.yaml (+1 -1)

 # @package _global_
 defaults:
-  - /experiment/pile/gpt2xl-flash.yaml
+  - /experiment/pile/gpt3xl-flash.yaml

 model:
   config:
...
training/configs/experiment/pile/gpt3xl-flash.yaml (+1 -1)

...
@@ -10,7 +10,7 @@ model:
     n_layer: 24
 datamodule:
-  batch_size: ${eval:"1 if ${train.gpu_mem} < 24 else (2 if ${train.gpu_mem} < 40 else (4 if ${train.gpu} < 80 else 8))"}
+  batch_size: ${eval:"1 if ${train.gpu_mem} < 24 else (2 if ${train.gpu_mem} < 40 else (4 if ${train.gpu_mem} < 80 else 8))"}
 train:
   global_batch_size: 512
...
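This one-line change fixes what looks like a typo: ${train.gpu} references a key the configs never define, so the upper tiers of the ladder could not resolve; ${train.gpu_mem} is the key used everywhere else. The per-GPU batch_size chosen by the ladder combines with the fixed global_batch_size of 512 through gradient accumulation; a sketch of the usual bookkeeping (the formula and the 8-GPU node are assumptions, the other numbers come from this config):

# 80GB-class GPUs resolve the ladder above to a per-GPU batch size of 8.
global_batch_size = 512
batch_size = 8      # per GPU, from the ${eval:...} ladder at gpu_mem >= 80
num_gpus = 8        # assumed single 8-GPU node
accumulate_grad_batches = global_batch_size // (batch_size * num_gpus)
print(accumulate_grad_batches)  # 8 accumulation steps per optimizer update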
training/configs/experiment/pile/gpt3xl-hf.yaml (new file, +35 -0)

# @package _global_
defaults:
  - /experiment/pile/gpt3s-hf.yaml
  - override /optimizer: adamw-zero

model:
  config:
    n_embd: 2048
    n_head: 16
    n_layer: 24

datamodule:
  batch_size: 2

train:
  global_batch_size: 512
  optimizer:
    lr: 2.0e-4
  scheduler:
    t_initial: 300000

trainer:
  strategy:
    _target_: src.utils.ddp_zero1.DDPStrategyZero1
    find_unused_parameters: False
    gradient_as_bucket_view: True
  max_steps: 400000
  val_check_interval: ${eval:1000 * ${.accumulate_grad_batches}}

callbacks:
  model_checkpoint:
    every_n_train_steps: 1000
  model_checkpoint_progress:
    every_n_train_steps: 12500
    fault_tolerant: False  # Saving takes too long
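The adamw-zero optimizer group and the custom DDPStrategyZero1 strategy point at ZeRO stage-1 training: each data-parallel rank stores only a shard of AdamW's moment buffers instead of a full replica. A minimal sketch of the same idea with stock PyTorch, using ZeroRedundancyOptimizer (the equivalence to src.utils.ddp_zero1.DDPStrategyZero1 is an assumption; the model is a stand-in):

import torch
from torch.distributed.optim import ZeroRedundancyOptimizer

# Assumes the process group is already initialized, e.g. under torchrun with
# torch.distributed.init_process_group("nccl").
model = torch.nn.Linear(2048, 2048).cuda()   # stand-in for the real model
optimizer = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.AdamW,       # the "adamw" half of adamw-zero
    lr=2.0e-4,                               # matches train.optimizer.lr above
)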