Flax CLM script (#12023) (15b498f3) · Commits · chenpangpang / transformers

Commit 15b498f3 authored Jun 11, 2021 by Suraj Patil, committed by GitHub Jun 11, 2021

Flax CLM script (#12023)

* first draft

* max_seq_length => block_size

* fix arg names

* fix typos

* fix loss calculation

* add max examples, fix train eval steps, metrics

* optimizer mask

* fix perplexity, metric logging

* fix logging

* data_collator => data_loader

* refactor loss_fn

* support single GPU

* pass distributed to write_metric

* fix jitting

* fix single device training

* fix single device metrics

* close inner progress bars once finished

* add overwrite_cache arg

* fix dataset caching issue

* add more logs

* few small fixes

* address Nicholas' suggestions

* fix docstring

* address Patrick's suggestions

* make flake happy

* pass new_dropout_rng to apply_gradients

* reset train metrics after every epoch

* remove distributed logic, small fixes

parent e47765d8
