Commits · b9a768b3ffa80c4c19d024f9f42d5917e7d8109e · chenpangpang / transformers

30 Mar, 2022 1 commit
- [examples] max samples can't be bigger than the len of dataset (#16501) · a73281e3
  Stas Bekman authored Mar 30, 2022
```
* [examples] max samples can't be bigger than the len of dataset

* do tf and flax
```
  a73281e3
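
A minimal sketch of the clamp this commit title describes, assuming the subset is picked by index (as with a `select(range(n))` style call), where asking for more rows than exist would raise an error. The `take_first` helper and its parameters are hypothetical names used for illustration, not the example scripts' own code.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def take_first(dataset: Sequence[T], max_samples: Optional[int]) -> List[T]:
    """Return the leading samples, never more than the dataset holds."""
    if max_samples is None:
        # No limit requested: keep everything.
        return list(dataset)
    # Clamp the requested count to the dataset length before indexing,
    # so an oversized max_samples cannot produce an out-of-range index.
    n = min(len(dataset), max_samples)
    return [dataset[i] for i in range(n)]
```

Without the `min(...)` call, a request such as `take_first(["a", "b", "c"], 10)` would index past the end of the data.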
01 Mar, 2022 1 commit
- Update TF LM examples (#15855) · 3f2e6368
  Joao Gante authored Mar 01, 2022
  
  3f2e6368
12 Jan, 2022 1 commit

use block_size instead of max_seq_length in tf run_clm example (#15036) · 27b819b0

Russell Klopfer authored Jan 12, 2022



* use block_size instead of max_seq_length

* fixup

* remove pad_to_block_size
Co-authored-by: Russell Klopfer <russell@kloper.us>

27b819b0

06 Dec, 2021 1 commit
- [urls to hub] Replace outdated model tags with their now-canonical pipeline types (#14617) · 6cdc3a78
  Julien Chaumond authored Dec 06, 2021
```
* Replace outdated model tags with their now-canonical pipeline types

* spam the CI till it's green
```
  6cdc3a78
22 Nov, 2021 1 commit

Switch from using sum for flattening lists of lists in group_texts (#14472) · 69e16abf

Nicholas Broad authored Nov 22, 2021



* remove sum for list flattening

* change to chain(*)

* make chain object a list

* delete empty lines

per sgugger's suggestions
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>

69e16abf
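
The flattening change named in this commit can be sketched as follows. `sum(lists, [])` builds a new list on every addition, so its cost grows quadratically with the number of sublists, while `itertools.chain` walks each element once. The `group_texts` body below is a simplified stand-in written for illustration under that assumption, not a copy of the example script.

```python
from itertools import chain
from typing import Dict, List


def flatten_with_sum(lists: List[List[int]]) -> List[int]:
    # Old approach: repeated list concatenation, quadratic in total length.
    return sum(lists, [])


def flatten_with_chain(lists: List[List[int]]) -> List[int]:
    # New approach: chain(*) yields every element once; list() materializes it.
    return list(chain(*lists))


def group_texts(examples: Dict[str, List[List[int]]], block_size: int) -> Dict[str, List[List[int]]]:
    """Concatenate every column, then cut it into block_size chunks (simplified sketch)."""
    concatenated = {k: flatten_with_chain(v) for k, v in examples.items()}
    first_key = next(iter(examples))
    # Drop the remainder that does not fill a whole block.
    total_length = (len(concatenated[first_key]) // block_size) * block_size
    return {
        k: [tokens[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, tokens in concatenated.items()
    }
```

Both flattening helpers return the same list; only the running time differs on large batches.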

11 Nov, 2021 1 commit
- Fixing requirements for TF LM models and use correct model mappings (#14372) · 7f20bf0d
  Matt authored Nov 11, 2021
```
* Fixing requirements for TF LM models and use correct model mappings

* make style
```
  7f20bf0d
21 Oct, 2021 1 commit
- Replace "Masked" with "Causal" in TF CLM example (#14014) · f9c16b02
  Christopher Akiki authored Oct 21, 2021
  
  f9c16b02
31 Aug, 2021 1 commit
- Fixed CLM model still using MODEL_FOR_MASKED_LM_MAPPING (#13002) · 702f4a49
  Matt authored Aug 31, 2021
  
  702f4a49
28 Aug, 2021 1 commit

examples: only use keep_linebreaks when reading TXT files (#13320) · 4046e66e

Stefan Schweter authored Aug 28, 2021

* examples: only use keep_linebreaks when reading TXT files for all CLM examples

4046e66e

27 Aug, 2021 1 commit

examples: add keep_linebreaks option to CLM examples (#13150) · 319d840b

Stefan Schweter authored Aug 27, 2021

* examples: add keep_linebreaks option to text dataset loader for all CLM examples

* examples: introduce new keep_linebreaks option as data argument in CLM examples

319d840b

28 Jul, 2021 1 commit

Correct validation_split_percentage argument from int (ex:5) to float (0.05) (#12897) · f3d0866e

Elysium1436 authored Jul 27, 2021



* Fixed train_test_split test_size argument

* `Seq2SeqTrainer` set max_length and num_beams only when non None (#12899)

* set max_length and num_beams only when non None

* fix instance variables

* fix code style

* [FLAX] Minor fixes in CLM example (#12914)

* readme: fix retrieval of vocab size for flax clm example

* examples: fix flax clm example when using training/evaluation files

* Fix module path for symbolic_trace example
Co-authored-by: cchen-dialpad <47165889+cchen-dialpad@users.noreply.github.com>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>

f3d0866e

08 Jul, 2021 1 commit
- Fix group_lengths for short datasets (#12558) · 6f1adc43
  Sylvain Gugger authored Jul 08, 2021
  
  6f1adc43
01 Jul, 2021 1 commit

Validation split added: custom data files @sgugger, @patil-suraj (#12407) · d5b8fe3b

Souvic Chakraborty authored Jul 01, 2021



* Validation split added: custom data files

Validation split added in case of no validation file and loading custom data

* Updated documentation with custom file usage

Updated documentation with custom file usage

* Update README.md

* Update README.md

* Update README.md

* Made some suggested stylistic changes

* Used logger instead of print.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Made similar changes to add validation split

In case of a missing validation file, a validation split will be used now.

* max_train_samples to be used for training only

max_train_samples got misplaced, now corrected so that it is applied on training data only, not whole data.

* styled

* changed ordering

* Improved language of documentation
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Improved language of documentation
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fixed styling issue

* Update run_mlm.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

d5b8fe3b

28 Jun, 2021 1 commit

Tensorflow LM examples (#12358) · 7e22609e

Matt authored Jun 28, 2021

* Tensorflow MLM example

* Add CLM example

* Style fixes, adding missing checkpoint code from the CLM example

* Fix TPU training, avoid massive dataset warnings

* Fix incorrect training length calculation for multi-GPU training

* Refactors and nitpicks from the review

* Style pass

* Adding README

7e22609e