Unverified Commit ca2a55e9 authored by Younes Belkada's avatar Younes Belkada Committed by GitHub
Browse files

BLOOM (#17474)



* adding template

* update model

* model update

* update conf for debug model

* update conversion

* update conversion script

* update conversion script

* fix missing keys check

* add tests to test the tokenizer in the local machine

* Change variable name

* add tests on xnli dataset

* add more description

* add descriptions + clearer code

* clearer code

* adding new tests + skipping few tests because of env problems

* change comment

* add dtype on the configuration

* add test embeddings

* add hardcoded test

* fix dtype issue

* adding torch.float16 to config

* adding more metrics (min, max, mean)

* add sum

* now the test passes with almost equal

* add files for conversion - test passes on cpu  gpu

* add final changes

* cleaning code

* add new args in the docstring

* fix one liner function

* remove macros

* remove forward attention

* clean up init funtion

* add comments on the issue

* rm scale mask softmax

* do make style

* fix dtype in init

* fixing for loop on att probs

* fix style with black

* fix style + doc error

* fix and debug CI errors (docs + style)

* some updates

- change new operations
- finally add scaled softmax
- added new args in the config

* make use cache working

* add changes

- save sharded models
- final changes on the modeling script

* add changes

- comment on alibi
- add TODO on seq length

* test commit

- added a text to test the commit
Co-authored-by: default avatarthomasw21 <24695242+thomasw21@users.noreply.github.com>

* final changes

- attention mask change
- generation works on BS176b
Co-authored-by: default avatarthomasw21 <24695242+thomasw21@users.noreply.github.com>

* changes - model + conversion

* move to correct dir

* put ,

* fex fixes

* fix tokenizer autodoc

* fix minor CI issues

* fix minor CI issues

* fix minor CI issues

* fix style issue

* fix minor import issues

* fix few issues

* remove def main on the test

* add require torch

* replace decorator with 'with'

* fix style

* change to bloom

* add quick fix tokenizer

* fix tokenizer file

* fix tokenizer

- merge tests
- small fixes

* fix import issue

* add bloom to readme

* fix consistency

* Update docs/source/en/model_doc/bloom.mdx
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review

fix comment issues on file headers
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix doc issue

* small fix - modeling test

* some changes

- refactor some code
- taking into account reviews
- more tests should pass
- removed pruning tests

* remove useless division

* more tests should pass

* more tests should pass

* more tests should pass

* let's try this one

-add alibi offset
- remove all permutes to make the grad operations work
- finger crossed

* refactor

- refactor code
- style changes
- add new threshold for test

* major changes

- change BLOOM to Bloom
- add quick doc on bloom.mdx
- move embeddings test on modeling test

* modify readme

* small fixes

* small fix

- better threshold for a test

* remove old test file from fetcher

* fix small typo

* major change

- change BloomLMHead to BloomForCausalLM

* remove onnx config

* major changes

- refactor the code
- remove asserts
- change tol for test

* make style

* small change

* adding a slow test + commenting old ones for now

* make style

* Apply suggestions from code review
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make style

* fix duplicates

* cleaning comments on config

* clean a bit conversion file

* refacor a bit modeling file

* refactor tokenizer file

* fix tokenization test issue

* fix tokenization issue #2

* fix tokenization issue second try

* fix test issue

* make style + add suggestions

* change test fetcher

* try this one

- slow tests should pass
- finger crossed

* possible final changes

* make style

* try fix padding side issue

* fix side

* fix padding issue

* fix ko-readme

* fix config auto

* cleaning modeling file

* keep bloom in caps in ko

* update config docs

* remove pretraining_pp

* remove model parallel

* update config

- add correct config files

* fix duplicates

* fix fetcher

* fix refactor issue

- remove divide function

* try to remove alibi

* small fixes

- fix alibi
- remove seq length
- refactor a bit the code

* put correct values

- fix bos and eos token ids

* fix attention mask loop
Co-authored-by: default avatarthomasw21 <24695242+thomasw21@users.noreply.github.com>

* small fixes:

- remove skip bias add

* small fixes

- fix typo in readme
- fix typos in config

* small changes

- remove a test
- add reconstruction test
- change config

* small changes

- change Scaled Softmax to BloomScaledSoftmax

* small fixes

- fix alibi dtype

* major changes

- removing explicit dtype when loading modules
- fixing test args (torch_dtype=auto)
- add dosctring

* fix readmes

* major changes

- now bloom supports alibi shifting
- refactor a bit the code
- better test tolerance now

* refactor a bit

* refactor a bit

* put correct name on test

* change docstring

* small changes

- fix docstring modeling
- fix test tolerance

* fix small nit

- take dtype from tensors in the conversion script

* minor fix

- fix mdx issue

* minor fix

- change config docstring

* forward contrib credits from PR14084

* Apply suggestions from code review
Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>

* apply modifications
Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>

* resolve softmax upcast

* Apply suggestions from code review
Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: default avatarNiklas Muennighoff <n.muennighoff@gmail.com>

* final changes modeling
Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>

* Merge commit 'd156898f

'

* merge commit

* Apply suggestions from code review
Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>

* apply suggestions

Apply suggestions from Stas comments
Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>

* Fix gradient checkpointing
Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>

* add slow but exact

* add accelerate compatibility
Co-authored-by: default avatarNicolas Patry <Narsil@users.noreply.github.com>

* forward contrib credits
Co-authored-by: default avatarthomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: default avatarsgugger <sgugger@users.noreply.github.com>
Co-authored-by: default avatarpatrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: default avatarNiklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: default avatarLysandreJik <LysandreJik@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>

* fix torch device on tests

* make style

* Apply suggestions from code review
Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>

* fix nits

Co-authored-by: patrickvonplaten<patrickvonplaten@users.noreply.github.com>

* remove final nits

* fix doc

- add more details on the doc
- add links to checkpoints

* Update src/transformers/__init__.py
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply suggestions
Co-authored-by: default avatarsgugger <sgugger@users.noreply.github.com>

* put test torchscript to false

* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: default avatarjustheuristic <justheuristic@gmail.com>

* fix alibi

- create alibi only once

* add small doc

* make quality

* replace torch.nn

* remove token type emb

* fix fused op + output bias

* add fused op

- now can control fused operation from config

* remove fused op

* make quality

* small changes

- remove unsed args on config
- removed bias gelu file
- make the model torchscriptable
- add torchscript slow tests

* Update src/transformers/models/bloom/modeling_bloom.py

* fix slow

* make style

* add accelerate support

* add bloom to deepspeed tests

* minor changes

* Apply suggestions from code review
Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>

* minor change

* slow tests pass

* Apply suggestions from code review
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/model_doc/bloom.mdx
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* minor changes:

- change docstring
- add link to paper
Co-authored-by: default avatarThomwolf <thomwolf@gmail.com>
Co-authored-by: default avatarThomas Wolf <thomas@huggingface.co>
Co-authored-by: default avatarthomasw21 <24695242+thomasw21@users.noreply.github.com>
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: default avatarsIncerass <sheng.s@berkeley.edu>
Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>
Co-authored-by: default avatarNiklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: default avatarNicolas Patry <Narsil@users.noreply.github.com>
Co-authored-by: default avatarthomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: default avatarsgugger <sgugger@users.noreply.github.com>
Co-authored-by: default avatarpatrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: default avatarLysandreJik <LysandreJik@users.noreply.github.com>
Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: default avatarjustheuristic <justheuristic@gmail.com>
Co-authored-by: default avatarStas Bekman <stas@stason.org>
parent dfc76b25
...@@ -240,6 +240,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h ...@@ -240,6 +240,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. 1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. 1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. 1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLOOM](https://huggingface.co/docs/transformers/main/model_doc/bloom)** (from BigScience workshop) released by the [BigSicence Workshop](https://bigscience.huggingface.co/).
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry. 1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel. 1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot. 1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
......
...@@ -221,6 +221,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는 ...@@ -221,6 +221,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. 1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. 1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. 1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLOOM](https://huggingface.co/docs/transformers/main/model_doc/bloom)** (from BigScience workshop) released by the [BigSicence Workshop](https://bigscience.huggingface.co/).
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry. 1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel. 1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot. 1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
......
...@@ -245,6 +245,7 @@ conda install -c huggingface transformers ...@@ -245,6 +245,7 @@ conda install -c huggingface transformers
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (来自 Google Research) 伴随论文 [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) 由 Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed 发布。 1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (来自 Google Research) 伴随论文 [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) 由 Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed 发布。
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (来自 Facebook) 伴随论文 [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) 由 Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston 发布。 1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (来自 Facebook) 伴随论文 [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) 由 Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston 发布。
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (来自 Facebook) 伴随论文 [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) 由 Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston 发布。 1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (来自 Facebook) 伴随论文 [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) 由 Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston 发布。
1. **[BLOOM](https://huggingface.co/docs/transformers/main/model_doc/bloom)** (from BigScience workshop) released by the [BigSicence Workshop](https://bigscience.huggingface.co/).
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (来自 Alexa) 伴随论文 [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) 由 Adrian de Wynter and Daniel J. Perry 发布。 1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (来自 Alexa) 伴随论文 [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) 由 Adrian de Wynter and Daniel J. Perry 发布。
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (来自 Google Research) 伴随论文 [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) 由 Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel 发布。 1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (来自 Google Research) 伴随论文 [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) 由 Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel 发布。
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (来自 Inria/Facebook/Sorbonne) 伴随论文 [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) 由 Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot 发布。 1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (来自 Inria/Facebook/Sorbonne) 伴随论文 [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) 由 Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot 发布。
......
...@@ -257,6 +257,7 @@ conda install -c huggingface transformers ...@@ -257,6 +257,7 @@ conda install -c huggingface transformers
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. 1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. 1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. 1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLOOM](https://huggingface.co/docs/transformers/main/model_doc/bloom)** (from BigScience workshop) released by the [BigSicence Workshop](https://bigscience.huggingface.co/).
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry. 1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel. 1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot. 1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
......
...@@ -176,6 +176,8 @@ ...@@ -176,6 +176,8 @@
title: Blenderbot title: Blenderbot
- local: model_doc/blenderbot-small - local: model_doc/blenderbot-small
title: Blenderbot Small title: Blenderbot Small
- local: model_doc/bloom
title: BLOOM
- local: model_doc/bort - local: model_doc/bort
title: BORT title: BORT
- local: model_doc/byt5 - local: model_doc/byt5
......
...@@ -63,6 +63,7 @@ The library currently contains JAX, PyTorch and TensorFlow implementations, pret ...@@ -63,6 +63,7 @@ The library currently contains JAX, PyTorch and TensorFlow implementations, pret
1. **[BigBird-RoBERTa](model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. 1. **[BigBird-RoBERTa](model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. 1. **[Blenderbot](model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. 1. **[BlenderbotSmall](model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLOOM](model_doc/bloom)** (from BigScience workshop) released by the [BigSicence Workshop](https://bigscience.huggingface.co/).
1. **[BORT](model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry. 1. **[BORT](model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel. 1. **[ByT5](model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot. 1. **[CamemBERT](model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
...@@ -193,6 +194,7 @@ Flax), PyTorch, and/or TensorFlow. ...@@ -193,6 +194,7 @@ Flax), PyTorch, and/or TensorFlow.
| BigBird-Pegasus | ❌ | ❌ | ✅ | ❌ | ❌ | | BigBird-Pegasus | ❌ | ❌ | ✅ | ❌ | ❌ |
| Blenderbot | ✅ | ✅ | ✅ | ✅ | ✅ | | Blenderbot | ✅ | ✅ | ✅ | ✅ | ✅ |
| BlenderbotSmall | ✅ | ✅ | ✅ | ✅ | ✅ | | BlenderbotSmall | ✅ | ✅ | ✅ | ✅ | ✅ |
| BLOOM | ❌ | ✅ | ✅ | ❌ | ❌ |
| CamemBERT | ✅ | ✅ | ✅ | ✅ | ❌ | | CamemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| CANINE | ✅ | ❌ | ✅ | ❌ | ❌ | | CANINE | ✅ | ❌ | ✅ | ❌ | ❌ |
| CLIP | ✅ | ✅ | ✅ | ✅ | ✅ | | CLIP | ✅ | ✅ | ✅ | ✅ | ✅ |
......
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# BLOOM
## Overview
The BLOOM model has been proposed with its various versions through the [BigScience Workshop](https://bigscience.huggingface.co/). BigScience is inspired by other open science initiatives where researchers have pooled their time and resources to collectively achieve a higher impact.
The architecture of BLOOM is essentially similar to GPT3 (auto-regressive model for next token prediction), but has been trained on different 46 languages including code.
Several smaller versions of the models have been trained on the same dataset. BLOOM is available in the following versions:
- [bloom-350m](https://huggingface.co/bigscience/bloom-350m)
- [bloom-760m](https://huggingface.co/bigscience/bloom-760m)
- [bloom-1b3](https://huggingface.co/bigscience/bloom-1b3)
- [bloom-2b5](https://huggingface.co/bigscience/bloom-2b5)
- [bloom-6b3](https://huggingface.co/bigscience/bloom-6b3)
- [bloom](https://huggingface.co/bigscience/bloom) (175B parameters)
## BloomConfig
[[autodoc]] BloomConfig
- all
## BloomModel
[[autodoc]] BloomModel
- forward
## BloomTokenizerFast
[[autodoc]] BloomTokenizerFast
- all
## BloomForCausalLM
[[autodoc]] BloomForCausalLM
- forward
\ No newline at end of file
...@@ -156,6 +156,7 @@ _import_structure = { ...@@ -156,6 +156,7 @@ _import_structure = {
"BlenderbotSmallConfig", "BlenderbotSmallConfig",
"BlenderbotSmallTokenizer", "BlenderbotSmallTokenizer",
], ],
"models.bloom": ["BLOOM_PRETRAINED_CONFIG_ARCHIVE_MAP", "BloomConfig"],
"models.bort": [], "models.bort": [],
"models.byt5": ["ByT5Tokenizer"], "models.byt5": ["ByT5Tokenizer"],
"models.camembert": ["CAMEMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "CamembertConfig"], "models.camembert": ["CAMEMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "CamembertConfig"],
...@@ -497,6 +498,7 @@ else: ...@@ -497,6 +498,7 @@ else:
_import_structure["models.big_bird"].append("BigBirdTokenizerFast") _import_structure["models.big_bird"].append("BigBirdTokenizerFast")
_import_structure["models.blenderbot"].append("BlenderbotTokenizerFast") _import_structure["models.blenderbot"].append("BlenderbotTokenizerFast")
_import_structure["models.blenderbot_small"].append("BlenderbotSmallTokenizerFast") _import_structure["models.blenderbot_small"].append("BlenderbotSmallTokenizerFast")
_import_structure["models.bloom"].append("BloomTokenizerFast")
_import_structure["models.camembert"].append("CamembertTokenizerFast") _import_structure["models.camembert"].append("CamembertTokenizerFast")
_import_structure["models.clip"].append("CLIPTokenizerFast") _import_structure["models.clip"].append("CLIPTokenizerFast")
_import_structure["models.convbert"].append("ConvBertTokenizerFast") _import_structure["models.convbert"].append("ConvBertTokenizerFast")
...@@ -858,6 +860,14 @@ else: ...@@ -858,6 +860,14 @@ else:
"BigBirdPegasusPreTrainedModel", "BigBirdPegasusPreTrainedModel",
] ]
) )
_import_structure["models.bloom"].extend(
[
"BLOOM_PRETRAINED_MODEL_ARCHIVE_LIST",
"BloomForCausalLM",
"BloomModel",
"BloomPreTrainedModel",
]
)
_import_structure["models.blenderbot"].extend( _import_structure["models.blenderbot"].extend(
[ [
"BLENDERBOT_PRETRAINED_MODEL_ARCHIVE_LIST", "BLENDERBOT_PRETRAINED_MODEL_ARCHIVE_LIST",
...@@ -2755,6 +2765,7 @@ if TYPE_CHECKING: ...@@ -2755,6 +2765,7 @@ if TYPE_CHECKING:
BlenderbotSmallConfig, BlenderbotSmallConfig,
BlenderbotSmallTokenizer, BlenderbotSmallTokenizer,
) )
from .models.bloom import BLOOM_PRETRAINED_CONFIG_ARCHIVE_MAP, BloomConfig
from .models.byt5 import ByT5Tokenizer from .models.byt5 import ByT5Tokenizer
from .models.camembert import CAMEMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, CamembertConfig from .models.camembert import CAMEMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, CamembertConfig
from .models.canine import CANINE_PRETRAINED_CONFIG_ARCHIVE_MAP, CanineConfig, CanineTokenizer from .models.canine import CANINE_PRETRAINED_CONFIG_ARCHIVE_MAP, CanineConfig, CanineTokenizer
...@@ -3064,6 +3075,7 @@ if TYPE_CHECKING: ...@@ -3064,6 +3075,7 @@ if TYPE_CHECKING:
from .models.big_bird import BigBirdTokenizerFast from .models.big_bird import BigBirdTokenizerFast
from .models.blenderbot import BlenderbotTokenizerFast from .models.blenderbot import BlenderbotTokenizerFast
from .models.blenderbot_small import BlenderbotSmallTokenizerFast from .models.blenderbot_small import BlenderbotSmallTokenizerFast
from .models.bloom import BloomTokenizerFast
from .models.camembert import CamembertTokenizerFast from .models.camembert import CamembertTokenizerFast
from .models.clip import CLIPTokenizerFast from .models.clip import CLIPTokenizerFast
from .models.convbert import ConvBertTokenizerFast from .models.convbert import ConvBertTokenizerFast
...@@ -3382,6 +3394,12 @@ if TYPE_CHECKING: ...@@ -3382,6 +3394,12 @@ if TYPE_CHECKING:
BlenderbotSmallModel, BlenderbotSmallModel,
BlenderbotSmallPreTrainedModel, BlenderbotSmallPreTrainedModel,
) )
from .models.bloom import (
BLOOM_PRETRAINED_MODEL_ARCHIVE_LIST,
BloomForCausalLM,
BloomModel,
BloomPreTrainedModel,
)
from .models.camembert import ( from .models.camembert import (
CAMEMBERT_PRETRAINED_MODEL_ARCHIVE_LIST, CAMEMBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
CamembertForCausalLM, CamembertForCausalLM,
......
...@@ -31,6 +31,7 @@ from . import ( ...@@ -31,6 +31,7 @@ from . import (
bigbird_pegasus, bigbird_pegasus,
blenderbot, blenderbot,
blenderbot_small, blenderbot_small,
bloom,
bort, bort,
byt5, byt5,
camembert, camembert,
......
...@@ -38,6 +38,7 @@ CONFIG_MAPPING_NAMES = OrderedDict( ...@@ -38,6 +38,7 @@ CONFIG_MAPPING_NAMES = OrderedDict(
("bigbird_pegasus", "BigBirdPegasusConfig"), ("bigbird_pegasus", "BigBirdPegasusConfig"),
("blenderbot", "BlenderbotConfig"), ("blenderbot", "BlenderbotConfig"),
("blenderbot-small", "BlenderbotSmallConfig"), ("blenderbot-small", "BlenderbotSmallConfig"),
("bloom", "BloomConfig"),
("camembert", "CamembertConfig"), ("camembert", "CamembertConfig"),
("canine", "CanineConfig"), ("canine", "CanineConfig"),
("clip", "CLIPConfig"), ("clip", "CLIPConfig"),
...@@ -51,7 +52,6 @@ CONFIG_MAPPING_NAMES = OrderedDict( ...@@ -51,7 +52,6 @@ CONFIG_MAPPING_NAMES = OrderedDict(
("deberta", "DebertaConfig"), ("deberta", "DebertaConfig"),
("deberta-v2", "DebertaV2Config"), ("deberta-v2", "DebertaV2Config"),
("decision_transformer", "DecisionTransformerConfig"), ("decision_transformer", "DecisionTransformerConfig"),
("decision_transformer", "DecisionTransformerConfig"),
("deit", "DeiTConfig"), ("deit", "DeiTConfig"),
("detr", "DetrConfig"), ("detr", "DetrConfig"),
("distilbert", "DistilBertConfig"), ("distilbert", "DistilBertConfig"),
...@@ -155,6 +155,7 @@ CONFIG_ARCHIVE_MAP_MAPPING_NAMES = OrderedDict( ...@@ -155,6 +155,7 @@ CONFIG_ARCHIVE_MAP_MAPPING_NAMES = OrderedDict(
("bigbird_pegasus", "BIGBIRD_PEGASUS_PRETRAINED_CONFIG_ARCHIVE_MAP"), ("bigbird_pegasus", "BIGBIRD_PEGASUS_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("blenderbot", "BLENDERBOT_PRETRAINED_CONFIG_ARCHIVE_MAP"), ("blenderbot", "BLENDERBOT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("blenderbot-small", "BLENDERBOT_SMALL_PRETRAINED_CONFIG_ARCHIVE_MAP"), ("blenderbot-small", "BLENDERBOT_SMALL_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("bloom", "BLOOM_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("camembert", "CAMEMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP"), ("camembert", "CAMEMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("canine", "CANINE_PRETRAINED_CONFIG_ARCHIVE_MAP"), ("canine", "CANINE_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("clip", "CLIP_PRETRAINED_CONFIG_ARCHIVE_MAP"), ("clip", "CLIP_PRETRAINED_CONFIG_ARCHIVE_MAP"),
...@@ -262,6 +263,7 @@ MODEL_NAMES_MAPPING = OrderedDict( ...@@ -262,6 +263,7 @@ MODEL_NAMES_MAPPING = OrderedDict(
("bigbird_pegasus", "BigBird-Pegasus"), ("bigbird_pegasus", "BigBird-Pegasus"),
("blenderbot", "Blenderbot"), ("blenderbot", "Blenderbot"),
("blenderbot-small", "BlenderbotSmall"), ("blenderbot-small", "BlenderbotSmall"),
("bloom", "BLOOM"),
("bort", "BORT"), ("bort", "BORT"),
("byt5", "ByT5"), ("byt5", "ByT5"),
("camembert", "CamemBERT"), ("camembert", "CamemBERT"),
...@@ -362,7 +364,6 @@ MODEL_NAMES_MAPPING = OrderedDict( ...@@ -362,7 +364,6 @@ MODEL_NAMES_MAPPING = OrderedDict(
("van", "VAN"), ("van", "VAN"),
("vilt", "ViLT"), ("vilt", "ViLT"),
("vision-encoder-decoder", "Vision Encoder decoder"), ("vision-encoder-decoder", "Vision Encoder decoder"),
("vision-encoder-decoder", "Vision Encoder decoder"),
("vision-text-dual-encoder", "VisionTextDualEncoder"), ("vision-text-dual-encoder", "VisionTextDualEncoder"),
("visual_bert", "VisualBERT"), ("visual_bert", "VisualBERT"),
("vit", "ViT"), ("vit", "ViT"),
......
...@@ -37,6 +37,7 @@ MODEL_MAPPING_NAMES = OrderedDict( ...@@ -37,6 +37,7 @@ MODEL_MAPPING_NAMES = OrderedDict(
("bigbird_pegasus", "BigBirdPegasusModel"), ("bigbird_pegasus", "BigBirdPegasusModel"),
("blenderbot", "BlenderbotModel"), ("blenderbot", "BlenderbotModel"),
("blenderbot-small", "BlenderbotSmallModel"), ("blenderbot-small", "BlenderbotSmallModel"),
("bloom", "BloomModel"),
("camembert", "CamembertModel"), ("camembert", "CamembertModel"),
("canine", "CanineModel"), ("canine", "CanineModel"),
("clip", "CLIPModel"), ("clip", "CLIPModel"),
...@@ -50,7 +51,6 @@ MODEL_MAPPING_NAMES = OrderedDict( ...@@ -50,7 +51,6 @@ MODEL_MAPPING_NAMES = OrderedDict(
("deberta", "DebertaModel"), ("deberta", "DebertaModel"),
("deberta-v2", "DebertaV2Model"), ("deberta-v2", "DebertaV2Model"),
("decision_transformer", "DecisionTransformerModel"), ("decision_transformer", "DecisionTransformerModel"),
("decision_transformer", "DecisionTransformerModel"),
("decision_transformer_gpt2", "DecisionTransformerGPT2Model"), ("decision_transformer_gpt2", "DecisionTransformerGPT2Model"),
("deit", "DeiTModel"), ("deit", "DeiTModel"),
("detr", "DetrModel"), ("detr", "DetrModel"),
...@@ -144,6 +144,7 @@ MODEL_FOR_PRETRAINING_MAPPING_NAMES = OrderedDict( ...@@ -144,6 +144,7 @@ MODEL_FOR_PRETRAINING_MAPPING_NAMES = OrderedDict(
("bart", "BartForConditionalGeneration"), ("bart", "BartForConditionalGeneration"),
("bert", "BertForPreTraining"), ("bert", "BertForPreTraining"),
("big_bird", "BigBirdForPreTraining"), ("big_bird", "BigBirdForPreTraining"),
("bloom", "BloomForCausalLM"),
("camembert", "CamembertForMaskedLM"), ("camembert", "CamembertForMaskedLM"),
("ctrl", "CTRLLMHeadModel"), ("ctrl", "CTRLLMHeadModel"),
("data2vec-text", "Data2VecTextForMaskedLM"), ("data2vec-text", "Data2VecTextForMaskedLM"),
...@@ -194,6 +195,7 @@ MODEL_WITH_LM_HEAD_MAPPING_NAMES = OrderedDict( ...@@ -194,6 +195,7 @@ MODEL_WITH_LM_HEAD_MAPPING_NAMES = OrderedDict(
("big_bird", "BigBirdForMaskedLM"), ("big_bird", "BigBirdForMaskedLM"),
("bigbird_pegasus", "BigBirdPegasusForConditionalGeneration"), ("bigbird_pegasus", "BigBirdPegasusForConditionalGeneration"),
("blenderbot-small", "BlenderbotSmallForConditionalGeneration"), ("blenderbot-small", "BlenderbotSmallForConditionalGeneration"),
("bloom", "BloomForCausalLM"),
("camembert", "CamembertForMaskedLM"), ("camembert", "CamembertForMaskedLM"),
("convbert", "ConvBertForMaskedLM"), ("convbert", "ConvBertForMaskedLM"),
("ctrl", "CTRLLMHeadModel"), ("ctrl", "CTRLLMHeadModel"),
...@@ -252,6 +254,7 @@ MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = OrderedDict( ...@@ -252,6 +254,7 @@ MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = OrderedDict(
("bigbird_pegasus", "BigBirdPegasusForCausalLM"), ("bigbird_pegasus", "BigBirdPegasusForCausalLM"),
("blenderbot", "BlenderbotForCausalLM"), ("blenderbot", "BlenderbotForCausalLM"),
("blenderbot-small", "BlenderbotSmallForCausalLM"), ("blenderbot-small", "BlenderbotSmallForCausalLM"),
("bloom", "BloomForCausalLM"),
("camembert", "CamembertForCausalLM"), ("camembert", "CamembertForCausalLM"),
("ctrl", "CTRLLMHeadModel"), ("ctrl", "CTRLLMHeadModel"),
("data2vec-text", "Data2VecTextForCausalLM"), ("data2vec-text", "Data2VecTextForCausalLM"),
......
...@@ -76,6 +76,7 @@ else: ...@@ -76,6 +76,7 @@ else:
("bigbird_pegasus", ("PegasusTokenizer", "PegasusTokenizerFast" if is_tokenizers_available() else None)), ("bigbird_pegasus", ("PegasusTokenizer", "PegasusTokenizerFast" if is_tokenizers_available() else None)),
("blenderbot", ("BlenderbotTokenizer", "BlenderbotTokenizerFast")), ("blenderbot", ("BlenderbotTokenizer", "BlenderbotTokenizerFast")),
("blenderbot-small", ("BlenderbotSmallTokenizer", None)), ("blenderbot-small", ("BlenderbotSmallTokenizer", None)),
("bloom", (None, "BloomTokenizerFast" if is_tokenizers_available() else None)),
("byt5", ("ByT5Tokenizer", None)), ("byt5", ("ByT5Tokenizer", None)),
( (
"camembert", "camembert",
......
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.
# Copyright 2022 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING
from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_tokenizers_available, is_torch_available
_import_structure = {
"configuration_bloom": [
"BLOOM_PRETRAINED_CONFIG_ARCHIVE_MAP",
"BloomConfig",
],
}
try:
if not is_tokenizers_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
_import_structure["tokenization_bloom_fast"] = ["BloomTokenizerFast"]
try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
_import_structure["modeling_bloom"] = [
"BLOOM_PRETRAINED_MODEL_ARCHIVE_LIST",
"BloomForCausalLM",
"BloomModel",
"BloomPreTrainedModel",
]
if TYPE_CHECKING:
from .configuration_bloom import BLOOM_PRETRAINED_CONFIG_ARCHIVE_MAP, BloomConfig
try:
if not is_tokenizers_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
from .tokenization_bloom_fast import BloomTokenizerFast
try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
from .modeling_bloom import (
BLOOM_PRETRAINED_MODEL_ARCHIVE_LIST,
BloomForCausalLM,
BloomModel,
BloomPreTrainedModel,
)
else:
import sys
sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
# coding=utf-8
# Copyright 2022 the Big Science Workshop and HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Bloom configuration"""
from ...configuration_utils import PretrainedConfig
from ...utils import logging
logger = logging.get_logger(__name__)
BLOOM_PRETRAINED_CONFIG_ARCHIVE_MAP = {
"bigscience/bloom": "https://huggingface.co/bigscience/bloom/resolve/main/config.json",
"bigscience/bloom-350m": "https://huggingface.co/bigscience/bloom-350m/blob/main/config.json",
"bigscience/bloom-760m": "https://huggingface.co/bigscience/bloom-760m/blob/main/config.json",
"bigscience/bloom-1b3": "https://huggingface.co/bigscience/bloom-1b3/blob/main/config.json",
"bigscience/bloom-2b5": "https://huggingface.co/bigscience/bloom-2b5/blob/main/config.json",
"bigscience/bloom-6b3": "https://huggingface.co/bigscience/bloom-6b3/blob/main/config.json",
}
class BloomConfig(PretrainedConfig):
"""
This is the configuration class to store the configuration of a [`BloomModel`]. It is used to instantiate a Bloom
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
defaults will yield a similar configuration to the Bloom architecture
[bigscience/bloom](https://huggingface.co/bigscience/bloom).
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*, defaults to 50257):
Vocabulary size of the Bloom model. Defines the number of different tokens that can be represented by the
`inputs_ids` passed when calling [`BloomModel`].
hidden_size (`int`, *optional*, defaults to 768):
Dimensionality of the embeddings and hidden states.
n_layer (`int`, *optional*, defaults to 12):
Number of hidden layers in the Transformer encoder.
n_head (`int`, *optional*, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder.
attn_pdrop (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention.
layer_norm_epsilon (`float`, *optional*, defaults to 1e-5):
The epsilon to use in the layer normalization layers.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
apply_residual_connection_post_layernorm (`bool`, *optional*, defaults to `False`):
If enabled, use the layer norm of the hidden states as the residual in the transformer blocks
skip_bias_add (`bool`, *optional*, defaults to `True`):
If set to `True`, it will skip bias add for each linear layer in the transformer blocks
skip_bias_add_qkv (`bool`, *optional*, defaults to `False`):
If set to `True`, it will skip bias add for the first linear layer in the transformer blocks
attention_softmax_in_fp32 (`bool`, *optional*, defaults to `True`):
If set to `True` and the `dtype` is set to `float16` it will scale the input of the Softmax function to
`fp32`
hidden_dropout (`float`, *optional*, defaults to 0.1):
Dropout rate of the dropout function on the bias dropout.
attention_dropout (`float`, *optional*, defaults to 0.1):
Dropout rate applied to the attention probs
use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models).
dtype (`str`, *optional*, defaults to `"bfloat16"`):
Precision that has been used for the model's training in Megatron. Please load the model in the correct
precision by doing `model = BloomModel.from_pretrained(model_name, torch_dtype="auto")`.`
pretraining_tp (`int`, *optional*, defaults to `1`):
Experimental feature. Tensor parallelism rank used during pretraining with Megatron. Please refer to [this
document](https://huggingface.co/docs/transformers/parallelism) to understand more about it. This value is
necessary to ensure exact reproducibility of the pretraining results. Please refer to [this
issue](https://github.com/pytorch/pytorch/issues/76232). Note also that this is enabled only when
`slow_but_exact=True`.
slow_but_exact (`bool`, *optional*, defaults to `False`):
Experimental feature. Whether to use slow but exact implementation of the attention mechanism. While
merging the TP rank tensors, due to slicing operations the results may be slightly different between the
model trained on Megatron and our model. Please refer to [this
issue](https://github.com/pytorch/pytorch/issues/76232). A solution to obtain more accurate results is to
enable this feature. Enabling this will hurt the computational time of the inference. Will be probably
resolved in the future once the main model has been fine-tuned with TP_rank=1.
Example:
```python
>>> from transformers import BloomModel, BloomConfig
>>> # Initializing a Bloom configuration
>>> configuration = BloomConfig()
>>> # Initializing a model from the configuration
>>> model = BloomModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "bloom"
keys_to_ignore_at_inference = ["past_key_values"]
attribute_map = {
"num_hidden_layers": "n_layer",
"n_head": "num_attention_heads",
"hidden_size": "n_embed",
"dtype": "torch_dtype",
}
def __init__(
self,
vocab_size=250880,
hidden_size=64,
n_layer=2,
n_head=8,
masked_softmax_fusion=True,
layer_norm_epsilon=1e-5,
initializer_range=0.02,
use_cache=False,
bos_token_id=1,
eos_token_id=2,
apply_residual_connection_post_layernorm=False,
hidden_dropout=0.0,
attention_dropout=0.0,
attention_softmax_in_fp32=True,
pretraining_tp=1, # TP rank used when training with megatron
dtype="bfloat16",
slow_but_exact=False,
**kwargs,
):
self.vocab_size = vocab_size
self.hidden_size = hidden_size
self.n_layer = n_layer
self.n_head = n_head
self.masked_softmax_fusion = masked_softmax_fusion
self.layer_norm_epsilon = layer_norm_epsilon
self.initializer_range = initializer_range
self.use_cache = use_cache
self.pretraining_tp = pretraining_tp
self.apply_residual_connection_post_layernorm = apply_residual_connection_post_layernorm
self.hidden_dropout = hidden_dropout
self.attention_dropout = attention_dropout
self.attention_softmax_in_fp32 = attention_softmax_in_fp32
self.bos_token_id = bos_token_id
self.eos_token_id = eos_token_id
self.dtype = dtype
self.slow_but_exact = slow_but_exact
super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)
# coding=utf-8
# Copyright 2022 The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Convert BigScience BLOOM checkpoint."""
import argparse
import json
import os
import re
import torch
from transformers import BloomConfig, BloomModel
from transformers.file_utils import CONFIG_NAME, WEIGHTS_NAME
from transformers.utils import logging
logging.set_verbosity_info()
WEIGHTS_TO_AVERAGE_ENDSWITH = [
"word_embeddings_layernorm.weight",
"word_embeddings_layernorm.bias",
"input_layernorm.weight",
"input_layernorm.bias",
"post_attention_layernorm.weight",
"post_attention_layernorm.bias",
"self_attention.dense.bias",
"mlp.dense_4h_to_h.bias",
"ln_f.weight",
"ln_f.bias",
]
WEIGHTS_WITH_ROW_PARALLELISM_CONTAIN = [
"mlp.dense_4h_to_h.weight",
"self_attention.dense.weight",
]
def layer_name_mapping(key, file):
"""Convert Megatron-DeepSpeed TP/PP weights mapping in transformers PP only"""
# Handle first and last layers
layer_rename_map = {
"word_embeddings.weight": "word_embeddings.weight",
"word_embeddings.norm.weight": "word_embeddings_layernorm.weight",
"word_embeddings.norm.bias": "word_embeddings_layernorm.bias",
"weight": "ln_f.weight",
"bias": "ln_f.bias",
}
if key in layer_rename_map:
return layer_rename_map[key]
# Handle transformer blocks
layer_number = int(re.match(r".*layer_(\d*).*", file)[1])
layer_number -= 3
return f"h.{layer_number}." + key
def get_dtype_size(dtype):
if dtype == torch.bool:
return 1 / 8
bit_search = re.search("[^\d](\d+)$", str(dtype))
if bit_search is None:
raise ValueError(f"`dtype` is not a valid dtype: {dtype}.")
bit_size = int(bit_search.groups()[0])
return bit_size // 8
def convert_bloom_checkpoint_to_pytorch(
bloom_checkpoint_path, bloom_config_file, pytorch_dump_folder_path, shard_model, pretraining_tp
):
# Construct model
if bloom_config_file == "":
config = BloomConfig()
else:
config = BloomConfig.from_json_file(bloom_config_file)
if shard_model:
file_names = os.listdir(bloom_checkpoint_path)
file_names = list(sorted(filter(lambda s: s.startswith("layer") and "model_00" in s, file_names)))
index_dict = {"weight_map": {}, "metadata": {}}
total_size = 0
missing_keys = None
config = BloomConfig()
for j, file in enumerate(file_names):
print("Processing file: {}".format(file))
tensors = None
for i in range(pretraining_tp):
# load all TP files
f_name = file.replace("model_00", f"model_0{i}")
temp = torch.load(os.path.join(bloom_checkpoint_path, f_name), map_location="cpu")
# Rename keys in the transformers names
keys = list(temp.keys())
for key in keys:
temp[layer_name_mapping(key, file)] = temp.pop(key)
if tensors is None:
tensors = temp
else:
for key in tensors.keys():
if any(key.endswith(end) for end in WEIGHTS_TO_AVERAGE_ENDSWITH):
# We average (sum and then divide) some weights accross TP ranks (see https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/olruwase/sync_layer_norms/megatron/training.py#L425)
tensors[key] += temp[key]
else:
# Some weights are RowParallelLinear in Megatron-Deepspeed, others are ColumnParallel
cat_dim = 1 if any(text in key for text in WEIGHTS_WITH_ROW_PARALLELISM_CONTAIN) else 0
# We concatenate these weights accross TP ranks
tensors[key] = torch.cat([tensors[key], temp[key]], dim=cat_dim)
# Divide by the number of TP the weights we want to average
for key in tensors.keys():
if any(key.endswith(end) for end in WEIGHTS_TO_AVERAGE_ENDSWITH):
tensors[key] = tensors[key] / pretraining_tp
torch.save(
tensors,
os.path.join(
pytorch_dump_folder_path,
"pytorch_model_{}-of-{}.bin".format(str(j + 1).zfill(5), str(len(file_names)).zfill(5)),
),
)
for key in tensors.keys():
value = tensors[key]
total_size += value.numel() * get_dtype_size(value.dtype)
if key not in index_dict["weight_map"]:
index_dict["weight_map"][key] = "pytorch_model_{}-of-{}.bin".format(
str(j + 1).zfill(5), str(len(file_names)).zfill(5)
)
config = BloomConfig()
pytorch_config_dump_path = pytorch_dump_folder_path + "/" + CONFIG_NAME
index_dict["metadata"]["total_size"] = total_size
with open(pytorch_config_dump_path, "w", encoding="utf-8") as f:
f.write(config.to_json_string())
with open(os.path.join(pytorch_dump_folder_path, WEIGHTS_NAME + ".index.json"), "w", encoding="utf-8") as f:
json_config = json.dumps(index_dict, indent=2, sort_keys=True) + "\n"
f.write(json_config)
else:
model = BloomModel(config)
file_names = os.listdir(bloom_checkpoint_path)
file_names = list(sorted(filter(lambda s: s.startswith("layer") and "model_00" in s, file_names)))
missing_keys = None
for i, file in enumerate(file_names):
tensors = None
for i in range(pretraining_tp):
# load all TP files
f_name = file.replace("model_00", f"model_0{i}")
temp = torch.load(os.path.join(bloom_checkpoint_path, f_name), map_location="cpu")
# Rename keys in the transformers names
keys = list(temp.keys())
for key in keys:
temp[layer_name_mapping(key, file)] = temp.pop(key)
if tensors is None:
tensors = temp
else:
for key in tensors.keys():
# We average (sum and then divide) some weights accross TP ranks (see https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/olruwase/sync_layer_norms/megatron/training.py#L425)
if any(key.endswith(end) for end in WEIGHTS_TO_AVERAGE_ENDSWITH):
tensors[key] += temp[key]
else:
# Some weights are RowParallelLinear in Megatron-Deepspeed, others are ColumnParallel
cat_dim = 1 if any(text in key for text in WEIGHTS_WITH_ROW_PARALLELISM_CONTAIN) else 0
# We concatenate these weights accross TP ranks
tensors[key] = torch.cat([tensors[key], temp[key]], dim=cat_dim)
# Divide by the number of TP the weights we want to average
for key in tensors.keys():
if any(key.endswith(end) for end in WEIGHTS_TO_AVERAGE_ENDSWITH):
tensors[key] = tensors[key] / pretraining_tp
other_keys = model.load_state_dict(tensors, strict=False)
assert not other_keys.unexpected_keys
if missing_keys is None:
missing_keys = set(other_keys.missing_keys)
else:
missing_keys = missing_keys.intersection(set(other_keys.missing_keys))
assert not missing_keys
# Save pytorch-model
os.makedirs(pytorch_dump_folder_path, exist_ok=True)
pytorch_weights_dump_path = pytorch_dump_folder_path + "/" + WEIGHTS_NAME
pytorch_config_dump_path = pytorch_dump_folder_path + "/" + CONFIG_NAME
print(f"Save PyTorch model to {pytorch_weights_dump_path}")
torch.save(model.state_dict(), pytorch_weights_dump_path)
print(f"Save configuration file to {pytorch_config_dump_path}")
with open(pytorch_config_dump_path, "w", encoding="utf-8") as f:
f.write(config.to_json_string())
if __name__ == "__main__":
parser = argparse.ArgumentParser()
# Required parameters
parser.add_argument(
"--bloom_checkpoint_path",
default=None,
type=str,
required=True,
help="Path to the Megatron-LM checkpoint path.",
)
parser.add_argument(
"--pytorch_dump_folder_path", default=None, type=str, required=True, help="Path to the output PyTorch model."
)
parser.add_argument(
"--bloom_config_file",
default="",
type=str,
help=(
"An optional config json file corresponding to the pre-trained model. \n"
"This specifies the model architecture."
),
)
parser.add_argument(
"--shard_model",
action="store_true",
help="An optional setting to shard the output model \nThis enables sharding the converted checkpoint",
)
parser.add_argument(
"--pretraining_tp",
default=4,
type=int,
help="Pretraining TP rank that has been used when training the model in Megatron-LM \n",
)
args = parser.parse_args()
convert_bloom_checkpoint_to_pytorch(
args.bloom_checkpoint_path,
args.bloom_config_file,
args.pytorch_dump_folder_path,
args.shard_model,
args.pretraining_tp,
)
This diff is collapsed.
# coding=utf-8
# Copyright 2022 The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tokenization classes for Bloom."""
import json
from typing import TYPE_CHECKING, List, Optional, Tuple
from tokenizers import pre_tokenizers
from ...tokenization_utils_base import BatchEncoding
from ...tokenization_utils_fast import PreTrainedTokenizerFast
from ...utils import logging
if TYPE_CHECKING:
from transformers.pipelines.conversational import Conversation
logger = logging.get_logger(__name__)
VOCAB_FILES_NAMES = {"tokenizer_file": "tokenizer.json"}
PRETRAINED_VOCAB_FILES_MAP = {
"tokenizer_file": {
"bigscience/tokenizer": "https://huggingface.co/bigscience/tokenizer/blob/main/tokenizer.json",
"bigscience/bloom-350m": "https://huggingface.co/bigscience/bloom-350m/blob/main/tokenizer.json",
"bigscience/bloom-760m": "https://huggingface.co/bigscience/bloom-760m/blob/main/tokenizer.json",
"bigscience/bloom-1b3": "https://huggingface.co/bigscience/bloom-1b3/blob/main/tokenizer.json",
"bigscience/bloom-2b5": "https://huggingface.co/bigscience/bloom-2b5/blob/main/tokenizer.json",
"bigscience/bloom-6b3": "https://huggingface.co/bigscience/bloom-2b5/blob/main/tokenizer.json",
"bigscience/bloom": "https://huggingface.co/bigscience/bloom/blob/main/tokenizer.json",
},
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
"bigscience/tokenizer": 1024,
"bigscience/bloom-350m": 1024,
"bigscience/bloom-760m": 1024,
"bigscience/bloom-1b3": 1024,
"bigscience/bloom-2b5": 1024,
"bigscience/bloom-6b3": 1024,
"bigscience/bloom": 1024,
}
class BloomTokenizerFast(PreTrainedTokenizerFast):
"""
Construct a "fast" Bloom tokenizer (backed by HuggingFace's *tokenizers* library). Based on byte-level
Byte-Pair-Encoding.
This tokenizer has been trained to treat spaces like parts of the tokens (a bit like sentencepiece) so a word will
be encoded differently whether it is at the beginning of the sentence (without space) or not:
```
>>> from transformers import BloomTokenizerFast
>>> tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom")
>>> tokenizer("Hello world")['input_ids']
[15496, 995]
>>> tokenizer(" Hello world")['input_ids']
[18435, 995]
```
You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer, but since
the model was not pretrained this way, it might yield a decrease in performance.
<Tip>
When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`.
</Tip>
This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should
refer to this superclass for more information regarding those methods.
Args:
vocab_file (`str`):
Path to the vocabulary file.
merges_file (`str`):
Path to the merges file.
errors (`str`, *optional*, defaults to `"replace"`):
Paradigm to follow when decoding bytes to UTF-8. See
[bytes.decode](https://docs.python.org/3/library/stdtypes.html#bytes.decode) for more information.
unk_token (`str`, *optional*, defaults to `<|endoftext|>`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead.
bos_token (`str`, *optional*, defaults to `<|endoftext|>`):
The beginning of sequence token.
eos_token (`str`, *optional*, defaults to `<|endoftext|>`):
The end of sequence token.
add_prefix_space (`bool`, *optional*, defaults to `False`):
Whether or not to add an initial space to the input. This allows to treat the leading word just as any
other word. (Bloom tokenizer detect beginning of words by the preceding space).
trim_offsets (`bool`, *optional*, defaults to `True`):
Whether or not the post-processing step should trim offsets to avoid including whitespaces.
"""
vocab_files_names = VOCAB_FILES_NAMES
pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
model_input_names = ["input_ids", "attention_mask"]
slow_tokenizer_class = None
def __init__(
self,
vocab_file=None,
merges_file=None,
tokenizer_file=None,
unk_token="<unk>",
bos_token="<s>",
eos_token="</s>",
pad_token="<pad>",
add_prefix_space=False,
**kwargs
):
super().__init__(
vocab_file,
merges_file,
tokenizer_file=tokenizer_file,
unk_token=unk_token,
bos_token=bos_token,
eos_token=eos_token,
pad_token=pad_token,
add_prefix_space=add_prefix_space,
**kwargs,
)
pre_tok_state = json.loads(self.backend_tokenizer.pre_tokenizer.__getstate__())
if pre_tok_state.get("add_prefix_space", add_prefix_space) != add_prefix_space:
pre_tok_class = getattr(pre_tokenizers, pre_tok_state.pop("type"))
pre_tok_state["add_prefix_space"] = add_prefix_space
self.backend_tokenizer.pre_tokenizer = pre_tok_class(**pre_tok_state)
self.add_prefix_space = add_prefix_space
def _batch_encode_plus(self, *args, **kwargs) -> BatchEncoding:
is_split_into_words = kwargs.get("is_split_into_words", False)
if not (self.add_prefix_space or not is_split_into_words):
raise Exception(
f"You need to instantiate {self.__class__.__name__} with add_prefix_space=True to use it with"
" pretokenized inputs."
)
return super()._batch_encode_plus(*args, **kwargs)
def _encode_plus(self, *args, **kwargs) -> BatchEncoding:
is_split_into_words = kwargs.get("is_split_into_words", False)
if not (self.add_prefix_space or not is_split_into_words):
raise Exception(
f"You need to instantiate {self.__class__.__name__} with add_prefix_space=True to use it with"
" pretokenized inputs."
)
return super()._encode_plus(*args, **kwargs)
def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
files = self._tokenizer.model.save(save_directory, name=filename_prefix)
return tuple(files)
def _build_conversation_input_ids(self, conversation: "Conversation") -> List[int]:
"""This corresponds to DialoGPT variants of models."""
input_ids = []
for is_user, text in conversation.iter_texts():
input_ids.extend(self.encode(text, add_special_tokens=False) + [self.eos_token_id])
if len(input_ids) > self.model_max_length:
input_ids = input_ids[-self.model_max_length :]
return input_ids
...@@ -966,6 +966,30 @@ class BlenderbotSmallPreTrainedModel(metaclass=DummyObject): ...@@ -966,6 +966,30 @@ class BlenderbotSmallPreTrainedModel(metaclass=DummyObject):
requires_backends(self, ["torch"]) requires_backends(self, ["torch"])
BLOOM_PRETRAINED_MODEL_ARCHIVE_LIST = None
class BloomForCausalLM(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
class BloomModel(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
class BloomPreTrainedModel(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
CAMEMBERT_PRETRAINED_MODEL_ARCHIVE_LIST = None CAMEMBERT_PRETRAINED_MODEL_ARCHIVE_LIST = None
......
...@@ -52,6 +52,13 @@ class BlenderbotSmallTokenizerFast(metaclass=DummyObject): ...@@ -52,6 +52,13 @@ class BlenderbotSmallTokenizerFast(metaclass=DummyObject):
requires_backends(self, ["tokenizers"]) requires_backends(self, ["tokenizers"])
class BloomTokenizerFast(metaclass=DummyObject):
_backends = ["tokenizers"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["tokenizers"])
class CamembertTokenizerFast(metaclass=DummyObject): class CamembertTokenizerFast(metaclass=DummyObject):
_backends = ["tokenizers"] _backends = ["tokenizers"]
......
...@@ -58,6 +58,7 @@ BERT_TINY = "hf-internal-testing/tiny-bert" ...@@ -58,6 +58,7 @@ BERT_TINY = "hf-internal-testing/tiny-bert"
BIGBIRD_PEGASUS_TINY = "hf-internal-testing/tiny-random-bigbird_pegasus" BIGBIRD_PEGASUS_TINY = "hf-internal-testing/tiny-random-bigbird_pegasus"
BIG_BIRD_TINY = "hf-internal-testing/tiny-random-big_bird" BIG_BIRD_TINY = "hf-internal-testing/tiny-random-big_bird"
BLENDERBOT_TINY = "hf-internal-testing/tiny-random-blenderbot" BLENDERBOT_TINY = "hf-internal-testing/tiny-random-blenderbot"
BLOOM_TINY = "bigscience/bigscience-small-testing"
DEBERTA_TINY = "hf-internal-testing/tiny-random-deberta" DEBERTA_TINY = "hf-internal-testing/tiny-random-deberta"
DEBERTA_V2_TINY = "hf-internal-testing/tiny-random-deberta-v2" DEBERTA_V2_TINY = "hf-internal-testing/tiny-random-deberta-v2"
DISTILBERT_TINY = "sshleifer/tiny-distilbert-base-cased" DISTILBERT_TINY = "sshleifer/tiny-distilbert-base-cased"
...@@ -183,6 +184,7 @@ def make_task_cmds(): ...@@ -183,6 +184,7 @@ def make_task_cmds():
"big_bird", "big_bird",
"bigbird_pegasus", "bigbird_pegasus",
"blenderbot", "blenderbot",
"bloom",
"gpt2", "gpt2",
"gpt_neo", "gpt_neo",
"gptj", "gptj",
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment