Unverified Commit 653a79cc authored by abdullaholuk-loodos, committed by GitHub



Loodos model cards had errors in the "Usage" section; these are fixed. The "electra-base-turkish-uncased" model was also removed from S3 and re-uploaded as "electra-base-turkish-uncased-discriminator", and its README was added. (#6921)
Co-authored-by: Abdullah Oluk <abdullaholuk123@gmail.com>
parent 5a3aec90
# Turkish Language Models with Huggingface's Transformers

As the R&D Team at Loodos, we release cased and uncased versions of the most recent language models for Turkish. More details about the pretrained models and their evaluations on downstream tasks can be found [here (our repo)](https://github.com/Loodos/turkish-language-models).

# Turkish ALBERT-Base (uncased)

This is the ALBERT-Base model, which has 12 repeated encoder layers with a hidden size of 768, trained on an uncased Turkish dataset.

## Usage
Using AutoModel and AutoTokenizer from Transformers, you can import the model as described below.
```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("loodos/albert-base-turkish-uncased", do_lower_case=False, keep_accents=True)
model = AutoModel.from_pretrained("loodos/albert-base-turkish-uncased")
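
# TextNormalization is provided in our repo (linked under "Notes on Tokenizers"
# below); it is not part of the transformers package. "text" is assumed to be
# any raw Turkish input string, e.g.:
text = "Şanlıurfa'da çocuklar öğün atlamasın."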
normalizer = TextNormalization()
normalized_text = normalizer.normalize(text, do_lower_case=True, is_turkish=True)
tokenizer.tokenize(normalized_text)
```
### Notes on Tokenizers

Currently, Huggingface's tokenizers (which were written in Python) have a bug concerning the letters "ı, i, I, İ" and non-ASCII Turkish-specific letters. There are two reasons:

1- The vocabulary and SentencePiece model were created with NFC/NFKC normalization, but the tokenizer uses NFD/NFKD. NFD/NFKD normalization changes text that contains the Turkish characters I-ı, İ-i, Ç-ç, Ö-ö, Ş-ş, Ğ-ğ and Ü-ü. This causes wrong tokenization, wrong training and loss of information; some tokens (like "şanlıurfa", "öğün", "çocuk") are never trained. NFD/NFKD normalization is not suitable for Turkish.
2- […] respectively. However, in Turkish, 'I' and 'İ' are two different letters.
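
Both problems can be reproduced with nothing but Python's standard `unicodedata` module and string methods; a minimal sketch:

```python
import unicodedata

# NFKD decomposes "İ" (U+0130) into "I" (U+0049) plus a combining dot above
# (U+0307), so the letter no longer matches a vocabulary built with NFC/NFKC:
print([hex(ord(c)) for c in unicodedata.normalize("NFKD", "İ")])  # ['0x49', '0x307']
print([hex(ord(c)) for c in unicodedata.normalize("NFC", "İ")])   # ['0x130']

# Python's default casing ignores the Turkish I/İ distinction:
print("I".lower())  # 'i'  (Turkish expects 'ı')
print("i".upper())  # 'I'  (Turkish expects 'İ')
```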
We opened an [issue](https://github.com/huggingface/transformers/issues/6680) in Huggingface's GitHub repo about this bug. Until it is fixed, if you want to train your model with uncased data, we provide a simple text normalization module (`TextNormalization()` in the code snippet above) in our [repo](https://github.com/Loodos/turkish-language-models).

## Details and Contact

You can contact us to ask a question, open an issue or give feedback via our GitHub [repo](https://github.com/Loodos/turkish-language-models).

## Acknowledgments

Many thanks to the TFRC Team for providing us with Cloud TPUs on TensorFlow Research Cloud to train our models.
# Turkish Language Models with Huggingface's Transformers

As the R&D Team at Loodos, we release cased and uncased versions of the most recent language models for Turkish. More details about the pretrained models and their evaluations on downstream tasks can be found [here (our repo)](https://github.com/Loodos/turkish-language-models).

# Turkish BERT-Base (uncased)

This is the BERT-Base model, which has 12 encoder layers with a hidden size of 768, trained on an uncased Turkish dataset.

## Usage

Using AutoModel and AutoTokenizer from Transformers, you can import the model as described below.
```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("loodos/bert-base-turkish-uncased", do_lower_case=False)
model = AutoModel.from_pretrained("loodos/bert-base-turkish-uncased")
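
# TextNormalization is provided in our repo (see "Notes on Tokenizers" below),
# not in the transformers package. "text" can be any raw Turkish string, e.g.:
text = "Öğrenciler Çanakkale'ye gitti."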
normalizer = TextNormalization()
normalized_text = normalizer.normalize(text, do_lower_case=True, is_turkish=True)
tokenizer.tokenize(normalized_text)
```
### Notes on Tokenizers

Currently, Huggingface's tokenizers (which were written in Python) have a bug concerning the letters "ı, i, I, İ" and non-ASCII Turkish-specific letters. There are two reasons:

1- The vocabulary and SentencePiece model were created with NFC/NFKC normalization, but the tokenizer uses NFD/NFKD. NFD/NFKD normalization changes text that contains the Turkish characters I-ı, İ-i, Ç-ç, Ö-ö, Ş-ş, Ğ-ğ and Ü-ü. This causes wrong tokenization, wrong training and loss of information; some tokens (like "şanlıurfa", "öğün", "çocuk") are never trained. NFD/NFKD normalization is not suitable for Turkish.
2- […] respectively. However, in Turkish, 'I' and 'İ' are two different letters.
We opened an [issue](https://github.com/huggingface/transformers/issues/6680) in Huggingface's GitHub repo about this bug. Until it is fixed, if you want to train your model with uncased data, we provide a simple text normalization module (`TextNormalization()` in the code snippet above) in our [repo](https://github.com/Loodos/turkish-language-models).

## Details and Contact

You can contact us to ask a question, open an issue or give feedback via our GitHub [repo](https://github.com/Loodos/turkish-language-models).

## Acknowledgments

Many thanks to the TFRC Team for providing us with Cloud TPUs on TensorFlow Research Cloud to train our models.
# Turkish Language Models with Huggingface's Transformers

As the R&D Team at Loodos, we release cased and uncased versions of the most recent language models for Turkish. More details about the pretrained models and their evaluations on downstream tasks can be found [here (our repo)](https://github.com/Loodos/turkish-language-models).

# Turkish ELECTRA-Base-discriminator (uncased/64k)

This is the discriminator of the ELECTRA-Base model, which has the same structure as BERT-Base, trained on an uncased Turkish dataset. This version has a vocabulary of size 64k, instead of the default 32k.

## Usage

Using AutoModelWithLMHead and AutoTokenizer from Transformers, you can import the model as described below.
```python
from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained("loodos/electra-base-turkish-64k-uncased-discriminator", do_lower_case=False)
model = AutoModelWithLMHead.from_pretrained("loodos/electra-base-turkish-64k-uncased-discriminator")
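
# TextNormalization comes from our repo (see "Notes on Tokenizers" below), not
# from the transformers package. "text" is any raw Turkish input string, e.g.:
text = "İstanbul'da ılık bir akşamdı."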
normalizer = TextNormalization()
normalized_text = normalizer.normalize(text, do_lower_case=True, is_turkish=True)
tokenizer.tokenize(normalized_text)
```
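
As a quick sanity check on the 64k vocabulary, you can inspect the loaded objects (a sketch, assuming the snippet above has been run; the exact count may differ slightly):

```python
print(tokenizer.vocab_size)     # expected to be about 64000 for this model
print(model.config.vocab_size)  # should match the tokenizer's vocabulary size
```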
### Notes on Tokenizers

Currently, Huggingface's tokenizers (which were written in Python) have a bug concerning the letters "ı, i, I, İ" and non-ASCII Turkish-specific letters. There are two reasons:

1- The vocabulary and SentencePiece model were created with NFC/NFKC normalization, but the tokenizer uses NFD/NFKD. NFD/NFKD normalization changes text that contains the Turkish characters I-ı, İ-i, Ç-ç, Ö-ö, Ş-ş, Ğ-ğ and Ü-ü. This causes wrong tokenization, wrong training and loss of information; some tokens (like "şanlıurfa", "öğün", "çocuk") are never trained. NFD/NFKD normalization is not suitable for Turkish.
2- […] respectively. However, in Turkish, 'I' and 'İ' are two different letters.
We opened an [issue](https://github.com/huggingface/transformers/issues/6680) in Huggingface's GitHub repo about this bug. Until it is fixed, if you want to train your model with uncased data, we provide a simple text normalization module (`TextNormalization()` in the code snippet above) in our [repo](https://github.com/Loodos/turkish-language-models).

## Details and Contact

You can contact us to ask a question, open an issue or give feedback via our GitHub [repo](https://github.com/Loodos/turkish-language-models).

## Acknowledgments

Many thanks to the TFRC Team for providing us with Cloud TPUs on TensorFlow Research Cloud to train our models.
# Turkish Language Models with Huggingface's Transformers

As the R&D Team at Loodos, we release cased and uncased versions of the most recent language models for Turkish. More details about the pretrained models and their evaluations on downstream tasks can be found [here (our repo)](https://github.com/Loodos/turkish-language-models).

# Turkish ELECTRA-Base-discriminator (uncased)

This is the discriminator of the ELECTRA-Base model, which has the same structure as BERT-Base, trained on an uncased Turkish dataset.

## Usage

Using AutoModelWithLMHead and AutoTokenizer from Transformers, you can import the model as described below.
```python
from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained("loodos/electra-base-turkish-uncased-discriminator", do_lower_case=False)
model = AutoModelWithLMHead.from_pretrained("loodos/electra-base-turkish-uncased-discriminator")
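
# TextNormalization comes from our repo (see "Notes on Tokenizers" below), not
# from the transformers package. "text" is any raw Turkish input string, e.g.:
text = "Göl kıyısında üşüdük."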
normalizer = TextNormalization()
normalized_text = normalizer.normalize(text, do_lower_case=True, is_turkish=True)
tokenizer.tokenize(normalized_text)
```
### Notes on Tokenizers

Currently, Huggingface's tokenizers (which were written in Python) have a bug concerning the letters "ı, i, I, İ" and non-ASCII Turkish-specific letters. There are two reasons:

1- The vocabulary and SentencePiece model were created with NFC/NFKC normalization, but the tokenizer uses NFD/NFKD. NFD/NFKD normalization changes text that contains the Turkish characters I-ı, İ-i, Ç-ç, Ö-ö, Ş-ş, Ğ-ğ and Ü-ü. This causes wrong tokenization, wrong training and loss of information; some tokens (like "şanlıurfa", "öğün", "çocuk") are never trained. NFD/NFKD normalization is not suitable for Turkish.
2- […] respectively. However, in Turkish, 'I' and 'İ' are two different letters.
We opened an [issue](https://github.com/huggingface/transformers/issues/6680) in Huggingface's GitHub repo about this bug. Until it is fixed, if you want to train your model with uncased data, we provide a simple text normalization module (`TextNormalization()` in the code snippet above) in our [repo](https://github.com/Loodos/turkish-language-models).

## Details and Contact

You can contact us to ask a question, open an issue or give feedback via our GitHub [repo](https://github.com/Loodos/turkish-language-models).

## Acknowledgments

Many thanks to the TFRC Team for providing us with Cloud TPUs on TensorFlow Research Cloud to train our models.
# Turkish Language Models with Huggingface's Transformers

As the R&D Team at Loodos, we release cased and uncased versions of the most recent language models for Turkish. More details about the pretrained models and their evaluations on downstream tasks can be found [here (our repo)](https://github.com/Loodos/turkish-language-models).

# Turkish ELECTRA-Small-discriminator (cased)

This is the discriminator of the ELECTRA-Small model, which has 12 encoder layers with a hidden size of 256, trained on a cased Turkish dataset.

## Usage

Using AutoModelWithLMHead and AutoTokenizer from Transformers, you can import the model as described below.
```python
from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained("loodos/electra-small-turkish-cased-discriminator")
model = AutoModelWithLMHead.from_pretrained("loodos/electra-small-turkish-cased-discriminator")
```
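
For example, you can tokenize a sample sentence with the cased tokenizer (the sentence here is an arbitrary illustration):

```python
print(tokenizer.tokenize("Türkiye'nin başkenti Ankara'dır."))
```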
## Details and Contact

You can contact us to ask a question, open an issue or give feedback via our GitHub [repo](https://github.com/Loodos/turkish-language-models).

## Acknowledgments

Many thanks to the TFRC Team for providing us with Cloud TPUs on TensorFlow Research Cloud to train our models.
# Turkish Language Models with Huggingface's Transformers

As the R&D Team at Loodos, we release cased and uncased versions of the most recent language models for Turkish. More details about the pretrained models and their evaluations on downstream tasks can be found [here (our repo)](https://github.com/Loodos/turkish-language-models).

# Turkish ELECTRA-Small-discriminator (uncased)

This is the discriminator of the ELECTRA-Small model, which has 12 encoder layers with a hidden size of 256, trained on an uncased Turkish dataset.

## Usage

Using AutoModelWithLMHead and AutoTokenizer from Transformers, you can import the model as described below.
```python
from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained("loodos/electra-small-turkish-uncased-discriminator", do_lower_case=False)
model = AutoModelWithLMHead.from_pretrained("loodos/electra-small-turkish-uncased-discriminator")
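
# TextNormalization comes from our repo (see "Notes on Tokenizers" below), not
# from the transformers package. "text" is any raw Turkish input string, e.g.:
text = "Işık şehri aydınlattı."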
normalizer = TextNormalization()
normalized_text = normalizer.normalize(text, do_lower_case=True, is_turkish=True)
tokenizer.tokenize(normalized_text)
```
### Notes on Tokenizers

Currently, Huggingface's tokenizers (which were written in Python) have a bug concerning the letters "ı, i, I, İ" and non-ASCII Turkish-specific letters. There are two reasons:

1- The vocabulary and SentencePiece model were created with NFC/NFKC normalization, but the tokenizer uses NFD/NFKD. NFD/NFKD normalization changes text that contains the Turkish characters I-ı, İ-i, Ç-ç, Ö-ö, Ş-ş, Ğ-ğ and Ü-ü. This causes wrong tokenization, wrong training and loss of information; some tokens (like "şanlıurfa", "öğün", "çocuk") are never trained. NFD/NFKD normalization is not suitable for Turkish.
2- […] respectively. However, in Turkish, 'I' and 'İ' are two different letters.
We opened an [issue](https://github.com/huggingface/transformers/issues/6680) in Huggingface's GitHub repo about this bug. Until it is fixed, if you want to train your model with uncased data, we provide a simple text normalization module (`TextNormalization()` in the code snippet above) in our [repo](https://github.com/Loodos/turkish-language-models).

## Details and Contact

You can contact us to ask a question, open an issue or give feedback via our GitHub [repo](https://github.com/Loodos/turkish-language-models).

## Acknowledgments

Many thanks to the TFRC Team for providing us with Cloud TPUs on TensorFlow Research Cloud to train our models.