chenpangpang / transformers · Commit d7b3bf54

Model cards for KoELECTRA

Authored Apr 27, 2020 by monologg; committed by Julien Chaumond on Apr 27, 2020
Parent: db9d56c0
Showing 4 changed files with 194 additions and 0 deletions:

- model_cards/monologg/koelectra-base-discriminator/README.md (+52, -0)
- model_cards/monologg/koelectra-base-generator/README.md (+45, -0)
- model_cards/monologg/koelectra-small-discriminator/README.md (+52, -0)
- model_cards/monologg/koelectra-small-generator/README.md (+45, -0)
model_cards/monologg/koelectra-base-discriminator/README.md (new file, mode 100644)

---
language: Korean
---
# KoELECTRA (Base Discriminator)

Pretrained ELECTRA Language Model for Korean (`koelectra-base-discriminator`).

For more details, please see the [original repository](https://github.com/monologg/KoELECTRA/blob/master/README_EN.md).
## Usage
### Load model and tokenizer
```python
>>> from transformers import ElectraModel, ElectraTokenizer

>>> model = ElectraModel.from_pretrained("monologg/koelectra-base-discriminator")
>>> tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-base-discriminator")
```
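Once loaded, the encoder can be run directly to obtain contextual token embeddings. The snippet below is a minimal sketch added for illustration (not part of the original card); the sentence is borrowed from the tokenizer example, and the hidden size mentioned in the comment assumes the standard ELECTRA-Base configuration (768).

```python
>>> import torch

>>> # Illustrative sketch: encode a sentence and run a forward pass.
>>> inputs = tokenizer.encode("한국어 ELECTRA를 공유합니다.", return_tensors="pt")
>>> with torch.no_grad():
...     outputs = model(inputs)
>>> outputs[0].shape  # last hidden state: (batch_size, sequence_length, hidden_size)
```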
### Tokenizer example
```python
>>> from transformers import ElectraTokenizer

>>> tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-base-discriminator")
>>> tokenizer.tokenize("[CLS] 한국어 ELECTRA를 공유합니다. [SEP]")
['[CLS]', '한국어', 'E', '##L', '##EC', '##T', '##RA', '##를', '공유', '##합니다', '.', '[SEP]']
>>> tokenizer.convert_tokens_to_ids(['[CLS]', '한국어', 'E', '##L', '##EC', '##T', '##RA', '##를', '공유', '##합니다', '.', '[SEP]'])
[2, 18429, 41, 6240, 15229, 6204, 20894, 5689, 12622, 10690, 18, 3]
```
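As an aside not in the original card: `tokenize` does not insert special tokens, which is why `[CLS]` and `[SEP]` are spelled out in the string above. `encode` adds them automatically (ids 2 and 3, as the output above shows), so the plain sentence produces the same ids:

```python
>>> # encode() prepends [CLS] (id 2) and appends [SEP] (id 3) by default.
>>> tokenizer.encode("한국어 ELECTRA를 공유합니다.")
[2, 18429, 41, 6240, 15229, 6204, 20894, 5689, 12622, 10690, 18, 3]
```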
## Example using ElectraForPreTraining
```python
import torch
from transformers import ElectraForPreTraining, ElectraTokenizer

discriminator = ElectraForPreTraining.from_pretrained("monologg/koelectra-base-discriminator")
tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-base-discriminator")

sentence = "나는 방금 밥을 먹었다."       # original sentence, for reference
fake_sentence = "나는 내일 밥을 먹었다."  # "방금" (just now) replaced by "내일" (tomorrow)

fake_tokens = tokenizer.tokenize(fake_sentence)
fake_inputs = tokenizer.encode(fake_sentence, return_tensors="pt")

discriminator_outputs = discriminator(fake_inputs)
# Map each token's logit to 0/1: 1 means the token is predicted to be replaced.
predictions = torch.round((torch.sign(discriminator_outputs[0]) + 1) / 2)

# squeeze() drops the batch dimension; [1:-1] strips [CLS]/[SEP] so the
# predictions align with fake_tokens (tokenize() adds no special tokens).
print(list(zip(fake_tokens, predictions.squeeze().tolist()[1:-1])))
```
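For context (an added note, not from the original card): ELECTRA's discriminator is trained on replaced-token detection, so in the zipped output a 1 flags a token the model believes was substituted and a 0 marks a token it considers original. Here the fake sentence swaps "방금" (just now) for "내일" (tomorrow), and ideally the pieces of "내일" are the tokens flagged.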
model_cards/monologg/koelectra-base-generator/README.md (new file, mode 100644)

---
language: Korean
---
# KoELECTRA (Base Generator)

Pretrained ELECTRA Language Model for Korean (`koelectra-base-generator`).

For more details, please see the [original repository](https://github.com/monologg/KoELECTRA/blob/master/README_EN.md).
## Usage
### Load model and tokenizer
```python
>>> from transformers import ElectraModel, ElectraTokenizer

>>> model = ElectraModel.from_pretrained("monologg/koelectra-base-generator")
>>> tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-base-generator")
```
### Tokenizer example
```python
>>> from transformers import ElectraTokenizer

>>> tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-base-generator")
>>> tokenizer.tokenize("[CLS] 한국어 ELECTRA를 공유합니다. [SEP]")
['[CLS]', '한국어', 'E', '##L', '##EC', '##T', '##RA', '##를', '공유', '##합니다', '.', '[SEP]']
>>> tokenizer.convert_tokens_to_ids(['[CLS]', '한국어', 'E', '##L', '##EC', '##T', '##RA', '##를', '공유', '##합니다', '.', '[SEP]'])
[2, 18429, 41, 6240, 15229, 6204, 20894, 5689, 12622, 10690, 18, 3]
```
## Example using ElectraForMaskedLM
```python
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="monologg/koelectra-base-generator",
    tokenizer="monologg/koelectra-base-generator"
)

print(fill_mask("나는 {} 밥을 먹었다.".format(fill_mask.tokenizer.mask_token)))
```
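The pipeline returns the top candidates as a list of dicts. The loop below is a small added sketch (not from the original card), assuming the standard fields of the transformers fill-mask pipeline output, `score` and `sequence`:

```python
# Illustrative sketch: print the ranked candidates for the masked slot.
for candidate in fill_mask("나는 {} 밥을 먹었다.".format(fill_mask.tokenizer.mask_token)):
    print(candidate["score"], candidate["sequence"])
```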
model_cards/monologg/koelectra-small-discriminator/README.md (new file, mode 100644)

---
language: Korean
---
# KoELECTRA (Small Discriminator)

Pretrained ELECTRA Language Model for Korean (`koelectra-small-discriminator`).

For more details, please see the [original repository](https://github.com/monologg/KoELECTRA/blob/master/README_EN.md).
## Usage
### Load model and tokenizer
```python
>>> from transformers import ElectraModel, ElectraTokenizer

>>> model = ElectraModel.from_pretrained("monologg/koelectra-small-discriminator")
>>> tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-small-discriminator")
```
### Tokenizer example
```python
>>> from transformers import ElectraTokenizer

>>> tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-small-discriminator")
>>> tokenizer.tokenize("[CLS] 한국어 ELECTRA를 공유합니다. [SEP]")
['[CLS]', '한국어', 'E', '##L', '##EC', '##T', '##RA', '##를', '공유', '##합니다', '.', '[SEP]']
>>> tokenizer.convert_tokens_to_ids(['[CLS]', '한국어', 'E', '##L', '##EC', '##T', '##RA', '##를', '공유', '##합니다', '.', '[SEP]'])
[2, 18429, 41, 6240, 15229, 6204, 20894, 5689, 12622, 10690, 18, 3]
```
## Example using ElectraForPreTraining
```python
import torch
from transformers import ElectraForPreTraining, ElectraTokenizer

discriminator = ElectraForPreTraining.from_pretrained("monologg/koelectra-small-discriminator")
tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-small-discriminator")

sentence = "나는 방금 밥을 먹었다."       # original sentence, for reference
fake_sentence = "나는 내일 밥을 먹었다."  # "방금" (just now) replaced by "내일" (tomorrow)

fake_tokens = tokenizer.tokenize(fake_sentence)
fake_inputs = tokenizer.encode(fake_sentence, return_tensors="pt")

discriminator_outputs = discriminator(fake_inputs)
# Map each token's logit to 0/1: 1 means the token is predicted to be replaced.
predictions = torch.round((torch.sign(discriminator_outputs[0]) + 1) / 2)

# squeeze() drops the batch dimension; [1:-1] strips [CLS]/[SEP] so the
# predictions align with fake_tokens (tokenize() adds no special tokens).
print(list(zip(fake_tokens, predictions.squeeze().tolist()[1:-1])))
```
model_cards/monologg/koelectra-small-generator/README.md (new file, mode 100644)

---
language: Korean
---
# KoELECTRA (Small Generator)

Pretrained ELECTRA Language Model for Korean (`koelectra-small-generator`).

For more details, please see the [original repository](https://github.com/monologg/KoELECTRA/blob/master/README_EN.md).
## Usage
### Load model and tokenizer
```python
>>> from transformers import ElectraModel, ElectraTokenizer

>>> model = ElectraModel.from_pretrained("monologg/koelectra-small-generator")
>>> tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-small-generator")
```
### Tokenizer example
```python
>>> from transformers import ElectraTokenizer

>>> tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-small-generator")
>>> tokenizer.tokenize("[CLS] 한국어 ELECTRA를 공유합니다. [SEP]")
['[CLS]', '한국어', 'E', '##L', '##EC', '##T', '##RA', '##를', '공유', '##합니다', '.', '[SEP]']
>>> tokenizer.convert_tokens_to_ids(['[CLS]', '한국어', 'E', '##L', '##EC', '##T', '##RA', '##를', '공유', '##합니다', '.', '[SEP]'])
[2, 18429, 41, 6240, 15229, 6204, 20894, 5689, 12622, 10690, 18, 3]
```
## Example using ElectraForMaskedLM
```python
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="monologg/koelectra-small-generator",
    tokenizer="monologg/koelectra-small-generator"
)

print(fill_mask("나는 {} 밥을 먹었다.".format(fill_mask.tokenizer.mask_token)))
```