chenpangpang / transformers · Commits

Commit e5c78c66, authored Jan 10, 2019 by thomwolf (parent fa5222c2)

update readme and few typos
Showing 3 changed files with 7 additions and 7 deletions:
- README.md (+4, -4)
- examples/extract_features.py (+2, -2)
- pytorch_pretrained_bert/modeling.py (+1, -1)
README.md (view file @ e5c78c66)

-# PyTorch Pretrained Bert - PyTorch Pretrained OpenAI GPT
+# PyTorch Pretrained Bert (also with PyTorch Pretrained OpenAI GPT)
 [CircleCI build-status badge](https://circleci.com/gh/huggingface/pytorch-pretrained-BERT)
@@ -125,18 +125,18 @@ from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM
 tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
 
 # Tokenized input
-text = "Who was Jim Henson ? Jim Henson was a puppeteer"
+text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
 tokenized_text = tokenizer.tokenize(text)
 
 # Mask a token that we will try to predict back with `BertForMaskedLM`
 masked_index = 6
 tokenized_text[masked_index] = '[MASK]'
-assert tokenized_text == ['who', 'was', 'jim', 'henson', '?', 'jim', '[MASK]', 'was', 'a', 'puppet', '##eer']
+assert tokenized_text == ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]', 'jim', '[MASK]', 'was', 'a', 'puppet', '##eer', '[SEP]']
 
 # Convert token to vocabulary indices
 indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
 
 # Define sentence A and B indices associated to 1st and 2nd sentences (see paper)
-segments_ids = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
+segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
 
 # Convert inputs to PyTorch tensors
 tokens_tensor = torch.tensor([indexed_tokens])
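The hunk above switches the README walkthrough to the explicit `[CLS]`/`[SEP]` input format, growing the example from 11 to 14 tokens. Note that it leaves `masked_index = 6`, but in the new 14-token input the masked position (the second "henson") is index 8, so the updated assert only passes with `masked_index = 8`. A minimal end-to-end sketch of the updated walkthrough, assuming the `pytorch_pretrained_bert` API of this era, where `BertForMaskedLM` returns a `[batch_size, sequence_length, vocab_size]` score tensor:

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenized input in the new [CLS]/[SEP] format
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = tokenizer.tokenize(text)

# With [CLS] and the first [SEP] in front, the second 'henson' sits at index 8
masked_index = 8
tokenized_text[masked_index] = '[MASK]'
assert tokenized_text == ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]',
                          'jim', '[MASK]', 'was', 'a', 'puppet', '##eer', '[SEP]']

# Sentence A (tokens 0-6) gets segment id 0, sentence B (tokens 7-13) gets 1,
# matching the new 14-element list in the diff
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])

# Predict the masked token back
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()
predictions = model(tokens_tensor, segments_tensors)
predicted_index = torch.argmax(predictions[0, masked_index]).item()
print(tokenizer.convert_ids_to_tokens([predicted_index])[0])  # expected: 'henson'
```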
examples/extract_features.py (view file @ e5c78c66)

[diff not expanded on this page]
pytorch_pretrained_bert/modeling.py (view file @ e5c78c66)

@@ -584,7 +584,7 @@ class BertModel(BertPreTrainedModel):
             to the last attention block of shape [batch_size, sequence_length, hidden_size],
         `pooled_output`: a torch.FloatTensor of size [batch_size, hidden_size] which is the output of a
             classifier pretrained on top of the hidden state associated to the first character of the
-            input (`CLF`) to train on the Next-Sentence task (see BERT's paper).
+            input (`CLS`) to train on the Next-Sentence task (see BERT's paper).
 
     Example usage:
     ...
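As a usage note for the corrected docstring: `pooled_output` is derived from the hidden state of the first token (`[CLS]`, not `CLF`) via a pooler that was pretrained on the next-sentence prediction task. A short sketch of retrieving it, assuming the `(encoded_layers, pooled_output)` return signature that `BertModel.forward` has in this version of the library:

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

tokens = tokenizer.tokenize("[CLS] Who was Jim Henson ? [SEP]")
tokens_tensor = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

# encoded_layers: one [batch_size, sequence_length, hidden_size] tensor per layer
# pooled_output: [batch_size, hidden_size], computed from the [CLS] position
encoded_layers, pooled_output = model(tokens_tensor)
print(len(encoded_layers), pooled_output.shape)  # 12 layers, torch.Size([1, 768]) for bert-base
```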