Commit 906b638e authored by thomwolf

updating readme

parent 994d8660
@@ -739,8 +739,8 @@ all_hidden_states = lower_hidden_states + [hidden_states]
 *Outputs* a tuple of (last_hidden_state, new_mems)
 - `softmax_output`: output of the (adaptive) softmax:
-  - if target is None: Negative log likelihood of shape [batch_size, sequence_length]
-  - else: log probabilities of tokens, shape [batch_size, sequence_length, n_tokens]
+  - if target is None: log probabilities of tokens, shape [batch_size, sequence_length, n_tokens]
+  - else: Negative log likelihood of target tokens with shape [batch_size, sequence_length]
 - `new_mems`: list (num layers) of updated mem states at the entry of each layer each mem state is a torch.FloatTensor of size [self.config.mem_len, batch_size, self.config.d_model]. Note that the first two dimensions are transposed in `mems` with regards to `input_ids`.
 #### 14. `GPT2Model`
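
The corrected `softmax_output` description can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: it uses a plain softmax instead of the adaptive softmax, nested lists instead of tensors, and the helper names `log_softmax` and `softmax_output` are hypothetical. It only shows how the output shape depends on whether `target` is given.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def softmax_output(logits, target=None):
    """Hypothetical stand-in for the documented output (plain softmax, not adaptive).

    logits: nested list of shape [batch_size][sequence_length][n_tokens]
    target: optional nested list of token ids, shape [batch_size][sequence_length]
    """
    log_probs = [[log_softmax(tok) for tok in seq] for seq in logits]
    if target is None:
        # No target: log probabilities, [batch_size, sequence_length, n_tokens]
        return log_probs
    # Target given: negative log likelihood of each target token,
    # [batch_size, sequence_length]
    return [
        [-seq[t][target[b][t]] for t in range(len(seq))]
        for b, seq in enumerate(log_probs)
    ]


# batch_size=1, sequence_length=3, n_tokens=2 (made-up values)
logits = [[[0.0, 0.0], [1.0, 3.0], [2.0, -1.0]]]
target = [[0, 1, 0]]

log_probs = softmax_output(logits)       # shape [1, 3, 2]
nll = softmax_output(logits, target)     # shape [1, 3]
```

Summing or averaging `nll` over the sequence gives a training loss, while `log_probs` is what one would sample or rank from at inference time.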