Unverified Commit b51bfec3 authored by wenjunyang, committed by GitHub

[chatgpt] change critic input as state (#3042)



* fix Critic

* fix Critic

* fix Critic

* fix neglect of attention mask

* fix neglect of attention mask

* fix neglect of attention mask

* add return

---------
Co-authored-by: yangwenjun <yangwenjun@soyoung.com>
Co-authored-by: yangwjd <yangwjd@chanjet.com>
parent 2ef855c7
@@ -36,12 +36,15 @@ class Critic(LoRAModule):
         outputs = self.model(sequences, attention_mask=attention_mask)
         last_hidden_states = outputs['last_hidden_state']
-        values = self.value_head(last_hidden_states).squeeze(-1)[:, :-1]
+        values = self.value_head(last_hidden_states).squeeze(-1)
         if action_mask is not None:
             num_actions = action_mask.size(1)
-            values = values[:, -num_actions:]
-            value = masked_mean(values, action_mask, dim=1)
+            prompt_mask = attention_mask[:, :-num_actions]
+            values = values[:, :-num_actions]
+            value = masked_mean(values, prompt_mask, dim=1)
             return value
+        values = values[:, :-1]
         value = values.mean(dim=1).squeeze(1)
         return value
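The change above makes the critic score the state (prompt) tokens instead of the action tokens: the action positions are sliced off the per-token values, and the mean is taken under the prompt portion of the attention mask so padding is ignored. A minimal sketch of that slicing and masked averaging, using dummy tensors and a local `masked_mean` helper (the shapes and values here are illustrative assumptions, not from the repository):

```python
import torch

def masked_mean(tensor: torch.Tensor, mask: torch.Tensor, dim: int = 1) -> torch.Tensor:
    # Mean over `dim`, counting only positions where mask == 1.
    mask = mask.float()
    return (tensor * mask).sum(dim=dim) / mask.sum(dim=dim).clamp(min=1e-8)

# Hypothetical batch: 2 sequences of 6 tokens, the last 2 of which are actions.
values = torch.arange(12, dtype=torch.float32).reshape(2, 6)  # value head output per token
attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1],
                               [0, 1, 1, 1, 1, 1]])           # second prompt is left-padded
action_mask = torch.ones(2, 2, dtype=torch.long)

num_actions = action_mask.size(1)
prompt_mask = attention_mask[:, :-num_actions]  # mask over the state (prompt) tokens only
state_values = values[:, :-num_actions]         # drop the action positions
value = masked_mean(state_values, prompt_mask, dim=1)
```

For the first sequence the prompt values are `[0, 1, 2, 3]` with all positions valid, so the masked mean is 1.5; for the second, the leading pad token is excluded, giving the mean of `[7, 8, 9]`, i.e. 8.0. Without the `prompt_mask`, padded positions would drag the state value toward zero, which is the attention-mask neglect the commit message refers to.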