Commit fc5fb09f authored by Devon Rifkin

model: fix boundary in bpe

0x007e is a tilde and was being shifted (+0x00a2) to 0x0120 during
encode, the same value that a space (0x0020) maps to via +0x0100. During
decode, 0x0120 was shifted back down (-0x0100) to 0x0020, so the tilde
came back as a space. The lower boundary of the +0x00a2 case is now
0x007f, which fixes this.

Fixes: #11966
parent 048bd447
@@ -109,7 +109,7 @@ func (bpe BytePairEncoding) Encode(s string, addSpecial bool) ([]int32, error) {
                 r = 0x0143
             case r <= 0x0020:
                 r = r + 0x0100
-            case r >= 0x007e && r <= 0x00a0:
+            case r >= 0x007f && r <= 0x00a0:
                 r = r + 0x00a2
             }
...
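
The collision described in the commit message can be reproduced in isolation. The following is a minimal sketch, not the repository's code: `encodeByte`, `decodeRune`, and `collisions` are hypothetical helpers, the `r == 0x00ad` condition is assumed (the diff shows only that case's body), and the decode side is reconstructed from the commit message rather than from the source.

```go
package main

import "fmt"

// encodeByte mirrors the switch shown in the diff. lower is the lower bound
// of the +0x00a2 case: 0x007e before the fix, 0x007f after it.
func encodeByte(b byte, lower rune) rune {
	r := rune(b)
	switch {
	case r == 0x00ad: // assumed condition; only the body appears in the diff
		r = 0x0143
	case r <= 0x0020:
		r = r + 0x0100
	case r >= lower && r <= 0x00a0:
		r = r + 0x00a2
	}
	return r
}

// decodeRune is a hypothetical inverse, written from the commit message's
// description of the decode step (0x0120 is shifted down by 0x0100).
func decodeRune(r rune) byte {
	switch {
	case r == 0x0143:
		return 0xad
	case r >= 0x0100 && r <= 0x0120:
		return byte(r - 0x0100)
	case r >= 0x0121 && r <= 0x0142:
		return byte(r - 0x00a2)
	}
	return byte(r)
}

// collisions counts byte values whose encoded rune was already produced by
// a smaller byte value. A lossless mapping yields zero.
func collisions(lower rune) int {
	seen := make(map[rune]bool)
	n := 0
	for b := 0; b < 256; b++ {
		r := encodeByte(byte(b), lower)
		if seen[r] {
			n++
		}
		seen[r] = true
	}
	return n
}

func main() {
	for _, lower := range []rune{0x007e, 0x007f} {
		enc := encodeByte('~', lower)
		fmt.Printf("lower=%#x: '~' encodes to %#x, decodes to %q, collisions=%d\n",
			lower, enc, decodeRune(enc), collisions(lower))
	}
}
```

With the old bound, `~` and the space character both encode to 0x0120, so the round trip cannot tell them apart. With the new bound, `~` falls through the switch unchanged and the mapping over all 256 byte values is collision-free in this sketch.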