Unverified Commit 3efa5bbb authored by Nick Hill's avatar Nick Hill Committed by GitHub

fix(router): Include special tokens when tokenizing (#14)

There's currently a discrepancy in tokenization between the router
and the Python server code: the latter includes special tokens but the
former does not.

This results in a token count mismatch for seq2seq models such as mt0
where the tokenizer emits an EOS token at the end.

This in turn results in some unexpected/incorrect output, in particular
when batch concatenation is involved, because the python code uses the
input length passed from the router for each row.

As far as I can tell, it is better to include this token in the encoder
`input_ids`, so it seems best to just adjust this on the router side.
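To illustrate the off-by-one described above, here is a minimal toy sketch (not the actual router or server code; the tokenizer, vocabulary, and `EOS_ID` are hypothetical) showing how the `add_special_tokens` flag changes the reported input length for a tokenizer that appends EOS:

```python
EOS_ID = 1  # hypothetical end-of-sequence token id

def encode(text, add_special_tokens):
    # Toy whitespace "tokenizer" for illustration only; real tokenizers
    # (e.g. the HF tokenizers library) take a similar boolean flag.
    ids = [hash(w) % 1000 + 2 for w in text.split()]
    if add_special_tokens:
        # seq2seq tokenizers such as mt0's append EOS here
        ids.append(EOS_ID)
    return ids

without_special = encode("translate this sentence", False)
with_special = encode("translate this sentence", True)

# The one-token difference is exactly the router/server mismatch:
# the router counted len(without_special) while the server saw
# len(with_special) tokens.
assert len(with_special) == len(without_special) + 1
```

Passing `true` for the flag on the router side makes both counts agree.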
parent 686cc667
```diff
@@ -131,7 +131,7 @@ fn validation_worker(
         }
         // Get the number of tokens in the input
-        match tokenizer.encode(request.inputs.clone(), false) {
+        match tokenizer.encode(request.inputs.clone(), true) {
             Ok(inputs) => {
                 let input_length = inputs.len();
...
```