fix(language): remove invalid UTF-16 surrogate pairs from input text
- Add `remove_invalid_surrogates` function to filter out invalid UTF-16 surrogate pairs
- Integrate the new function into the `detect_lang` workflow
- Include a test case with UTF-16 surrogates to verify the fix
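
For illustration, the filtering step could look roughly like the sketch below. This is not the code from the commit: it assumes the project is written in Python and that the input is a `str`, where any code point in the range U+D800 to U+DFFF is an unpaired surrogate that cannot be encoded as UTF-8. Only the function name comes from the commit message; the body is an assumption.

```python
def remove_invalid_surrogates(text: str) -> str:
    """Return `text` with all surrogate code points (U+D800..U+DFFF) removed.

    Hypothetical sketch: in a Python str, properly decoded characters outside
    the BMP are single code points, so anything left in the surrogate range
    is invalid and is dropped here.
    """
    return "".join(ch for ch in text if not 0xD800 <= ord(ch) <= 0xDFFF)


if __name__ == "__main__":
    dirty = "hello \ud83d world"  # contains a lone high surrogate
    clean = remove_invalid_surrogates(dirty)
    # After cleaning, the text can be encoded as UTF-8 without errors.
    clean.encode("utf-8")
    print(repr(clean))
```

Under the same assumptions, `detect_lang` would call this function on its input before passing the text to the language detector, so that a stray surrogate does not raise an encoding error.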