    Correct order of overflowing_tokens for slow tokenizer (#13179) · b91e65af
    Apoorv Garg authored
    * correct order of overflowing_tokens for slow tokenizer (issue fix #13148)
    
    * python 3.9 requires sentencepiece version 0.1.94 or above
    
    * slicing of ids fixed in truncated_sequence()
    
    * Update setup.py
    
    * Correct order of overflowing tokens for pair of sentences
    
    * code reformatted
    
    * Update tokenization_utils_base.py
    
    * reformatting file
    
    * test to check single_input added
    
    * missing function restored
    
    * test to check pair_input overflowing tokens order
    
    * test to check pair_input overflowing tokens order
    
    * test to check pair_input overflowing tokens order
    
    * added an error message for pair of seq and longest_first strategy
    
    * test for pair_input modified
    
    * variable name corrected
    
    * fixed a typo in error message
    
    * requested changes implemented
    
    * required test added
    
    * Corrected the message to match test message
    
    * added error message for Luke Tokenizer
    
    * lost test recovered
    
    * docstring for truncate_sequences and prepare_for_model updated
    
    * docstring for luke tokenizer updated
    
    * updated ENCODE_PLUS_ADDITIONAL_KWARGS_DOCSTRING
    
    * aligned text and fixed punctuation
    
    * improved style and quality of code
    
    * fixed error_msg in truncate_sequences
    
    * replaced encode_plus method with regular call method
    
    * clean up
    
    * rephrased the docstring
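To illustrate what "correct order of overflowing_tokens" means for a single right-truncated sequence, here is a minimal sketch. It is not the library's code: the function name `truncate_right` and its parameters are hypothetical, and it only assumes that the overflow should be returned in the same left-to-right order as the input, optionally preceded by `stride` tokens of overlap.

```python
from typing import List, Tuple


def truncate_right(
    ids: List[int], num_tokens_to_remove: int, stride: int = 0
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: drop tokens from the right end of `ids`.

    Returns (kept_ids, overflowing_ids). The overflow keeps the original
    token order and includes up to `stride` kept tokens as overlap.
    """
    if num_tokens_to_remove <= 0:
        # Nothing to truncate, so nothing overflows.
        return list(ids), []

    # The overflow window covers the removed tokens plus the stride overlap,
    # but can never be longer than the sequence itself.
    window_len = min(len(ids), stride + num_tokens_to_remove)
    overflowing = ids[-window_len:]
    kept = ids[:-num_tokens_to_remove]
    return kept, overflowing


kept, overflow = truncate_right(list(range(10)), num_tokens_to_remove=3, stride=2)
print(kept)      # [0, 1, 2, 3, 4, 5, 6]
print(overflow)  # [5, 6, 7, 8, 9]
```

In this sketch the overflow is a contiguous slice of the input, so its order always matches the original sequence. For a pair of sequences under a longest-first strategy, tokens are removed from either side one step at a time, which is presumably why the commit adds an error message for that case rather than returning overflow.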
test_tokenization_common.py 176 KB