    Warning about too long input for fast tokenizers too (#8799) · a8c3f9aa
    Nicolas Patry authored
    * Warning about too long input for fast tokenizers too
    
    If truncation is not set in tokenizers, but the tokenization is too long
    for the model (`model_max_length`), we used to trigger a warning that
    the input would probably fail (which it most likely will).
    
    This PR re-enables the warning for fast tokenizers too and uses common
    code for the trigger to make sure it's consistent across slow and fast
    tokenizers.
    
    * Checking for pair of inputs too.
    
    * Making the function private and adding its doc.
    
    * Remove formatting ?? in odd place.
    
    * Missed uppercase.
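
The shared length check described in the first item can be sketched roughly as follows. This is a minimal standalone illustration, not the code from this commit: the helper name `warn_if_too_long`, its parameters, and the warning text are all hypothetical, and it assumes that a pair of inputs is checked on its combined length, ignoring any special tokens.

```python
import warnings
from typing import List, Optional


def warn_if_too_long(
    ids: List[int],
    model_max_length: int,
    truncation: bool = False,
    pair_ids: Optional[List[int]] = None,
) -> bool:
    """Hypothetical helper: warn when an untruncated encoding exceeds the model limit.

    Returns True if a warning was emitted, False otherwise.
    """
    # If the caller asked for truncation, the output will fit, so stay quiet.
    if truncation:
        return False

    # Assumption: for a pair of inputs, the combined length is what matters.
    total = len(ids) + (len(pair_ids) if pair_ids is not None else 0)

    if total > model_max_length:
        warnings.warn(
            f"Sequence length {total} exceeds model_max_length "
            f"{model_max_length}; the input will probably fail in the model.",
            UserWarning,
        )
        return True
    return False
```

Routing both tokenizer implementations through one helper of this kind is what keeps the trigger condition identical for slow and fast tokenizers.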
test_tokenization_common.py 141 KB