    Optimizing away the `fill-mask` pipeline. (#12113) · d4be4984
    Nicolas Patry authored
    
    
    * Optimizing away the `fill-mask` pipeline.
    
    - Don't send anything to the tokenizer unless needed: a vocab lookup is
    much faster (see the sketch after this list).
    - Keep BC by sending data to the tokenizer when needed, with a warning;
    users who act on the warning get the performance benefit back.
    - Make `targets` and `top_k` work together better: `top_k` cannot be
    higher than `len(targets)`, but it can still be smaller.
    - Actually simplify `target_ids` when there are duplicates (they can
    happen because we're parsing raw strings).
    - Removed the useless code that failed on empty strings; it only worked
    when the empty string was in first position, so they are now ignored instead.
    - Changed the related tests, as they would only fail correctly when the
    incorrect value was in first position.
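
    A minimal sketch of the target handling described above, assuming a Hugging Face
    tokenizer that exposes `get_vocab()`; the `resolve_target_ids` name and the exact
    warning texts are illustrative, not the pipeline's actual API:

    ```python
    import logging

    logger = logging.getLogger(__name__)


    def resolve_target_ids(tokenizer, targets, top_k=None):
        """Illustrative sketch: map raw target strings to vocab ids, tokenizing only as a fallback."""
        if isinstance(targets, str):
            targets = [targets]

        vocab = tokenizer.get_vocab()  # token string -> id, much cheaper than encoding every target
        target_ids = []
        for target in targets:
            if target == "":
                # Empty strings are ignored instead of raising.
                continue
            id_ = vocab.get(target)
            if id_ is None:
                # BC fallback: send the string through the tokenizer and warn, so the
                # user can switch to an in-vocab token and regain the fast path.
                input_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
                if len(input_ids) == 0:
                    logger.warning("The target `%s` maps to no token, ignoring it.", target)
                    continue
                id_ = input_ids[0]
                logger.warning(
                    "The target `%s` is not a single token in the vocabulary, using `%s` instead.",
                    target,
                    tokenizer.convert_ids_to_tokens(id_),
                )
            target_ids.append(id_)

        # De-duplicate while keeping order: different raw strings can map to the same id.
        target_ids = list(dict.fromkeys(target_ids))

        # `top_k` cannot be higher than the number of targets, but it can be smaller.
        if top_k is not None and top_k > len(target_ids):
            top_k = len(target_ids)
        return target_ids, top_k
    ```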
    
    * Make tests compatible with 2 different vocabs... (at the price of a
    warning).
    
    Co-authored-by: @EtaoinWu
    
    * ValueError working globally
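
    A hedged reading of this step, with a hypothetical helper name: every code path
    that consumes targets goes through one check that raises the same `ValueError`:

    ```python
    def ensure_valid_targets(target_ids):
        # Hypothetical helper: a single place that raises, so every caller fails the same way.
        if len(target_ids) == 0:
            raise ValueError("At least one valid target must be provided when `targets` is passed.")
        return target_ids
    ```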
    
    * Update src/transformers/pipelines/fill_mask.py
    Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
    
    * `tokenizer.vocab` -> `tokenizer.get_vocab()` for more compatibility +
    fallback.
    Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
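
    A small illustrative sketch of that compatibility point (the helper name is made
    up); `get_vocab()` is preferred, with the `vocab` attribute as a fallback:

    ```python
    def vocab_mapping(tokenizer):
        # Prefer `get_vocab()`, which both slow and fast tokenizers are expected to implement.
        try:
            return tokenizer.get_vocab()
        except Exception:
            # Fallback: older tokenizers may only expose a `vocab` attribute; an empty
            # dict simply pushes every target through the slower tokenizer path.
            return getattr(tokenizer, "vocab", {})
    ```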
test_pipelines_fill_mask.py