  • Reduce by 2 the memory requirement in `generate()` 🔥🔥🔥 (#30536) · bd5091df
    Cyril Vallez authored
    * Fix contrastive_search for new cache structure, and improve performance by removing inefficient torch.stack(torch.split(x, top_k, dim=0))
    
    * Fix _contrastive_search for non-standard cache using ellipsis slicing
    
    * Fix all outputs.logits memory leaks for all decoding strategies!
    
    * Fix small error in _contrastive_search()
    
    * Make all necessary changes and reverts for the new class
    
    * Apply coding style
    
    * Remove pipes in type hints for compatibility
    
    * correct type hint
    
    * apply style
    
    * Use DynamicCache by default and solve conflicts
    
    * Fix rebase issues
    
    * Add `_supports_dynamic_cache_class` in models for models that support DynamicCache but not other caches to make DynamicCache the default for more models
    
    * Create generation config to return legacy format by default, or to choose not to
    
    * style
    
    * Fix case when use_cache is False
    
    * Remove default DynamicCache in assisted_decoding if assistant_model does not support it + fix _seen_tokens when cropping cache
    
    * Update prepare_inputs_for_generation() for case with empty DynamicCache
    
    * Correct return of args in _assisted_decoding
    
    * Remove EfficientDynamicCache as it is no longer needed
    
    * Correct mistake in generation config
    
    * Move cache logic of assisted decoding to AssistedCandidateGenerator.__init__
    
    * change DynamicCache function names from "split" to "batch_split" for readability + apply coding style
    
    * Remove `_supports_dynamic_cache_class` attribute after rebase
    
    * Correct missing line lost in conflict resolution during rebasing
    
    * Add special case for Jamba
    
    * Fix jamba test
    
    * Coding style
    
    * coding style
    
    * Correct missing import in rebasing
    
    * Simplify _validate_model_kwargs based on removal of _supports_dynamic_cache attribute
    
    * Simplify code paths in _contrastive_search
    
    * coding style
    
    * Update docstrings of cache methods
    
    * Update prepare_inputs_for_generation() -> past_key_values are always Cache objects
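
The first bullet mentions removing `torch.stack(torch.split(x, top_k, dim=0))`. As a rough illustration of why that pattern is wasteful (not the exact code from this commit): when the first dimension of `x` is `batch_size * top_k`, the same result can be obtained with a plain `view`, which does not copy. The shapes and the names `batch_size` and `hidden` below are assumptions chosen for the sketch.

```python
import torch

# Illustrative shapes only; not taken from the commit.
batch_size, top_k, hidden = 3, 4, 5
x = torch.arange(batch_size * top_k * hidden, dtype=torch.float32)
x = x.reshape(batch_size * top_k, hidden)

# Pattern named in the commit message: split into chunks of size top_k,
# then stack them. torch.stack allocates a new tensor and copies the data.
stacked = torch.stack(torch.split(x, top_k, dim=0))

# Possible replacement: a view with the same shape and values, no copy.
viewed = x.view(batch_size, top_k, hidden)

same_values = torch.equal(stacked, viewed)
view_shares_memory = viewed.data_ptr() == x.data_ptr()
stack_shares_memory = stacked.data_ptr() == x.data_ptr()
```

Both results have shape `(batch_size, top_k, hidden)`; only the stacked version pays for an extra allocation.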
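
On the `outputs.logits` memory leaks: the commit message does not say how they were fixed. One common cause of this kind of leak in PyTorch, shown here purely as an assumption, is that slicing the last position out of the logits returns a view, so holding on to the slice keeps the whole `(batch, seq_len, vocab)` tensor alive. Cloning the slice breaks that link. The tensor sizes below are hypothetical.

```python
import torch

# Hypothetical sizes standing in for (batch, seq_len, vocab_size).
logits = torch.zeros(2, 5, 7)

# Basic slicing returns a view that shares storage with `logits`.
last_view = logits[:, -1, :]

# .clone() copies only the slice into its own, smaller storage.
last_copy = logits[:, -1, :].clone()

# Write to the full tensor: the view sees the change, the clone does not,
# which shows that the view still references the full logits storage.
logits[:, -1, :] = 1.0

view_sum = last_view.sum().item()
copy_sum = last_copy.sum().item()
```

If only the cloned slice is kept and the reference to the full tensor is dropped, the full logits can be freed after each decoding step.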
modeling_persimmon.py 49.4 KB