    Benchmarks - Add LLaMA-2 Models (#668) · 249e21c1
    pdr authored
    Added llama benchmarks for training and inference, following the
    existing pytorch model implementations such as gpt2 and lstm.
    
    - added llama fp8 unit test for better code coverage and a reduced
    memory requirement
    - updated transformers version to >= 4.28.0 for LlamaConfig
    - set tokenizers version <= 0.20.3 to avoid 0.20.4 version
    [issues](https://github.com/huggingface/tokenizers/issues/1691) with
    py3.8
    - added llama2 to tensorrt
    - llama2 tests not added to test_tensorrt_inference_performance.py due
    to large memory requirement for worker gpu. tests validated separately
    on gh200
    
    ---------
    Co-authored-by: dpatlolla <dpatlolla@microsoft.com>
setup.py 7.15 KB