1. 01 Jul, 2023 3 commits
  2. 30 Jun, 2023 3 commits
  3. 29 Jun, 2023 4 commits
  4. 28 Jun, 2023 2 commits
    • feat(src): add kv cache int8 quantization (#22) · cc93136e
      tpoisonooo authored
      * feat(src): add int8 and compile passed
      
      * feat(kernels): fix
      
      * feat(llama): update kernel
      
      * feat(src): add debug
      
      * fix(kernel): k_cache use int8_t pointer
      
      * style(llama): clean code
      
      * feat(deploy.py): revert to enable fmha
      
      * style(LlamaV2): clean code
      
      * feat(deploy.py): add default quant policy
    • fix-gemm-tuning (#24) · 4d42a781
      Li Zhang authored
  5. 26 Jun, 2023 1 commit
  6. 25 Jun, 2023 4 commits
    • style(doc): README.md · 93604c3f
      tpoisonooo authored
    • fix(deploy.py): qkv no bias assertion · e0c7f51b
      tpoisonooo authored
    • Update requirements.txt · 1b7151c1
      tpoisonooo authored
    • Add profile (#15) · 23c05372
      lvhan028 authored
      * remove constraints on model name
      
      * remove duplicate model converter
      
      * add profile
      
      * get eos and bos from server
      
      * update stop_words
      
      * update sequence_length when the last generated token is eos_id
      
      * fix
      
      * fix
      
      * check-in models
      
      * validate model_name
      
      * make stop_words as property
      
      * debug profiling
      
      * better stats
      
      * fix assistant response
      
      * update profile serving
      
      * update
      
      * update
  7. 24 Jun, 2023 1 commit
  8. 22 Jun, 2023 2 commits
  9. 21 Jun, 2023 3 commits
  10. 20 Jun, 2023 4 commits
  11. 18 Jun, 2023 4 commits