1. 29 Nov, 2023 1 commit
  2. 28 Nov, 2023 4 commits
  3. 27 Nov, 2023 1 commit
  4. 24 Nov, 2023 1 commit
  5. 23 Nov, 2023 2 commits
    • remove duplicate import (#5100) · 68fcaa22
      Xuanlei Zhao authored
    • [Feature] Add document retrieval QA (#5020) · e53e729d
      YeAnbang authored
      
      * add langchain
      
      * add langchain
      
      * Add files via upload
      
      * add langchain
      
      * fix style
      
      * fix style: remove extra space
      
      * add pytest; modified retriever
      
      * add pytest; modified retriever
      
      * add tests to build_on_pr.yml
      
      * fix build_on_pr.yml
      
      * fix build on pr; fix environ vars
      
      * separate unit tests for colossalqa from build_on_pr
      
      * fix container setting; fix environ vars
      
      * commented dev code
      
      * add incremental update
      
      * remove stale code
      
      * fix style
      
      * change to sha3 224
      
      * fix retriever; fix style; add unit test for document loader
      
      * fix ci workflow config
      
      * fix ci workflow config
      
      * add set cuda visible device script in ci
      
      * fix doc string
      
      * fix style; update readme; refactored
      
      * add force log info
      
      * change build on pr, ignore colossalqa
      
      * fix docstring, capitalize all initial letters
      
      * fix indexing; fix text-splitter
      
      * remove debug code, update reference
      
      * reset previous commit
      
      * update LICENSE, update README, add key-value mode, fix bugs
      
      * add files back
      
      * revert force push
      
      * remove junk file
      
      * add test files
      
      * fix retriever bug, add intent classification
      
      * change conversation chain design
      
      * rewrite prompt and conversation chain
      
      * add ui v1
      
      * ui v1
      
      * fix avatar
      
      * add header
      
      * Refactor the RAG Code and support Pangu
      
      * Refactor the ColossalQA chain and the UI demo to object-oriented programming.
      
      * resolved conversation. tested scripts under examples. web demo still buggy
      
      * fix ci tests
      
      * Some modifications to add ChatGPT api
      
      * modify llm.py and remove unnecessary files
      
      * Delete applications/ColossalQA/examples/ui/test_frontend_input.json
      
      * Remove OpenAI api key
      
      * add colossalqa
      
      * move files
      
      * move files
      
      * move files
      
      * move files
      
      * fix style
      
      * Add Readme and fix some bugs.
      
      * Add something to readme and modify some code
      
      * modify a directory name for clarity
      
      * remove redundant directory
      
      * Correct a typo in llm.py
      
      * fix AI prefix
      
      * fix test_memory.py
      
      * fix conversation
      
      * fix some errors and typos
      
      * Fix a missing import in RAG_ChatBot.py
      
      * add colossalcloud LLM wrapper, correct issues in code review
      
      ---------
      Co-authored-by: YeAnbang <anbangy2@outlook.com>
      Co-authored-by: Orion-Zheng <zheng_zian@u.nus.edu>
      Co-authored-by: Zian(Andy) Zheng <62330719+Orion-Zheng@users.noreply.github.com>
      Co-authored-by: Orion-Zheng <zhengzian@u.nus.edu>
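      The "add incremental update" and "change to sha3 224" steps in the commit above hash document content so that unchanged files can be skipped when the retrieval index is rebuilt. A minimal sketch of that idea (the function names and data layout here are illustrative, not ColossalQA's actual API):

      ```python
      import hashlib


      def doc_fingerprint(text: str) -> str:
          # SHA3-224 gives a short, collision-resistant content fingerprint
          return hashlib.sha3_224(text.encode("utf-8")).hexdigest()


      def incremental_update(index: dict, docs: dict) -> list:
          """Re-process only documents whose content hash changed.

          `index` maps doc id -> stored hash; `docs` maps doc id -> text.
          Returns the ids that need (re)embedding.
          """
          changed = []
          for doc_id, text in docs.items():
              h = doc_fingerprint(text)
              if index.get(doc_id) != h:
                  index[doc_id] = h
                  changed.append(doc_id)
          return changed
      ```

      On the first pass every document is reported as changed; on later passes only edited documents are, which is what makes re-indexing incremental.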
  6. 22 Nov, 2023 5 commits
  7. 21 Nov, 2023 3 commits
  8. 20 Nov, 2023 7 commits
  9. 19 Nov, 2023 1 commit
    • [inference] Refactor inference architecture (#5057) · fd6482ad
      Xu Kai authored
      
      * [inference] support only TP (#4998)
      
      * support only tp
      
      * enable tp
      
      * add support for bloom (#5008)
      
      * [refactor] refactor gptq and smoothquant llama (#5012)
      
      * refactor gptq and smoothquant llama
      
      * fix import error
      
      * fix linear import torch-int
      
      * fix smoothquant llama import error
      
      * fix import accelerate error
      
      * fix bug
      
      * fix import smooth cuda
      
      * fix smoothcuda
      
      * [Inference Refactor] Merge chatglm2 with pp and tp (#5023)
      
      merge chatglm with pp and tp
      
      * [Refactor] remove useless inference code (#5022)
      
      * remove useless code
      
      * fix quant model
      
      * fix test import bug
      
      * mv original inference legacy
      
      * fix chatglm2
      
      * [Refactor] refactor policy search and quant type controlling in inference (#5035)
      
      * [Refactor] refactor policy search and quant type controlling in inference
      
      * [inference] update readme (#5051)
      
      * update readme
      
      * update readme
      
      * fix architecture
      
      * fix table
      
      * fix table
      
      * [inference] update example (#5053)
      
      * update example
      
      * fix run.sh
      
      * fix rebase bug
      
      * fix some errors
      
      * update readme
      
      * add some features
      
      * update interface
      
      * update readme
      
      * update benchmark
      
      * add requirements-infer
      
      ---------
      Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
  10. 18 Nov, 2023 1 commit
  11. 17 Nov, 2023 1 commit
  12. 16 Nov, 2023 4 commits
    • [shardformer] fix llama error when transformers upgraded. (#5055) · 97cd0cd5
      flybird11111 authored
      * fix-llama
      
      * Update llama.py
    • [gemini] gemini support extra-dp (#5043) · 3e021547
      flybird11111 authored
      * support ddp
      
      * fix
      
      * fix
      
      * fix
      
      fix
      
      * support ddp
      
      * fix
      
      * fix
      
      * fix
      
      fix
      
      * simplify tests
      
      * fix
      
      * fix
      
      * fix
      
      fix
      
      fix
      
      * fix
    • [pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when `strict=False`, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017) · b2ad0d9e
      Elsa Granger authored
      
      * Use p2p
      
      * Cannot send p2p bidirectionally
      
      * Refactor tensor creation and serialization in P2P communication
      
      * Fix llama forward args in flash attention
      
      * Add flop estimate from megatron
      
      * Support loading weight not in weight_map when strict=False in hybrid_parallel
      
      * Use send_forward_recv_backward, etc in 1f1b
      
      * Use dataclass for metadata
      Remove torch.cuda.synchronize() as suggested
      
      * Add comment about the torch.cuda.synchronize for potential error
      
      * Typo
      
      * Update hybrid_parallel_checkpoint_io.py
      
      * Update p2p.py
      
      * Update one_f_one_b.py
      
      * Update p2p.py
      
      ---------
      Co-authored-by: flybird11111 <1829166702@qq.com>
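      The "Add flop estimate from megatron" step in the commit above refers to the per-iteration FLOP formula published for Megatron-LM (Narayanan et al.). A sketch of that estimate follows; the benchmark's actual function name and signature may differ, and this only reproduces the published formula:

      ```python
      def megatron_flops_per_iteration(batch_size, seq_len, num_layers, hidden_size,
                                       vocab_size, activation_recompute=False):
          """Per-iteration FLOP estimate for a GPT-style transformer.

          Factor 72 covers forward + backward passes; 96 adds the extra
          forward pass incurred by activation recomputation.
          """
          coeff = 96 if activation_recompute else 72
          return coeff * batch_size * seq_len * num_layers * hidden_size ** 2 * (
              1 + seq_len / (6 * hidden_size)
              + vocab_size / (16 * num_layers * hidden_size)
          )
      ```

      Dividing this estimate by measured iteration time gives the achieved FLOP/s reported in the llama benchmark.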
    • [Kernels]Update triton kernels into 2.1.0 (#5046) · 28052a71
      Cuiqing Li (李崔卿) authored
      
      * update flash-context-attention
      
      * adding kernels
      
      * fix
      
      * reset
      
      * add build script
      
      * add building process
      
      * add llama2 example
      
      * add colossal-llama2 test
      
      * clean
      
      * fall back test setting
      
      * fix test file
      
      * clean
      
      * clean
      
      * clean
      
      ---------
      Co-authored-by: cuiqing.li <lixx336@gmail.com>
  13. 14 Nov, 2023 1 commit
  14. 10 Nov, 2023 2 commits
    • [hotfix] Support extra_kwargs in ShardConfig (#5031) · 70885d70
      Zhongkai Zhao authored
      * [refactor]: replace inference args with extra_kwargs in ShardConfig
      
      * modify shardconfig
      
      * polish code
      
      * fix policy bug in llama
      
      * fix bug in auto policy
      
      * remove setattr in ShardConfig
    • [gemini] gemini support tensor parallelism. (#4942) · 576a2f7b
      flybird11111 authored
      * [colossalai]fix typo
      
      * [inference] Add smoothquant for llama (#4904)
      
      * [inference] add int8 rotary embedding kernel for smoothquant (#4843)
      
      * [inference] add smoothquant llama attention (#4850)
      
      * add smoothquant llama attention
      
      * remove useless code
      
      * remove useless code
      
      * fix import error
      
      * rename file name
      
      * [inference] add silu linear fusion for smoothquant llama mlp  (#4853)
      
      * add silu linear
      
      * update skip condition
      
      * catch smoothquant cuda lib exception
      
      * process exception for tests
      
      * [inference] add llama mlp for smoothquant (#4854)
      
      * add llama mlp for smoothquant
      
      * fix down out scale
      
      * remove duplicate lines
      
      * add llama mlp check
      
      * delete useless code
      
      * [inference] add smoothquant llama (#4861)
      
      * add smoothquant llama
      
      * fix attention accuracy
      
      * fix accuracy
      
      * add kv cache and save pretrained
      
      * refactor example
      
      * delete smooth
      
      * refactor code
      
      * [inference] add smooth function and delete useless code for smoothquant (#4895)
      
      * add smooth function and delete useless code
      
      * update datasets
      
      * remove duplicate import
      
      * delete useless file
      
      * refactor codes (#4902)
      
      * refactor code
      
      * add license
      
      * add torch-int and smoothquant license
      
      * Update flash_attention_patch.py
      
      To be compatible with the new change in the Transformers library, where a new argument 'padding_mask' was added to forward function of attention layer.
      https://github.com/huggingface/transformers/pull/25598
      
      
      
      * [kernel] support pure fp16 for cpu adam and update gemini optim tests (#4921)
      
      * [kernel] support pure fp16 for cpu adam (#4896)
      
      * [kernel] fix cpu adam kernel for pure fp16 and update tests (#4919)
      
      * [kernel] fix cpu adam
      
      * [test] update gemini optim test
      
      * [format] applied code formatting on changed files in pull request 4908 (#4918)
      Co-authored-by: github-actions <github-actions@github.com>
      
      * [gemini] support gradient accumulation (#4869)
      
      * add test
      
      * fix no_sync bug in low level zero plugin
      
      * fix test
      
      * add argument for grad accum
      
      * add grad accum in backward hook for gemini
      
      * finish implementation, rewrite tests
      
      * fix test
      
      * skip stuck model in low level zero test
      
      * update doc
      
      * optimize communication & fix gradient checkpoint
      
      * modify doc
      
      * cleaning codes
      
      * update cpu adam fp16 case
      
      * [hotfix] fix torch 2.0 compatibility (#4936)
      
      * [hotfix] fix launch
      
      * [test] fix test gemini optim
      
      * [shardformer] fix vit
      
      * [test] add no master test for low level zero plugin (#4934)
      
      * [format] applied code formatting on changed files in pull request 4820 (#4886)
      Co-authored-by: github-actions <github-actions@github.com>
      
      * [nfc] fix some typo with colossalai/ docs/ etc. (#4920)
      
      * [Refactor] Integrated some lightllm kernels into token-attention  (#4946)
      
      * add some req for inference
      
      * clean codes
      
      * add codes
      
      * add some lightllm deps
      
      * clean codes
      
      * hello
      
      * delete rms files
      
      * add some comments
      
      * add comments
      
      * add doc
      
      * add lightllm deps
      
      * add lightllm chatglm2 kernels
      
      * add lightllm chatglm2 kernels
      
      * replace rotary embedding with lightllm kernel
      
      * add some comments
      
      * add some comments
      
      * add some comments
      
      * add
      
      * replace fwd kernel att1
      
      * fix an arg
      
      * add
      
      * add
      
      * fix token attention
      
      * add some comments
      
      * clean codes
      
      * modify comments
      
      * fix readme
      
      * fix bug
      
      * fix bug
      
      ---------
      Co-authored-by: cuiqing.li <lixx336@gmail.com>
      Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
      
      * [test] merge old components to test to model zoo (#4945)
      
      * [test] add custom models in model zoo
      
      * [test] update legacy test
      
      * [test] update model zoo
      
      * [test] update gemini test
      
      * [test] remove components to test
      
      * [inference] add reference and fix some bugs (#4937)
      
      * add reference and fix some bugs
      
      * update gptq init
      
      ---------
      Co-authored-by: Xu Kai <xukai16@foxamil.com>
      
      * [Inference]ADD Bench Chatglm2 script (#4963)
      
      * add bench chatglm
      
      * fix bug and make utils
      
      ---------
      
      Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
      
      * [Pipeline inference] Combine kvcache with pipeline inference (#4938)
      
      * merge kvcache with pipeline inference and refactor the code structure
      
      * support ppsize > 2
      
      * refactor pipeline code
      
      * do pre-commit
      
      * modify benchmark
      
      * fix bench mark
      
      * polish code
      
      * add docstring and update readme
      
      * refactor the code
      
      * fix some logic bug of ppinfer
      
      * polish readme
      
      * fix typo
      
      * skip infer test
      
      * updated c++17 compiler flags (#4983)
      
      * [Inference] Dynamic Batching Inference, online and offline (#4953)
      
      * [inference] Dynamic Batching for Single and Multiple GPUs (#4831)
      
      * finish batch manager
      
      * 1
      
      * first
      
      * fix
      
      * fix dynamic batching
      
      * llama infer
      
      * finish test
      
      * support different lengths generating
      
      * del prints
      
      * del prints
      
      * fix
      
      * fix bug
      
      ---------
      
      Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
      
      * [inference] Async dynamic batching  (#4894)
      
      * finish input and output logic
      
      * add generate
      
      * test forward
      
      * 1
      
      * [inference]Re push async dynamic batching (#4901)
      
      * adapt to ray server
      
      * finish async
      
      * finish test
      
      * del test
      
      ---------
      Co-authored-by: yuehuayingxueluo <867460659@qq.com>
      
      * Revert "[inference]Re push async dynamic batching (#4901)" (#4905)
      
      This reverts commit fbf3c09e673794ed18c91d4bab1a7dfea052e95a.
      
      * Revert "[inference] Async dynamic batching  (#4894)"
      
      This reverts commit fced14025043e29ce816b315f440601188f7f79f.
      
      * Revert "[inference] Async dynamic batching  (#4894)" (#4909)
      
      This reverts commit fced14025043e29ce816b315f440601188f7f79f.
      
      * Add Ray Distributed Environment Init Scripts
      
      * support DynamicBatchManager base function
      
      * revert _set_tokenizer version
      
      * add driver async generate
      
      * add async test
      
      * fix bugs in test_ray_dist.py
      
      * add get_tokenizer.py
      
      * fix code style
      
      * fix bugs about No module named 'pydantic' in ci test
      
      * fix bugs in ci test
      
      * fix bugs in ci test
      
      * fix bugs in ci test
      
      * [infer]Add Ray Distributed Environment Init Scripts (#4911)
      
      * Revert "[inference] Async dynamic batching  (#4894)"
      
      This reverts commit fced14025043e29ce816b315f440601188f7f79f.
      
      * Add Ray Distributed Environment Init Scripts
      
      * support DynamicBatchManager base function
      
      * revert _set_tokenizer version
      
      * add driver async generate
      
      * add async test
      
      * fix bugs in test_ray_dist.py
      
      * add get_tokenizer.py
      
      * fix code style
      
      * fix bugs about No module named 'pydantic' in ci test
      
      * fix bugs in ci test
      
      * fix bugs in ci test
      
      * fix bugs in ci test
      
      * support dynamic batch for bloom model and is_running function
      
      * [Inference]Test for new Async engine (#4935)
      
      * infer engine
      
      * infer engine
      
      * test engine
      
      * test engine
      
      * new manager
      
      * change step
      
      * add
      
      * test
      
      * fix
      
      * fix
      
      * finish test
      
      * finish test
      
      * finish test
      
      * finish test
      
      * add license
      
      ---------
      Co-authored-by: yuehuayingxueluo <867460659@qq.com>
      
      * add assertion for config (#4947)
      
      * [Inference] Finish dynamic batching offline test (#4948)
      
      * test
      
      * fix test
      
      * fix quant
      
      * add default
      
      * fix
      
      * fix some bugs
      
      * fix some bugs
      
      * fix
      
      * fix bug
      
      * fix bugs
      
      * reset param
      
      ---------
      Co-authored-by: yuehuayingxueluo <867460659@qq.com>
      Co-authored-by: Cuiqing Li <lixx3527@gmail.com>
      Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
      
      * [Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention  (#4965)
      
      * adding flash-decoding
      
      * clean
      
      * adding kernel
      
      * adding flash-decoding
      
      * add integration
      
      * add
      
      * adding kernel
      
      * adding kernel
      
      * adding triton 2.1.0 features for inference
      
      * update bloom triton kernel
      
      * remove useless vllm kernels
      
      * clean codes
      
      * fix
      
      * adding files
      
      * fix readme
      
      * update llama flash-decoding
      
      ---------
      Co-authored-by: default avatarcuiqing.li <lixx336@gmail.com>
      
      * fix ColossalEval (#4992)
      Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
      
      * [doc]Update doc for colossal-inference (#4989)
      
      * update doc
      
      * Update README.md
      
      ---------
      Co-authored-by: cuiqing.li <lixx336@gmail.com>
      
      * [hotfix] Fix the bug where process groups were not being properly released. (#4940)
      
      * Fix the bug where process groups were not being properly released.
      
      * test
      
      * Revert "test"
      
      This reverts commit 479900c1398637310abf92eefa3cd168038ea02f.
      
      * [hotfix] fix the bug of repeatedly storing param group (#4951)
      
      * [doc] add supported feature diagram for hybrid parallel plugin (#4996)
      
      * [Pipeline Inference] Merge pp with tp (#4993)
      
      * refactor pipeline into new CaiInferEngine
      
      * update llama modeling forward
      
      * merge tp with pp
      
      * update docstring
      
      * optimize test workflow and example
      
      * fix typo
      
      * add assert and todo
      
      * [release] update version (#4995)
      
      * [release] update version
      
      * [hotfix] fix ci
      
      * [gemini] gemini support tp
      
      [gemini] gemini support tp
      
      [gemini] gemini support tp
      
      [gemini] gemini support tp
      
      [gemini] gemini support tp
      
      * fix
      
      fix
      
      fix
      
      * update checkpointIO
      
      update checkpointIO
      
      update checkpointIO
      
      update checkpointIO
      
      update checkpointIO
      
      update checkpointIO
      
      update checkpointIO
      
      update checkpointIO
      
      update checkpointIO
      
      * support fused layernorm
      
      support fused layernorm
      
      support fused layernorm
      
      * update fusedlayernorm
      
      update fusedlayernorm
      
      update fusedlayernorm
      
      * add sequence parallel to gemini
      
      add sequence parallel to gemini
      
      * fix
      
      * fix comments
      
      fix comments
      
      fix comments
      
      * fix
      
      * fix t5
      
      * clear cache
      
      * fix
      
      * activate ci
      
      * activate ci
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * revert
      
      * modify tp gather method
      
      modify tp gather method
      
      modify tp gather method
      
      modify tp gather method
      
      * fix test
      
      ---------
      Co-authored-by: Xu Kai <xukai16@foxmail.com>
      Co-authored-by: Zian(Andy) Zheng <62330719+Orion-Zheng@users.noreply.github.com>
      Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
      Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
      Co-authored-by: github-actions <github-actions@github.com>
      Co-authored-by: Baizhou Zhang <eddiezhang@pku.edu.cn>
      Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>
      Co-authored-by: digger yu <digger-yu@outlook.com>
      Co-authored-by: Cuiqing Li <lixx3527@gmail.com>
      Co-authored-by: cuiqing.li <lixx336@gmail.com>
      Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
      Co-authored-by: Xu Kai <xukai16@foxamil.com>
      Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
      Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
      Co-authored-by: アマデウス <kurisusnowdeng@users.noreply.github.com>
      Co-authored-by: yuehuayingxueluo <867460659@qq.com>
      Co-authored-by: Yuanchen <70520919+chengeharrison@users.noreply.github.com>
      Co-authored-by: Xu Yuanchen <yuanchen.xu00@gmail.com>
      Co-authored-by: littsk <1214689160@qq.com>
      Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
  15. 09 Nov, 2023 3 commits
  16. 08 Nov, 2023 2 commits
  17. 07 Nov, 2023 1 commit