    fix: resolve issue with inability to correctly specify non-zero GPUs in multi-GPU systems (#404) · 1e621a58
    Hu Yaoqi authored
    * Fix: Correctly specify non-zero GPUs in multi-GPU environments
    
    This commit resolves an issue where the Nunchaku model could not be
    correctly initialized and run on a user-specified non-zero GPU in
    multi-GPU systems.
    
    Key changes include:
    - Using CUDADeviceContext in the FluxModel constructor to ensure
      the model and its submodules are created within the specified GPU context.
    - Modifying the logic in FluxModel::forward for copying residual data
      from CPU back to GPU, ensuring it returns to the correct original GPU device.
    - Adding explicit CUDA context management in Tensor::copy_ for data
      copy operations involving CUDA devices (H2D, D2H, D2D) to guarantee
      cudaMemcpyAsync executes on the correct device.
    
    These changes allow users to reliably run Nunchaku on any specified
    GPU in a multi-GPU setup.
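
The device-context idea in the key changes above can be sketched as a small RAII guard. This is a minimal illustration, not the code from this commit: `DeviceGuard`, `build_model_on`, and `allocate_on_current_device` are hypothetical names, and the `fake_cuda` functions are stand-ins for the real `cudaGetDevice` / `cudaSetDevice` runtime calls so the sketch compiles without a GPU. How the actual `CUDADeviceContext` is implemented is not shown in this commit message.

```cpp
#include <cassert>

// Stand-ins (assumption) for cudaGetDevice / cudaSetDevice, so the
// sketch builds and runs on a machine without the CUDA toolkit.
namespace fake_cuda {
static int g_current_device = 0;
int getDevice() { return g_current_device; }
void setDevice(int device) { g_current_device = device; }
} // namespace fake_cuda

// Hypothetical RAII guard: switches to `device` on construction and
// restores the previously active device on destruction.
class DeviceGuard {
public:
    explicit DeviceGuard(int device) : previous_(fake_cuda::getDevice()) {
        if (device != previous_) {
            fake_cuda::setDevice(device);
        }
    }
    ~DeviceGuard() { fake_cuda::setDevice(previous_); }

    // Non-copyable: exactly one owner restores the device.
    DeviceGuard(const DeviceGuard &) = delete;
    DeviceGuard &operator=(const DeviceGuard &) = delete;

private:
    int previous_;
};

// Hypothetical allocation helper: reports which device an allocation
// made right now would land on.
int allocate_on_current_device() { return fake_cuda::getDevice(); }

// Hypothetical constructor-like function: everything created while the
// guard is alive is created under the requested device.
int build_model_on(int device) {
    DeviceGuard guard(device);
    return allocate_on_current_device();
}

// Small self-check of the behaviour described above.
void demo() {
    assert(fake_cuda::getDevice() == 0);
    assert(build_model_on(2) == 2);      // created under device 2
    assert(fake_cuda::getDevice() == 0); // caller's device restored
}
```

Calling `build_model_on(2)` while device 0 is active creates the resources under device 2 and leaves device 0 active for the caller afterwards. The same guard pattern could wrap a copy operation so that the asynchronous copy is issued while the intended device is current.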
    
    * finish pre-commit
FluxModel.cpp 41.7 KB