fix: allow selecting a non-zero GPU on multi-GPU systems (#404)
* Fix: correctly specify non-zero GPUs in multi-GPU environments

  This commit fixes an issue where the Nunchaku model could not be initialized and run on a user-specified non-zero GPU in multi-GPU systems. Key changes:

  - Use CUDADeviceContext in the FluxModel constructor so that the model and its submodules are created within the specified GPU's context.
  - Change the logic in FluxModel::forward that copies residual data from the CPU back to the GPU, so that the data returns to the original GPU device.
  - Add explicit CUDA context management to Tensor::copy_ for copies involving CUDA devices (H2D, D2H, D2D), so that cudaMemcpyAsync executes on the correct device.

  These changes allow users to run Nunchaku reliably on any specified GPU in a multi-GPU setup.

* finish pre-commit
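The scoped-device pattern behind the CUDADeviceContext and Tensor::copy_ changes can be sketched as follows. This is a minimal illustration under stated assumptions, not Nunchaku's code: `ScopedDevice`, `Place`, `copyDevice`, `copyOn` and the `mock` namespace are hypothetical names, `cudaGetDevice`/`cudaSetDevice` are replaced by a mock so the sketch runs without a GPU, and the rule for which GPU to make current (destination for H2D and D2D, source for D2H) is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-ins for cudaGetDevice / cudaSetDevice so the sketch runs
// without a GPU. Like CUDA's current device, the value is per host thread.
namespace mock {
inline int &currentDevice() {
    static thread_local int device = 0;  // CUDA's default current device is 0
    return device;
}
inline void getDevice(int *device) { *device = currentDevice(); }
inline void setDevice(int device) { currentDevice() = device; }
}  // namespace mock

// RAII guard: make `device` current for the guard's lifetime, then restore the
// previously current device on destruction (also during stack unwinding).
// A negative `device` means "leave the current device unchanged".
class ScopedDevice {
public:
    explicit ScopedDevice(int device) {
        mock::getDevice(&previous_);
        if (device >= 0 && device != previous_) {
            mock::setDevice(device);
        }
    }
    ~ScopedDevice() { mock::setDevice(previous_); }

    ScopedDevice(const ScopedDevice &) = delete;
    ScopedDevice &operator=(const ScopedDevice &) = delete;

private:
    int previous_ = 0;
};

enum class Kind { Host, Cuda };

struct Place {
    Kind kind;
    int index;  // GPU ordinal; ignored for Kind::Host
};

// Assumed rule: the GPU that should be current for a copy is the destination
// GPU if there is one (H2D, D2D), otherwise the source GPU (D2H).
// Returns -1 for host-to-host copies, where no device switch is needed.
inline int copyDevice(Place src, Place dst) {
    if (dst.kind == Kind::Cuda) return dst.index;
    if (src.kind == Kind::Cuda) return src.index;
    return -1;
}

// Simulated copy: returns the device that was current while the copy "ran".
inline int copyOn(Place src, Place dst) {
    const int target = copyDevice(src, dst);
    ScopedDevice guard(target);
    int current = 0;
    mock::getDevice(&current);
    assert(target < 0 || current == target);  // the guard selected the right GPU
    // Real code would issue cudaMemcpyAsync(dst_ptr, src_ptr, bytes, kind, stream) here.
    return current;
}
```

Because the destructor restores the previously current device, the same kind of guard can wrap a whole constructor, as the FluxModel change does with CUDADeviceContext, without leaking the device switch to the caller, assuming CUDADeviceContext restores the previous device in the same way.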