@@ -20,15 +20,15 @@ class CachedParamMgr(torch.nn.Module):
CPU maintains the entire original weight.
CUDA maintains a fraction of the weights used in the upcoming computation. The row number in CUDA is controlled by `cuda_row_num`.
During training, GPU needs to transmit embedding rows between CPU and GPU.
Args:
weight (torch.Tensor): the weight of the Embedding layer.
cuda_row_num (int, optional): the number of rows cached in CUDA memory. Defaults to 0.
buffer_size (int, optional): the number of rows in a data transmitter buffer. Defaults to 50_000.
pin_weight (bool, optional): use pin memory to store the cpu weight. If set `True`, the cpu memory usage will increase largely. Defaults to False.
-evict_strategy (EvictionStrategy, optional): the eviction strategy. There are two options. `EvictionStrategy.LFU` uses the least frequently used cache. `EvictionStrategy.DATASET`: use the stats collected from the target dataset. It usually leads to less cpu-gpu communication volume.
-Default as EvictionStrategy.DATASET.
-use_cpu_caching (bool, optional): use cpu to execute cache indexing. It is slower than use gpu.
+evict_strategy (EvictionStrategy, optional): the eviction strategy. There are two options.
+`EvictionStrategy.LFU`: use the least frequently used cache.
+`EvictionStrategy.DATASET`: use the stats collected from the target dataset. It usually leads to less cpu-gpu communication volume.
+Defaults to EvictionStrategy.DATASET.
"""
def __init__(
...
@@ -38,7 +38,6 @@ class CachedParamMgr(torch.nn.Module):