Unverified commit 1ca17bfa
Authored Nov 22, 2023 by Rhett Ying; committed by GitHub on Nov 22, 2023

[doc] update cpu_best_practices (#6588)

parent 34da58da
Showing 1 changed file with 21 additions and 8 deletions

tutorials/cpu/cpu_best_practises.py (+21, -8)
...
@@ -29,17 +29,30 @@ To take advantage of optimizations *tcmalloc* provides, install it on your system

 OpenMP settings
 ---------------------------

-During training on CPU, the training and dataloading part need to be maintained simultaneously.
-Best performance of parallelization in OpenMP
-can be achieved by setting up the optimal number of working threads and dataloading workers.
-Nodes with high number of CPU cores may benefit from higher number of dataloading workers.
-A good starting point could be setting num_threads=4 in Dataloader constructor for nodes with 32 cores or more.
-If number of cores is rather small, the best performance might be achieved with just one
-dataloader worker or even with dataloader num_threads=0 for dataloading and training performed
-in the same process.
+As `OpenMP` is the default parallel backend, we can control the performance of
+both sampling and training via `dgl.utils.set_num_threads()`.
+
+If the number of OpenMP threads is not set and `num_workers` in the dataloader is set
+to 0, the OpenMP runtime typically uses the number of available CPU cores by
+default. This works well for most cases and is also the default behavior in DGL.
+
+If `num_workers` in the dataloader is set to greater than 0, the number of
+OpenMP threads will be set to **1** for each worker process. This is the
+default behavior in PyTorch. In this case, we can set the number of OpenMP
+threads to the number of CPU cores in the main process.
+
+Performance tuning is highly dependent on the workload and hardware
+configuration. We recommend users try different settings and choose the
+best one for their own cases.

 **Dataloader CPU affinity**

+.. note::
+    This feature is available for `dgl.dataloading.DataLoader` only. It is not
+    yet available for dataloaders in `dgl.graphbolt`.
+
 If the number of dataloader workers is more than 0, please consider using the **use_cpu_affinity()** method
 of the DGL Dataloader class; it will generally result in a significant performance improvement for training.
...
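To make the updated guidance above concrete, here is a minimal sketch of the two `num_workers` cases. It is illustrative only: the toy graph (`dgl.rand_graph`), seed nodes, `NeighborSampler` fan-outs, and batch sizes are placeholders chosen for the example, while `dgl.utils.set_num_threads()` and the worker-thread behavior come from the text.

import os

import dgl
import torch

# Placeholder graph and seed nodes; substitute your own dataset.
g = dgl.rand_graph(1000, 5000)
train_nids = torch.arange(100)
sampler = dgl.dataloading.NeighborSampler([10, 10])

# Case 1: num_workers=0 -- sampling and training run in the main process.
# With the OpenMP thread count left unset, the runtime typically uses all
# available CPU cores, which is the default behavior described above.
dataloader = dgl.dataloading.DataLoader(
    g, train_nids, sampler, batch_size=1024, shuffle=True, num_workers=0
)

# Case 2: num_workers > 0 -- each worker process is limited to 1 OpenMP
# thread (the PyTorch default), so the main process can be given the full
# core count explicitly, as the updated text suggests.
dgl.utils.set_num_threads(os.cpu_count())
dataloader = dgl.dataloading.DataLoader(
    g, train_nids, sampler, batch_size=1024, shuffle=True, num_workers=4
)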
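For the Dataloader CPU affinity recommendation, a hedged sketch follows. The prose names a **use_cpu_affinity()** method; the context-manager call used below, `enable_cpu_affinity()`, is an assumption about the matching API in DGL releases of this period, and the graph, sampler, and training loop are placeholders, so verify the exact name and signature against the `dgl.dataloading.DataLoader` reference before relying on it.

import dgl
import torch

# Placeholder graph, seed nodes, and sampler; substitute your own.
g = dgl.rand_graph(1000, 5000)
train_nids = torch.arange(100)
sampler = dgl.dataloading.NeighborSampler([10, 10])

# CPU affinity is only relevant when dataloader workers are used
# (num_workers > 0), per the note above.
dataloader = dgl.dataloading.DataLoader(
    g, train_nids, sampler, batch_size=1024, shuffle=True, num_workers=4
)

# The text refers to use_cpu_affinity(); the context-manager form below
# (enable_cpu_affinity) is an assumption about the corresponding DataLoader
# API, so check the DGL documentation for the exact call.
with dataloader.enable_cpu_affinity():
    for input_nodes, output_nodes, blocks in dataloader:
        pass  # training step goes here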