Deep Learning Recommendation Model for Personalization and Recommendation Systems:
=================================================================================

## Model structure

```
output:
                    probability of a click
model:                        |
                             /\
                            /__\
                              |
      _____________________> Op  <___________________
    /                         |                      \
   /\                        /\                      /\
  /__\                      /__\           ...     /__\
   |                          |                       |
   |                         Op                      Op
   |                    ____/__\_____           ____/__\____
   |                   |_Emb_|____|__|    ...  |_Emb_|__|___|
input:
[ dense features ]     [sparse indices] , ..., [sparse indices]
```

More precise definition of the model layers:

1) fully connected layers of an MLP

       z = f(y)
       y = Wx + b

2) embedding lookup (for a list of sparse indices p=[p1,...,pk])

       z = Op(e1,...,ek)
       obtain vectors e1=E[:,p1], ..., ek=E[:,pk]

3) Operator Op can be one of the following

       Sum(e1,...,ek) = e1 + ... + ek
       Dot(e1,...,ek) = [e1'e1, ..., e1'ek, ..., ek'e1, ..., ek'ek]
       Cat(e1,...,ek) = [e1', ..., ek']'

   where ' denotes transpose operation
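To make the three definitions above concrete, here is a minimal, self-contained PyTorch sketch of the embedding lookup (Op = Sum inside each bag) followed by the Dot interaction. It is an illustration only, not the repository's implementation: all sizes, index values, and tensor names below are assumptions made for the example.

```python
import torch

# Illustrative sizes (assumptions for this sketch, not repository defaults).
batch_size, emb_dim, dense_dim = 2, 4, 7
table_rows = [5, 3]  # number of rows in each embedding table E

# 2) embedding lookup: one EmbeddingBag per sparse feature; for the indices
#    p = [p1, ..., pk] of a sample it returns Sum(e1, ..., ek).
emb_l = [torch.nn.EmbeddingBag(n, emb_dim, mode="sum") for n in table_rows]
lS_i = [torch.tensor([1, 0, 2]), torch.tensor([0, 1])]  # flattened indices per table
lS_o = [torch.tensor([0, 1]), torch.tensor([0, 1])]     # per-sample offsets into lS_i

# 1) fully connected layers of an MLP (a single bottom layer stands in here).
bot_mlp = torch.nn.Sequential(torch.nn.Linear(dense_dim, emb_dim), torch.nn.ReLU())
x = torch.rand(batch_size, dense_dim)    # dense features
z0 = bot_mlp(x)                          # bottom-MLP output of size emb_dim

# Embedding lookups, one vector of size emb_dim per sparse feature.
ly = [emb(i, o) for emb, i, o in zip(emb_l, lS_i, lS_o)]

# 3) Operator Op = Dot: pairwise dot products between the bottom-MLP output and
#    the embedding vectors; this sketch keeps each distinct pair once.
T = torch.stack([z0] + ly, dim=1)        # (batch, k+1, emb_dim)
Z = torch.bmm(T, T.transpose(1, 2))      # (batch, k+1, k+1) matrix of dot products
li, lj = torch.tril_indices(T.shape[1], T.shape[1], offset=-1)
interactions = Z[:, li, lj]              # strictly lower triangle of Z
top_input = torch.cat([z0, interactions], dim=1)  # fed to the top MLP
print(top_input.shape)                   # torch.Size([2, 7]) with these sizes
```

The concatenation of the bottom-MLP output with the pairwise interactions is what the top MLP in the diagram consumes before the final sigmoid.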
Running the Test Cases
--------------------

1) A simple test of the model

```
$ python dlrm_s_pytorch.py --mini-batch-size=2 --data-size=6
time/loss/accuracy (if enabled):
Finished training it 1/3 of epoch 0, -1.00 ms/it, loss 0.451893, accuracy 0.000%
Finished training it 2/3 of epoch 0, -1.00 ms/it, loss 0.402002, accuracy 0.000%
Finished training it 3/3 of epoch 0, -1.00 ms/it, loss 0.275460, accuracy 0.000%
```

2) Debug mode (model parameters and sizes can be set manually)

```
$ python dlrm_s_pytorch.py --mini-batch-size=2 --data-size=6 --debug-mode
model arch:
mlp top arch 3 layers, with input to output dimensions:
[8 4 2 1]
# of interactions
8
mlp bot arch 2 layers, with input to output dimensions:
[4 3 2]
# of features (sparse and dense)
4
dense feature size
4
sparse feature size
2
# of embeddings (= # of sparse features) 3, with dimensions 2x:
[4 3 2]
data (inputs and targets):
mini-batch: 0
[[0.69647 0.28614 0.22685 0.55131]
 [0.71947 0.42311 0.98076 0.68483]]
[[[1], [0, 1]], [[0], [1]], [[1], [0]]]
[[0.55679]
 [0.15896]]
mini-batch: 1
[[0.36179 0.22826 0.29371 0.63098]
 [0.0921  0.4337  0.43086 0.49369]]
[[[1], [0, 2, 3]], [[1], [1, 2]], [[1], [1]]]
[[0.15307]
 [0.69553]]
mini-batch: 2
[[0.60306 0.54507 0.34276 0.30412]
 [0.41702 0.6813  0.87546 0.51042]]
[[[2], [0, 1, 2]], [[1], [2]], [[1], [1]]]
[[0.31877]
 [0.69197]]
initial parameters (weights and bias):
[[ 0.05438 -0.11105]
 [ 0.42513  0.34167]
 [-0.1426  -0.45641]
 [-0.19523 -0.10181]]
[[ 0.23667  0.57199]
 [-0.16638  0.30316]
 [ 0.10759  0.22136]]
[[-0.49338 -0.14301]
 [-0.36649 -0.22139]]
[[0.51313 0.66662 0.10591 0.13089]
 [0.32198 0.66156 0.84651 0.55326]
 [0.85445 0.38484 0.31679 0.35426]]
[0.17108 0.82911 0.33867]
[[0.55237 0.57855 0.52153]
 [0.00269 0.98835 0.90534]]
[0.20764 0.29249]
[[0.52001 0.90191 0.98363 0.25754 0.56436 0.80697 0.39437 0.73107]
 [0.16107 0.6007  0.86586 0.98352 0.07937 0.42835 0.20454 0.45064]
 [0.54776 0.09333 0.29686 0.92758 0.569   0.45741 0.75353 0.74186]
 [0.04858 0.7087  0.83924 0.16594 0.781   0.28654 0.30647 0.66526]]
[0.11139 0.66487 0.88786 0.69631]
[[0.44033 0.43821 0.7651  0.56564]
 [0.0849  0.58267 0.81484 0.33707]]
[0.92758 0.75072]
[[0.57406 0.75164]]
[0.07915]
DLRM_Net(
  (emb_l): ModuleList(
    (0): EmbeddingBag(4, 2, mode=sum)
    (1): EmbeddingBag(3, 2, mode=sum)
    (2): EmbeddingBag(2, 2, mode=sum)
  )
  (bot_l): Sequential(
    (0): Linear(in_features=4, out_features=3, bias=True)
    (1): ReLU()
    (2): Linear(in_features=3, out_features=2, bias=True)
    (3): ReLU()
  )
  (top_l): Sequential(
    (0): Linear(in_features=8, out_features=4, bias=True)
    (1): ReLU()
    (2): Linear(in_features=4, out_features=2, bias=True)
    (3): ReLU()
    (4): Linear(in_features=2, out_features=1, bias=True)
    (5): Sigmoid()
  )
)
time/loss/accuracy (if enabled):
Finished training it 1/3 of epoch 0, -1.00 ms/it, loss 0.451893, accuracy 0.000%
Finished training it 2/3 of epoch 0, -1.00 ms/it, loss 0.402002, accuracy 0.000%
Finished training it 3/3 of epoch 0, -1.00 ms/it, loss 0.275460, accuracy 0.000%
updated parameters (weights and bias):
[[ 0.0543  -0.1112 ]
 [ 0.42513  0.34167]
 [-0.14283 -0.45679]
 [-0.19532 -0.10197]]
[[ 0.23667  0.57199]
 [-0.1666   0.30285]
 [ 0.10751  0.22124]]
[[-0.49338 -0.14301]
 [-0.36664 -0.22164]]
[[0.51313 0.66663 0.10591 0.1309 ]
 [0.32196 0.66154 0.84649 0.55324]
 [0.85444 0.38482 0.31677 0.35425]]
[0.17109 0.82907 0.33863]
[[0.55238 0.57857 0.52154]
 [0.00265 0.98825 0.90528]]
[0.20764 0.29244]
[[0.51996 0.90184 0.98368 0.25752 0.56436 0.807   0.39437 0.73107]
 [0.16096 0.60055 0.86596 0.98348 0.07938 0.42842 0.20453 0.45064]
 [0.5476  0.0931  0.29701 0.92752 0.56902 0.45752 0.75351 0.74187]
 [0.04849 0.70857 0.83933 0.1659  0.78101 0.2866  0.30646 0.66526]]
[0.11137 0.66482 0.88778 0.69627]
[[0.44029 0.43816 0.76502 0.56561]
 [0.08485 0.5826  0.81474 0.33702]]
[0.92754 0.75067]
[[0.57379 0.7514 ]]
[0.07908]
```

Benchmarking
------------

1) Benchmark with randomly generated data

```
./bench/dlrm_s_benchmark.sh
```

2) Benchmark with the [Criteo Kaggle Display Advertising Challenge Dataset](https://ailab.criteo.com/ressources/):

- Download the dataset and extract it under /data/kaggle

```
mkdir -p /data/kaggle
tar xvf kaggle-display-advertising-challenge-dataset.tar.gz -C /data/kaggle
```

- Run the benchmark script

```
./bench/dlrm_s_criteo_kaggle.sh [--test-freq=1024]
```

- The dataset paths can be set by editing the following arguments in the script:
  - path of the raw training data: --raw-data-file=
  - path of the preprocessed data: --processed-data-file=

3) Multi-node test: the code supports distributed training; the gloo/nccl/mpi backends are currently supported.

```
# Single-node test with 8 processes per node (one per GPU/DCU), nccl backend, randomly generated data:
python -m torch.distributed.launch --nproc_per_node=8 dlrm_s_pytorch.py --arch-embedding-size="80000-80000-80000-80000-80000-80000-80000-80000" --arch-sparse-feature-size=64 --arch-mlp-bot="128-128-128-128" --arch-mlp-top="512-512-512-256-1" --max-ind-range=40000000 --data-generation=random --loss-function=bce --round-targets=True --learning-rate=1.0 --mini-batch-size=2048 --print-freq=2 --print-time --test-freq=2 --test-mini-batch-size=2048 --memory-map --use-gpu --num-batches=100 --dist-backend=nccl

# For the multi-node case, add the following arguments:
--nnodes=2 --node_rank=0 --master_addr="192.168.1.1" --master_port=1234
```

Saving and Loading the Model
-------------------------------

* --save-model= : path and file name under which the model is saved
* --load-model= : path of a previously saved model to load

(A minimal PyTorch checkpointing sketch is included at the end of this README.)

Other
----

For other use cases, see: https://github.com/facebookresearch/dlrm

Version
-------
0.1 : Initial release of the DLRM code

1.0 : DLRM with distributed training, cpu support for row-wise adagrad optimizer

Requirements
------------
pytorch (*11/10/20*)

scikit-learn

numpy

onnx (*optional*)

pydot (*optional*)

torchviz (*optional*)

mpi (*optional for distributed backend*)
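As a usage note for the --save-model / --load-model flags above, the following is a minimal sketch of the standard PyTorch checkpointing pattern such flags typically rely on. It is an illustration only: the exact dictionary keys and file layout written by dlrm_s_pytorch.py may differ, and the model, optimizer, and file name below are placeholders.

```python
import torch

# Hypothetical stand-ins for the trained DLRM model and its optimizer.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

# Saving (what --save-model= conceptually does): persist model and optimizer state.
torch.save(
    {"state_dict": model.state_dict(), "opt_state_dict": optimizer.state_dict()},
    "dlrm_checkpoint.pt",  # placeholder for the path passed via --save-model=
)

# Loading (what --load-model= conceptually does): restore the saved state.
checkpoint = torch.load("dlrm_checkpoint.pt")
model.load_state_dict(checkpoint["state_dict"])
optimizer.load_state_dict(checkpoint["opt_state_dict"])
```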