user guide korean version (#3559)

Co-authored-by: zhjwy9343 <6593865@qq.com>

Commit 67f93144 authored Dec 03, 2021 by Muhyun Kim, committed by GitHub Dec 03, 2021
20 changed files
--- a/docs/source/guide_ko/message-edge.rst
+++ b/docs/source/guide_ko/message-edge.rst
+.. _guide_ko-message-passing-edge:
+
+2.4 Apply Edge Weight In Message Passing
+------------------------------------------
+
+:ref:`(English Version) <guide-message-passing-edge>`
+
+A commonly seen practice in GNN modeling is to apply edge weights to the messages before message aggregation, as done in `GAT <https://arxiv.org/pdf/1710.10903.pdf>`__ and some `GCN
+variants <https://arxiv.org/abs/2004.00445>`__. DGL supports this as follows:
+
+- Save the weights as an edge feature.
+- In the message function, multiply the edge feature by the source node feature.
+
+For example:
+
+.. code::
+
+    import dgl.function as fn
+
+    # Suppose eweight is a tensor of shape (E, *), where E is the number of edges.
+    graph.edata['a'] = eweight
+    graph.update_all(fn.u_mul_e('ft', 'a', 'm'),
+                     fn.sum('m', 'ft'))
+
+Here ``eweight`` is used as the edge weight. Edge weights are usually scalars.
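+
+The following framework-free sketch (plain Python, one scalar feature per node, no DGL) shows what the ``u_mul_e`` and ``sum`` pair computes for each destination node. The function name ``weighted_sum_update`` is hypothetical. Leaving nodes without incoming edges at zero is an assumption of this sketch.
+
```python
def weighted_sum_update(num_nodes, src, dst, ft, a):
    """Plain-Python analogue of
    graph.update_all(fn.u_mul_e('ft', 'a', 'm'), fn.sum('m', 'ft')).

    src, dst: parallel lists of edge endpoints (edge i goes src[i] -> dst[i])
    ft: one scalar feature per node
    a:  one scalar weight per edge
    """
    new_ft = [0.0] * num_nodes        # in this sketch, nodes without in-edges stay 0
    for u, v, w in zip(src, dst, a):
        m = ft[u] * w                 # message: source feature times edge weight
        new_ft[v] += m                # reduce: sum the incoming messages of node v
    return new_ft
```
+
+For example, with edges 0→2, 1→2, and 2→0, node 2 receives ``ft[0] * a[0] + ft[1] * a[1]``. Node 1 has no incoming edge, so it keeps the value 0.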
\ No newline at end of file
--- a/docs/source/guide_ko/message-efficient.rst
+++ b/docs/source/guide_ko/message-efficient.rst
+.. _guide_ko-message-passing-efficient:
+
+2.2 Writing Efficient Message Passing Code
+--------------------------------------------
+
+:ref:`(English Version) <guide-message-passing-efficient>`
+
+DGL optimizes memory consumption and computing speed for message passing. The common way to take advantage of these optimizations is to express message passing as :meth:`~dgl.DGLGraph.update_all` calls that take built-in functions as parameters.
+
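+If the graph has many more edges than nodes, it helps to avoid unnecessary memory copies from nodes to edges. Some layers, such as :class:`~dgl.nn.pytorch.conv.GATConv`, must store messages on the edges. In those cases, call :meth:`~dgl.DGLGraph.apply_edges` with built-in functions. The messages stored on edges can be high-dimensional and consume a lot of memory, so DGL recommends keeping the edge feature dimension as low as possible.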
+
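+The following example reduces edge memory by splitting an operation on edges into operations on nodes. Consider concatenating the ``src`` feature and the ``dst`` feature and then applying a linear layer, i.e., :math:`W\times (u || v)`. Suppose the ``src`` and ``dst`` feature dimensions are high, while the output dimension of the linear layer is low. A straightforward implementation looks like this: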
+
+.. code::
+
+    import torch
+    import torch.nn as nn
+
+    linear = nn.Parameter(torch.FloatTensor(size=(node_feat_dim * 2, out_dim)))
+    def concat_message_function(edges):
+         return {'cat_feat': torch.cat([edges.src['feat'], edges.dst['feat']], dim=1)}
+    g.apply_edges(concat_message_function)
+    g.edata['out'] = g.edata['cat_feat'] @ linear
+
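+The suggested implementation splits the linear operation into two parts: one is applied to the ``src`` feature and the other to the ``dst`` feature. In the final step, the outputs of the two linear operations are added on the edges, i.e., the code computes :math:`W_l\times u + W_r \times v`. This works because :math:`W \times (u||v) = W_l \times u + W_r \times v`, where :math:`W_l` and :math:`W_r` are the left and right halves of the matrix :math:`W`, respectively.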
+
+.. code::
+
+    import dgl.function as fn
+
+    linear_src = nn.Parameter(torch.FloatTensor(size=(node_feat_dim, out_dim)))
+    linear_dst = nn.Parameter(torch.FloatTensor(size=(node_feat_dim, out_dim)))
+    out_src = g.ndata['feat'] @ linear_src
+    out_dst = g.ndata['feat'] @ linear_dst
+    g.srcdata.update({'out_src': out_src})
+    g.dstdata.update({'out_dst': out_dst})
+    g.apply_edges(fn.u_add_v('out_src', 'out_dst', 'out'))
+
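+The two implementations above are mathematically equivalent. The latter is more efficient because it does not need to store the high-dimensional concatenated features (``cat_feat``) on the edges, which is memory-inefficient. In addition, the summation can be done with DGL's built-in function ``u_add_v``, which further speeds up computation and reduces the memory footprint.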
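+
+The identity behind this split can be checked numerically. The sketch below uses plain Python lists instead of tensors so that it has no dependencies. It follows the column-vector convention :math:`W \times x` used in the text and compares :math:`W \times (u||v)` with :math:`W_l \times u + W_r \times v`. The helper names ``matvec`` and ``split_weight`` are hypothetical.
+
```python
def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def split_weight(W, d):
    """Split W (out_dim x 2d) into its left half W_l and right half W_r (each out_dim x d)."""
    W_l = [row[:d] for row in W]
    W_r = [row[d:] for row in W]
    return W_l, W_r

u = [1.0, 2.0]                  # source-node feature (d = 2)
v = [3.0, -1.0]                 # destination-node feature
W = [[1.0, 0.5, -2.0, 1.0],     # out_dim = 2, input dim = 2d = 4
     [0.0, 3.0, 1.0, 2.0]]

# W x (u || v): in a graph this needs the 2d-dimensional concatenation on every edge.
concat_then_linear = matvec(W, u + v)

# W_l u + W_r v: each half is applied to a node feature, and only the
# low-dimensional results are added.
W_l, W_r = split_weight(W, 2)
split_then_add = [a + b for a, b in zip(matvec(W_l, u), matvec(W_r, v))]
```
+
+In a graph, the split version computes ``W_l u`` and ``W_r v`` once per node instead of once per edge. Only the ``out_dim``-sized results reach the edges, which is where the memory saving comes from.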
\ No newline at end of file
--- a/docs/source/guide_ko/message-heterograph.rst
+++ b/docs/source/guide_ko/message-heterograph.rst
+.. _guide_ko-message-passing-heterograph:
+
+2.5 Message Passing on Heterogeneous Graph
+--------------------------------------------
+
+:ref:`(English Version) <guide-message-passing-heterograph>`
+
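+A heterogeneous graph (:ref:`guide-graph-heterogeneous`), or heterograph for short, is a graph that contains nodes and edges of different types. Nodes and edges of different types tend to have different types of attributes, designed to capture the characteristics of each type. In graph neural networks, certain node and edge types may need representations with different dimensions, depending on the model's complexity.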
+
+Message passing on heterographs can be split into two parts:
+
+1. Message computation and aggregation within each relation r.
+2. Reduction that merges the aggregation results from all relations for each node type.
+
+DGL's interface for message passing on heterographs is :meth:`~dgl.DGLGraph.multi_update_all`. It takes two arguments:
+
+- A dictionary whose keys are relations. Each value holds the :meth:`~dgl.DGLGraph.update_all` parameters for that relation, i.e., a ``(message_func, reduce_func)`` pair.
+- A string naming the cross-type reducer. The reducer can be one of ``sum``, ``min``, ``max``, ``mean``, and ``stack``.
+
+Here is an example:
+
+.. code::
+
+    import dgl.function as fn
+
+    # Excerpt from the forward() method of a heterogeneous graph convolution layer.
+    # self.weight is assumed to be an nn.ModuleDict of per-relation nn.Linear
+    # layers, and feat_dict maps each node type to its input features.
+    def forward(self, G, feat_dict):
+        funcs = {}
+        for c_etype in G.canonical_etypes:
+            srctype, etype, dsttype = c_etype
+            Wh = self.weight[etype](feat_dict[srctype])
+            # Save it in graph for message passing
+            G.nodes[srctype].data['Wh_%s' % etype] = Wh
+            # Specify per-relation message passing functions: (message_func, reduce_func).
+            # Note that the results are saved to the same destination feature 'h', which
+            # hints the type wise reducer for aggregation.
+            funcs[etype] = (fn.copy_u('Wh_%s' % etype, 'm'), fn.mean('m', 'h'))
+        # Trigger message passing of multiple types.
+        G.multi_update_all(funcs, 'sum')
+        # return the updated node feature dictionary
+        return {ntype : G.nodes[ntype].data['h'] for ntype in G.ntypes}
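+
+The two-step structure can be illustrated without DGL. The following plain-Python sketch mirrors the example above with scalar features. Within each relation it applies ``copy_u`` and ``mean``. It then merges each destination node's per-relation results with the ``'sum'`` cross-type reducer. The function name is an assumption for illustration, as is the input layout: a dictionary from canonical edge type to an edge list.
+
```python
from statistics import mean

def multi_update_sketch(relations, feats):
    """Framework-free sketch of per-relation aggregation plus a cross-type 'sum'.

    relations: dict mapping a canonical edge type (srctype, etype, dsttype)
               to a list of (src_id, dst_id) edges
    feats:     dict mapping a node type to {node_id: scalar feature}
    Returns {(dsttype, dst_id): updated feature}.
    """
    per_relation = {}  # (dsttype, dst_id) -> list of per-relation aggregates
    for (srctype, etype, dsttype), edges in relations.items():
        inbox = {}
        for u, v in edges:
            inbox.setdefault(v, []).append(feats[srctype][u])   # copy_u message
        for v, msgs in inbox.items():
            per_relation.setdefault((dsttype, v), []).append(mean(msgs))  # mean within relation
    # Cross-type reducer 'sum': merge the per-relation results of each node.
    return {key: sum(vals) for key, vals in per_relation.items()}
```
+
+Suppose a user receives messages through both ``follows`` and ``played-by``. The user's new feature is the sum of the two per-relation means.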
--- a/docs/source/guide_ko/message-part.rst
+++ b/docs/source/guide_ko/message-part.rst
+.. _guide_ko-message-passing-part:
+
+2.3 Apply Message Passing On Part Of The Graph
+------------------------------------------------
+
+:ref:`(English Version) <guide-message-passing-part>`
+
+To update only some of the nodes in a graph, create a subgraph from the IDs of the nodes to be updated, then call :meth:`~dgl.DGLGraph.update_all` on that subgraph.
+
+.. code::
+
+    nid = [0, 2, 3, 6, 7, 9]
+    sg = g.subgraph(nid)
+    sg.update_all(message_func, reduce_func, apply_node_func)
+
+This is a common pattern in mini-batch training. See :ref:`guide-minibatch` for more detailed usage.
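+
+The following plain-Python sketch shows roughly what such a node-induced subgraph keeps. Only edges whose two endpoints are both in ``nid`` survive, and the kept nodes are relabeled ``0..len(nid)-1``. The helper name is hypothetical. Relabeling in the order of ``nid`` is an assumption of this sketch; in DGL, the original IDs can be looked up through ``sg.ndata[dgl.NID]``.
+
```python
def induced_subgraph(src, dst, nid):
    """Framework-free sketch of a node-induced subgraph such as g.subgraph(nid).

    Keeps only edges whose endpoints are both in nid and relabels the kept
    nodes to 0..len(nid)-1 in the order given. Returns the relabeled edge
    lists and the list mapping new IDs back to original IDs.
    """
    new_id = {old: new for new, old in enumerate(nid)}
    sub_src, sub_dst = [], []
    for u, v in zip(src, dst):
        if u in new_id and v in new_id:       # drop edges leaving the node set
            sub_src.append(new_id[u])
            sub_dst.append(new_id[v])
    return sub_src, sub_dst, list(nid)
```
+
+Edges coming from nodes outside ``nid`` are dropped. As a result, each updated node only aggregates messages from neighbors that are also in the subgraph.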
\ No newline at end of file
--- a/docs/source/guide_ko/message.rst
+++ b/docs/source/guide_ko/message.rst
+.. _guide_ko-message-passing:
+
+Chapter 2: Message Passing
+============================
+
+:ref:`(English Version) <guide-message-passing>`
+
+Message Passing Paradigm
+--------------------------
+
+Let :math:`x_v\in\mathbb{R}^{d_1}` be the feature of node :math:`v`, and let :math:`w_{e}\in\mathbb{R}^{d_2}` be the feature of edge :math:`({u}, {v})`. The **message passing paradigm** defines the following edge-wise and node-wise computation at step :math:`t+1`:
+
+.. math::  \text{Edge-wise: } m_{e}^{(t+1)} = \phi \left( x_v^{(t)}, x_u^{(t)}, w_{e}^{(t)} \right) , ({u}, {v},{e}) \in \mathcal{E}.
+
+.. math::  \text{Node-wise: } x_v^{(t+1)} = \psi \left(x_v^{(t)}, \rho\left(\left\lbrace m_{e}^{(t+1)} : ({u}, {v},{e}) \in \mathcal{E} \right\rbrace \right) \right).
+
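+In the above equations:
+
+- :math:`\phi` is a **message function** defined on each edge. It generates a message by combining the edge feature with the features of the edge's incident nodes.
+- :math:`\psi` is an **update function** defined on each node. It updates the node feature by aggregating the node's incoming messages with the **reduce function** :math:`\rho`.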
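+
+The two equations translate directly into code. The following plain-Python sketch performs one step :math:`t \rightarrow t+1`, with :math:`\phi`, :math:`\rho`, and :math:`\psi` passed in as ordinary functions. The signature is a hypothetical illustration, not a DGL API. The sketch assumes :math:`\rho` accepts an empty list, because a node without incoming edges receives no messages.
+
```python
def message_passing_step(x, w, edges, phi, rho, psi):
    """One step t -> t+1 of the message passing paradigm, framework-free.

    x:     dict node -> feature x_v^(t)
    w:     dict edge id -> feature w_e^(t)
    edges: list of (u, v, e) triples; edge e goes from u to v
    phi(x_v, x_u, w_e) -> message      (edge-wise)
    rho(list of messages) -> aggregate (reduce function)
    psi(x_v, aggregate) -> x_v^(t+1)   (node-wise update)
    """
    inbox = {v: [] for v in x}
    for u, v, e in edges:
        inbox[v].append(phi(x[v], x[u], w[e]))            # m_e^(t+1)
    return {v: psi(x[v], rho(msgs)) for v, msgs in inbox.items()}
```
+
+For example, ``phi = lambda xv, xu, we: xu * we``, ``rho = sum``, and ``psi = lambda xv, a: xv + a`` give a weighted-sum aggregation with a residual update.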
+
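+Roadmap
+---------
+
+This chapter introduces DGL's message passing APIs and how to apply them efficiently to nodes and edges. The last section explains how to implement message passing on heterogeneous graphs.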
+
+* :ref:`guide_ko-message-passing-api`
+* :ref:`guide_ko-message-passing-efficient`
+* :ref:`guide_ko-message-passing-part`
+* :ref:`guide_ko-message-passing-edge`
+* :ref:`guide_ko-message-passing-heterograph`
+
+.. toctree::
+    :maxdepth: 1
+    :hidden:
+    :glob:
+
+    message-api
+    message-efficient
+    message-part
+    message-edge
+    message-heterograph
--- a/docs/source/guide_ko/minibatch-custom-sampler.rst
+++ b/docs/source/guide_ko/minibatch-custom-sampler.rst
+.. _guide_ko-minibatch-customizing-neighborhood-sampler:
+
+6.4 Customizing Neighborhood Sampler
+--------------------------------------
+
+:ref:`(English Version) <guide-minibatch-customizing-neighborhood-sampler>`
+
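+DGL provides several neighborhood sampling strategies, but sometimes you may need to write your own. This section explains how to write a custom sampling strategy and how to use it in a stochastic GNN training framework.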
+
+As described in `How Powerful are Graph Neural Networks <https://arxiv.org/pdf/1810.00826.pdf>`__, message passing is defined as:
+
+.. math::
+
+
+   \begin{gathered}
+     \boldsymbol{a}_v^{(l)} = \rho^{(l)} \left(
+       \left\lbrace
+         \boldsymbol{h}_u^{(l-1)} : u \in \mathcal{N} \left( v \right)
+       \right\rbrace
+     \right)
+   \\
+     \boldsymbol{h}_v^{(l)} = \phi^{(l)} \left(
+       \boldsymbol{h}_v^{(l-1)}, \boldsymbol{a}_v^{(l)}
+     \right)
+   \end{gathered}
+
+where :math:`\rho^{(l)}` and :math:`\phi^{(l)}` are parameterized functions, and :math:`\mathcal{N}(v)` is the set of predecessors (or *neighbors*, if the graph is undirected) of node :math:`v` in graph :math:`\mathcal{G}`.
+
+For instance, to perform message passing that updates the red node in the following graph:
+
+.. figure:: https://data.dgl.ai/asset/image/guide_6_4_0.png
+   :alt: Imgur
+
+one needs to aggregate the node features of its neighbors, shown as green nodes:
+
+.. figure:: https://data.dgl.ai/asset/image/guide_6_4_1.png
+   :alt: Imgur
+
+Neighborhood Sampling by Hand
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+First, define a DGL graph that matches the figure above.
+
+.. code:: python
+
+    import torch
+    import dgl
+
+    src = torch.LongTensor(
+        [0, 0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 9, 10,
+         1, 2, 3, 3, 3, 4, 5, 5, 6, 5, 8, 6, 8, 9, 8, 11, 11, 10, 11])
+    dst = torch.LongTensor(
+        [1, 2, 3, 3, 3, 4, 5, 5, 6, 5, 8, 6, 8, 9, 8, 11, 11, 10, 11,
+         0, 0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 9, 10])
+    g = dgl.graph((src, dst))
+
+Next, consider how multi-layer message passing computes the output of a single node.
+
+Finding the Message Passing Dependency
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Consider computing the output of seed node 8 with a 2-layer GNN on the following graph:
+
+.. figure:: https://data.dgl.ai/asset/image/guide_6_4_2.png
+   :alt: Imgur
+
+By the formulation:
+
+.. math::
+
+
+   \begin{gathered}
+     \boldsymbol{a}_8^{(2)} = \rho^{(2)} \left(
+       \left\lbrace
+         \boldsymbol{h}_u^{(1)} : u \in \mathcal{N} \left( 8 \right)
+       \right\rbrace
+     \right) = \rho^{(2)} \left(
+       \left\lbrace
+         \boldsymbol{h}_4^{(1)}, \boldsymbol{h}_5^{(1)},
+         \boldsymbol{h}_7^{(1)}, \boldsymbol{h}_{11}^{(1)}
+       \right\rbrace
+     \right)
+   \\
+     \boldsymbol{h}_8^{(2)} = \phi^{(2)} \left(
+       \boldsymbol{h}_8^{(1)}, \boldsymbol{a}_8^{(2)}
+     \right)
+   \end{gathered}
+
+According to this formulation, computing :math:`\boldsymbol{h}_8^{(2)}` requires collecting messages from nodes 4, 5, 7, and 11 (colored green) along the edges shown in the figure below.
+
+.. figure:: https://data.dgl.ai/asset/image/guide_6_4_3.png
+   :alt: Imgur
+
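+This graph contains all the nodes of the original graph, but only the edges needed to pass messages to the given output nodes. We call such a graph the *frontier* of the second GNN layer for the red node 8.
+
+Several functions can generate frontiers. For instance, :func:`dgl.in_subgraph()` induces a subgraph that contains all the nodes of the original graph but only the incoming edges of the given nodes.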
+
+.. code:: python
+
+    frontier = dgl.in_subgraph(g, [8])
+    print(frontier.all_edges())
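+
+The following plain-Python sketch mimics this frontier without DGL. It assumes edges are given as the two parallel lists ``src`` and ``dst`` defined above. The helper name ``in_subgraph_edges`` is hypothetical and not part of DGL.
+
```python
def in_subgraph_edges(src, dst, nodes):
    """Framework-free sketch of the frontier built by dgl.in_subgraph(g, nodes).

    All nodes of the original graph are implicitly kept; only edges whose
    destination is in `nodes` survive. Returns the surviving (src, dst) lists.
    """
    targets = set(nodes)
    kept = [(u, v) for u, v in zip(src, dst) if v in targets]
    return [u for u, _ in kept], [v for _, v in kept]
```
+
+On the example graph, ``in_subgraph_edges(src, dst, [8])`` returns the edges from nodes 4, 5, 7, and 11. Calling it on ``[4, 5, 7, 8, 11]`` gives the frontier of the first layer. Its sources (nodes 2, 3, 4, 5, 6, 7, 8, 10, and 11) are the inputs needed to compute node 8's output with two layers.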
+
+For the full list of such functions, refer to :ref:`api-subgraph-extraction` and :ref:`api-sampling`.
+
+Technically, any graph with the same set of nodes as the original graph can serve as a frontier. This is the basis of :ref:`guide-minibatch-customizing-neighborhood-sampler-impl`.
+
+The Bipartite Structure for Multi-layer Minibatch Message Passing
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+However, :math:`\boldsymbol{h}_8^{(2)}` cannot be computed from :math:`\boldsymbol{h}_\cdot^{(1)}` by simply performing message passing on the frontier, because the frontier still contains all the nodes of the original graph. Here, only nodes 4, 5, 7, 8, and 11 (the green and red nodes) are needed as input, and only node 8 (the red node) is needed as output. Since the numbers of input and output nodes differ, message passing must instead be performed on a small, bipartite-structured graph.
+
+The figure below shows the MFG of the second GNN layer for node 8.
+
+.. figure:: https://data.dgl.ai/asset/image/guide_6_4_4.png
+   :alt: Imgur
+
+.. note::
+
+   For the concept of message flow graphs (MFGs), see the :doc:`Stochastic Training Tutorial
+   <tutorials/large/L0_neighbor_sampling_overview>`.
+
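+Note that the destination nodes also appear among the source nodes. This is because the destination nodes' representations from the previous layer are needed to combine features after message passing (i.e., :math:`\phi^{(2)}`).
+
+DGL provides :func:`dgl.to_block` to convert any frontier to an MFG. Its first argument is the frontier and its second argument is the destination nodes. For instance, the frontier above can be converted to an MFG with destination node 8 as follows: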
+
+.. code:: python
+
+    dst_nodes = torch.LongTensor([8])
+    block = dgl.to_block(frontier, dst_nodes)
+
+To find the number of source and destination nodes of a given node type, use the
+:meth:`dgl.DGLHeteroGraph.number_of_src_nodes` and :meth:`dgl.DGLHeteroGraph.number_of_dst_nodes` methods.
+
+.. code:: python
+
+    num_src_nodes, num_dst_nodes = block.number_of_src_nodes(), block.number_of_dst_nodes()
+    print(num_src_nodes, num_dst_nodes)
+
+An MFG's source node features can be accessed via members such as :attr:`dgl.DGLHeteroGraph.srcdata` and :attr:`dgl.DGLHeteroGraph.srcnodes`. Its destination node features can be accessed via :attr:`dgl.DGLHeteroGraph.dstdata` and :attr:`dgl.DGLHeteroGraph.dstnodes`. ``srcdata``/``dstdata`` and ``srcnodes``/``dstnodes`` use the same syntax as :attr:`dgl.DGLHeteroGraph.ndata` and :attr:`dgl.DGLHeteroGraph.nodes` on normal graphs.
+
+.. code:: python
+
+    block.srcdata['h'] = torch.randn(num_src_nodes, 5)
+    block.dstdata['h'] = torch.randn(num_dst_nodes, 5)
+
+If an MFG is converted from a frontier that was in turn derived from a graph, the features of the MFG's source and destination nodes can be read directly, as follows:
+
+.. code:: python
+
+    print(block.srcdata['x'])
+    print(block.dstdata['y'])
+
+.. note::
+
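+   The original node IDs of the MFG's source and destination nodes are stored in the ``dgl.NID`` feature. The mapping from the MFG's edge IDs to the frontier's edge IDs is stored in ``dgl.EID``.
+
+DGL ensures that the destination nodes of an MFG always appear among its source nodes. As the following code shows, the destination nodes always come first in the source nodes.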
+
+.. code:: python
+
+    src_nodes = block.srcdata[dgl.NID]
+    dst_nodes = block.dstdata[dgl.NID]
+    assert torch.equal(src_nodes[:len(dst_nodes)], dst_nodes)
+
+As a result, the destination nodes must include every node that is the destination of an edge in the frontier.
+
+For example, consider the following frontier:
+
+.. figure:: https://data.dgl.ai/asset/image/guide_6_4_5.png
+   :alt: Imgur
+
+where the red and green nodes (i.e., nodes 4, 5, 7, 8, and 11) are the destinations of edges. The following code raises an error because the destination node list does not include all of them.
+
+.. code:: python
+
+    dgl.to_block(frontier2, torch.LongTensor([4, 5]))   # ERROR
+
+However, the destination nodes can include more nodes than those above. Any extra node is an isolated node with no edge pointing to it, and it appears in both the source nodes and the destination nodes.
+
+.. code:: python
+
+    # Node 3 is an isolated node that does not have any edge pointing to it.
+    block3 = dgl.to_block(frontier2, torch.LongTensor([4, 5, 7, 8, 11, 3]))
+    print(block3.srcdata[dgl.NID])
+    print(block3.dstdata[dgl.NID])
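+
+The ordering and validity rules above can be summarized in a small plain-Python sketch. It is not DGL's implementation. It checks that every edge destination is a destination node and places the destination nodes first among the source nodes. Ordering the remaining source nodes by ascending ID is an arbitrary choice of this sketch. The function name ``to_block_sketch`` is hypothetical.
+
```python
def to_block_sketch(frontier_src, frontier_dst, dst_nodes):
    """Framework-free sketch of the node layout dgl.to_block guarantees.

    Returns (src_nodes, dst_nodes) in original node IDs. Destination nodes come
    first in src_nodes; the other edge sources follow in ascending ID order
    (an arbitrary choice of this sketch).
    """
    dst_nodes = list(dst_nodes)
    dst_set = set(dst_nodes)
    missing = set(frontier_dst) - dst_set
    if missing:
        # Same rule as the error case above: every edge destination must be a dst node.
        raise ValueError(f"edge destinations {sorted(missing)} are not in dst_nodes")
    extra = sorted(set(frontier_src) - dst_set)
    return dst_nodes + extra, dst_nodes
```
+
+With the full in-edge frontier of nodes 4, 5, 7, 8, and 11 from the example graph, passing ``[4, 5]`` as destinations raises an error. Passing ``[4, 5, 7, 8, 11, 3]`` places the isolated node 3 in both node lists.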
+
+Heterogeneous Graphs
+^^^^^^^^^^^^^^^^^^^^
+
+MFGs also work on heterogeneous graphs. Take the following frontier as an example:
+
+.. code:: python
+
+    hetero_frontier = dgl.heterograph({
+        ('user', 'follow', 'user'): ([1, 3, 7], [3, 6, 8]),
+        ('user', 'play', 'game'): ([5, 5, 4], [6, 6, 2]),
+        ('game', 'played-by', 'user'): ([2], [6])
+    }, num_nodes_dict={'user': 10, 'game': 10})
+
+Create an MFG with destination nodes User #3, #6, and #8, as well as Game #2 and #6:
+
+.. code:: python
+
+    hetero_block = dgl.to_block(hetero_frontier, {'user': [3, 6, 8], 'game': [2, 6]})
+
+The source and destination nodes can be retrieved by type:
+
+.. code:: python
+
+    # source users and games
+    print(hetero_block.srcnodes['user'].data[dgl.NID], hetero_block.srcnodes['game'].data[dgl.NID])
+    # destination users and games
+    print(hetero_block.dstnodes['user'].data[dgl.NID], hetero_block.dstnodes['game'].data[dgl.NID])
+
+
+.. _guide-minibatch-customizing-neighborhood-sampler-impl:
+
+Implementing a Custom Neighbor Sampler
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Recall that the following code performs neighborhood sampling for node classification:
+
+.. code:: python
+
+    sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
+
+To implement your own neighborhood sampling strategy, you only need to replace the ``sampler`` object with your own implementation. To see how, first look at :class:`~dgl.dataloading.dataloader.BlockSampler`, the parent class of :class:`~dgl.dataloading.neighbor.MultiLayerFullNeighborSampler`.
+
+:class:`~dgl.dataloading.dataloader.BlockSampler` generates the list of MFGs, starting from the last layer, through its :meth:`~dgl.dataloading.dataloader.BlockSampler.sample_blocks` method. The default implementation of ``sample_blocks`` iterates backwards over the layers, generating frontiers and converting them to MFGs.
+
+Therefore, for neighborhood sampling **you only need to implement** the :meth:`~dgl.dataloading.dataloader.BlockSampler.sample_frontier` **method**. The method receives the layer the frontier is generated for, the original graph, and the nodes whose representations are to be computed. It then generates a frontier for those nodes.
+
+You also need to pass the number of GNN layers to the parent class.
+
+For example, :class:`~dgl.dataloading.neighbor.MultiLayerFullNeighborSampler` can be implemented as follows:
+
+.. code:: python
+
+    class MultiLayerFullNeighborSampler(dgl.dataloading.BlockSampler):
+        def __init__(self, n_layers):
+            super().__init__(n_layers)
+    
+        def sample_frontier(self, block_id, g, seed_nodes):
+            frontier = dgl.in_subgraph(g, seed_nodes)
+            return frontier
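+
+The following simplified plain-Python model of the backward loop described above shows how ``sample_blocks`` and ``sample_frontier`` fit together. The class names are hypothetical, and edges are plain ``(src, dst)`` tuples. Each block is represented as ``(src_nodes, dst_nodes, edges)``. The sketch mirrors the structure of :class:`~dgl.dataloading.dataloader.BlockSampler`, not its actual API.
+
```python
class BlockSamplerSketch:
    """Framework-free sketch of the loop BlockSampler.sample_blocks performs.

    Subclasses implement sample_frontier(block_id, edges, seed_nodes) and
    return a list of (src, dst) edges.
    """
    def __init__(self, num_layers):
        self.num_layers = num_layers

    def sample_frontier(self, block_id, edges, seed_nodes):
        raise NotImplementedError

    def sample_blocks(self, edges, seed_nodes):
        blocks = []
        for block_id in reversed(range(self.num_layers)):    # last layer first
            frontier = self.sample_frontier(block_id, edges, seed_nodes)
            dst_nodes = list(seed_nodes)
            extra = sorted({u for u, _ in frontier} - set(dst_nodes))
            src_nodes = dst_nodes + extra                     # dst nodes come first
            blocks.insert(0, (src_nodes, dst_nodes, frontier))
            seed_nodes = src_nodes                            # inputs of this layer
        return blocks


class FullNeighborSketch(BlockSamplerSketch):
    """Analogue of MultiLayerFullNeighborSampler: keep every in-edge of the seeds."""
    def sample_frontier(self, block_id, edges, seed_nodes):
        seeds = set(seed_nodes)
        return [(u, v) for u, v in edges if v in seeds]
```
+
+This sketch was run on the example graph from earlier in this section, with seed node 8. It reproduces the dependencies found by hand: the last block has source nodes 8, 4, 5, 7, and 11, and the first block's source nodes are the inputs of the first layer.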
+
+:class:`dgl.dataloading.neighbor.MultiLayerNeighborSampler` is a more sophisticated neighborhood sampler. For each node, it samples a small number of neighbors to gather messages from. It can be implemented as follows:
+
+.. code:: python
+
+    class MultiLayerNeighborSampler(dgl.dataloading.BlockSampler):
+        def __init__(self, fanouts):
+            super().__init__(len(fanouts))
+    
+            self.fanouts = fanouts
+    
+        def sample_frontier(self, block_id, g, seed_nodes):
+            fanout = self.fanouts[block_id]
+            if fanout is None:
+                frontier = dgl.in_subgraph(g, seed_nodes)
+            else:
+                frontier = dgl.sampling.sample_neighbors(g, seed_nodes, fanout)
+            return frontier
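+
+The ``sample_neighbors`` branch above can be approximated in plain Python, assuming uniform sampling without replacement (the default behavior of :func:`dgl.sampling.sample_neighbors`). The helper ``sample_in_edges`` is hypothetical. Each seed keeps at most ``fanout`` of its incoming edges.
+
```python
import random

def sample_in_edges(edges, seed_nodes, fanout, rng):
    """Framework-free sketch of uniform neighbor sampling without replacement.

    edges: list of (src, dst) tuples; rng: a random.Random instance.
    For every seed, keep at most `fanout` incoming edges chosen uniformly.
    """
    frontier = []
    for v in seed_nodes:
        in_edges = [(u, d) for u, d in edges if d == v]
        if len(in_edges) <= fanout:
            frontier.extend(in_edges)                 # fewer in-edges than fanout: keep all
        else:
            frontier.extend(rng.sample(in_edges, fanout))
    return frontier
```
+
+Passing a seeded ``random.Random`` object makes the sampled frontier reproducible, which is convenient when debugging a custom sampler.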
+
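+The functions above generate frontiers, but any graph with the same nodes as the original graph can serve as a frontier.
+
+For example, the following sampler randomly drops inbound edges of the seed nodes, keeping each one with probability ``p``: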
+
+.. code:: python
+
+    class MultiLayerDropoutSampler(dgl.dataloading.BlockSampler):
+        def __init__(self, p, num_layers):
+            super().__init__(num_layers)
+    
+            self.p = p
+    
+        def sample_frontier(self, block_id, g, seed_nodes, *args, **kwargs):
+            # Get all inbound edges to `seed_nodes`
+            src, dst = dgl.in_subgraph(g, seed_nodes).all_edges()
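+            # Randomly keep each edge with probability p. The mask must be boolean:
+            # indexing with a tensor of 0/1 integers would gather edges 0 and 1 instead.
+            mask = torch.zeros(src.shape[0]).bernoulli_(self.p).bool()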
+            src = src[mask]
+            dst = dst[mask]
+            # Return a new graph with the same nodes as the original graph as a
+            # frontier
+            frontier = dgl.graph((src, dst), num_nodes=g.number_of_nodes())
+            return frontier
+    
+        def __len__(self):
+            return self.num_layers
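+
+The sampling rule of ``MultiLayerDropoutSampler`` can be sketched in plain Python as shown below. The helper is hypothetical, and edges are ``(src, dst)`` tuples. Each inbound edge of the seeds is kept independently with probability ``p``. The node count is returned unchanged, because the frontier must have the same nodes as the original graph.
+
```python
import random

def dropout_frontier(edges, seed_nodes, num_nodes, p, rng):
    """Framework-free sketch of MultiLayerDropoutSampler.sample_frontier.

    Takes all in-edges of the seeds and keeps each one with probability p
    (an independent Bernoulli trial per edge). Returns the kept edges and the
    unchanged node count of the original graph.
    """
    seeds = set(seed_nodes)
    in_edges = [(u, v) for u, v in edges if v in seeds]
    kept = [e for e in in_edges if rng.random() < p]
    return kept, num_nodes
```
+
+With ``p=1.0`` every inbound edge survives, and with ``p=0.0`` none does. With ``p=0.0``, the frontier still has all the nodes but no edges.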
+
+After implementing your sampler, create a data loader that uses it. As before, iterating over the seed nodes yields lists of MFGs.
+
+.. code:: python
+
+    sampler = MultiLayerDropoutSampler(0.5, 2)
+    dataloader = dgl.dataloading.NodeDataLoader(
+        g, train_nids, sampler,
+        batch_size=1024,
+        shuffle=True,
+        drop_last=False,
+        num_workers=4)
+    
+    model = StochasticTwoLayerRGCN(in_features, hidden_features, out_features)
+    model = model.cuda()
+    opt = torch.optim.Adam(model.parameters())
+    
+    for input_nodes, blocks in dataloader:
+        blocks = [b.to(torch.device('cuda')) for b in blocks]
+        input_features = blocks[0].srcdata     # returns a dict
+        output_labels = blocks[-1].dstdata     # returns a dict
+        output_predictions = model(blocks, input_features)
+        loss = compute_loss(output_labels, output_predictions)
+        opt.zero_grad()
+        loss.backward()
+        opt.step()
+
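+Heterogeneous Graphs
+^^^^^^^^^^^^^^^^^^^^
+
+Generating a frontier for a heterogeneous graph works the same way as for a homogeneous graph. As long as the returned graph has the same nodes as the original graph, everything else should work as is. For example, the ``MultiLayerDropoutSampler`` above can be rewritten to iterate over all edge types, so that it also works on heterogeneous graphs.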
+
+.. code:: python
+
+    class MultiLayerDropoutSampler(dgl.dataloading.BlockSampler):
+        def __init__(self, p, num_layers):
+            super().__init__(num_layers)
+    
+            self.p = p
+    
+        def sample_frontier(self, block_id, g, seed_nodes, *args, **kwargs):
+            # Get all inbound edges to `seed_nodes`
+            sg = dgl.in_subgraph(g, seed_nodes)
+    
+            new_edges_masks = {}
+            # Iterate over all edge types
+            for etype in sg.canonical_etypes:
+                edge_mask = torch.zeros(sg.number_of_edges(etype))
+                edge_mask.bernoulli_(self.p)
+                new_edges_masks[etype] = edge_mask.bool()
+    
+            # Return a new graph with the same nodes as the original graph as a
+            # frontier
+            frontier = dgl.edge_subgraph(sg, new_edges_masks, relabel_nodes=False)
+            return frontier
+    
+        def __len__(self):
+            return self.num_layers
--- a/docs/source/guide_ko/minibatch-edge.rst
+++ b/docs/source/guide_ko/minibatch-edge.rst
+.. _guide_ko-minibatch-edge-classification-sampler:
+
+6.2 Training GNN for Edge Classification with Neighborhood Sampling
+---------------------------------------------------------------------
+
+:ref:`(English Version) <guide-minibatch-edge-classification-sampler>`
+
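+Training an edge classification/regression model is fairly similar to training a node classification/regression model, with a few notable differences.
+
+Define a Neighborhood Sampler and Data Loader
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can use the :ref:`same neighborhood samplers as in node classification <guide-minibatch-node-classification-sampler>`.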
+
+.. code:: python
+
+    sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
+
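+To use a neighborhood sampler provided by DGL for edge classification, combine it with :class:`~dgl.dataloading.pytorch.EdgeDataLoader`. This loader iterates over a set of edges in minibatches. For each edge minibatch, it yields the induced subgraph together with the *message flow graphs* (MFGs) consumed by the modules below.
+
+The following code creates a PyTorch DataLoader that iterates over the training edge IDs (``train_eid_dict``) in batches and yields the generated lists of MFGs. The training loop below moves them to the GPU.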
+
+.. code:: python
+
+    dataloader = dgl.dataloading.EdgeDataLoader(
+        g, train_eid_dict, sampler,
+        batch_size=1024,
+        shuffle=True,
+        drop_last=False,
+        num_workers=4)
+
+.. note::
+
+   See the :doc:`Stochastic Training Tutorial <tutorials/large/L0_neighbor_sampling_overview>` for the concept of message flow graphs.
+
+   The full list of built-in samplers is in the :ref:`neighborhood sampler API reference <api-dataloading-neighbor-sampling>`.
+
+   :ref:`guide-minibatch-customizing-neighborhood-sampler` explains how to write your own neighborhood sampler and describes the MFG concept in more detail.
+
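+Removing Edges in the Minibatch from the Original Graph for Neighborhood Sampling
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When training an edge classification model, you sometimes want to remove the training edges from the computation dependency, as if they never existed. Otherwise, the model would *know* that an edge exists between the two nodes and could potentially exploit this information.
+
+Therefore, in edge classification, you may want neighborhood sampling to exclude the minibatch's sampled edges from the original graph. For undirected graphs, the reverse edges of those sampled edges should be excluded as well. When creating an :class:`~dgl.dataloading.pytorch.EdgeDataLoader`, specify ``exclude='reverse_id'`` together with a mapping from edge IDs to their reverse edge IDs.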
+
+.. code:: python
+
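+    # Assumes edge i and edge i + n_edges // 2 are reverses of each other, i.e. the
+    # second half of the edges is the first half with source and destination swapped.
+    n_edges = g.number_of_edges()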
+    dataloader = dgl.dataloading.EdgeDataLoader(
+        g, train_eid_dict, sampler,
+    
+        # The following two arguments are specifically for excluding the minibatch
+        # edges and their reverse edges from the original graph for neighborhood
+        # sampling.
+        exclude='reverse_id',
+        reverse_eids=torch.cat([
+            torch.arange(n_edges // 2, n_edges), torch.arange(0, n_edges // 2)]),
+    
+        batch_size=1024,
+        shuffle=True,
+        drop_last=False,
+        num_workers=4)
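+
+The ``reverse_eids`` tensor above encodes a simple index mapping. The plain-Python sketch below (hypothetical helpers) rebuilds the mapping as a list and computes which edges ``exclude='reverse_id'`` would remove for a minibatch. It assumes the half-and-half edge layout stated in the code comment. The example graph from section 6.4 has exactly that layout: its second half of edges is the first half reversed.
+
```python
def reverse_eids_half_split(n_edges):
    """Same mapping as the reverse_eids tensor above, as a Python list:
    edge i in the first half maps to i + n_edges // 2, and vice versa."""
    half = n_edges // 2
    return list(range(half, n_edges)) + list(range(0, half))

def excluded_edges(minibatch_eids, reverse_eids):
    """Edges removed for neighborhood sampling under exclude='reverse_id':
    the minibatch edges plus their reverse edges."""
    return sorted(set(minibatch_eids) | {reverse_eids[e] for e in minibatch_eids})
```
+
+If the graph's edges are not laid out this way, the mapping is wrong and the "reverse" edges removed are unrelated edges. It is worth checking ``src[rev[i]] == dst[i]`` on your own graph before training.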
+
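+Adapt Your Model for Minibatch Training
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+An edge classification model usually consists of two parts:
+
+- one that obtains the representations of the incident nodes, and
+- one that computes the edge scores from the incident node representations.
+
+The first part is exactly the same as in :ref:`node classification <guide-minibatch-node-classification-model>` and can simply be reused. Its inputs are the list of MFGs produced by DGL's data loader and the input features.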
+
+.. code:: python
+
+    import torch.nn as nn
+    import torch.nn.functional as F
+    import dgl.nn as dglnn
+
+    class StochasticTwoLayerGCN(nn.Module):
+        def __init__(self, in_features, hidden_features, out_features):
+            super().__init__()
+            self.conv1 = dglnn.GraphConv(in_features, hidden_features)
+            self.conv2 = dglnn.GraphConv(hidden_features, out_features)
+    
+        def forward(self, blocks, x):
+            x = F.relu(self.conv1(blocks[0], x))
+            x = F.relu(self.conv2(blocks[1], x))
+            return x
+
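+The second part usually takes two inputs: the output of the first part, and the subgraph of the original graph induced by the edges in the minibatch. This subgraph is yielded by the same data loader. Call :meth:`dgl.DGLHeteroGraph.apply_edges` on the edge subgraph to compute the edge scores.
+
+The following code predicts edge scores by concatenating the incident node features and feeding the result to a dense layer.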
+
+.. code:: python
+
+    class ScorePredictor(nn.Module):
+        def __init__(self, num_classes, in_features):
+            super().__init__()
+            self.W = nn.Linear(2 * in_features, num_classes)
+    
+        def apply_edges(self, edges):
+            data = torch.cat([edges.src['x'], edges.dst['x']], dim=1)
+            return {'score': self.W(data)}
+    
+        def forward(self, edge_subgraph, x):
+            with edge_subgraph.local_scope():
+                edge_subgraph.ndata['x'] = x
+                edge_subgraph.apply_edges(self.apply_edges)
+                return edge_subgraph.edata['score']
+
+As shown below, the full model takes the list of MFGs and the edge subgraph from the data loader, together with the input node features.
+
+.. code:: python
+
+    class Model(nn.Module):
+        def __init__(self, in_features, hidden_features, out_features, num_classes):
+            super().__init__()
+            self.gcn = StochasticTwoLayerGCN(
+                in_features, hidden_features, out_features)
+            self.predictor = ScorePredictor(num_classes, out_features)
+    
+        def forward(self, edge_subgraph, blocks, x):
+            x = self.gcn(blocks, x)
+            return self.predictor(edge_subgraph, x)
+
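+DGL ensures that the nodes in the edge subgraph are the same as the output nodes of the last MFG in the generated list.
+
+Training Loop
+~~~~~~~~~~~~~~~
+
+The training loop is very similar to the one for node classification. Each iteration over the data loader yields two things: the subgraph induced by the minibatch edges, and the list of MFGs needed to compute the representations of those edges' incident nodes.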
+
+.. code:: python
+
+    model = Model(in_features, hidden_features, out_features, num_classes)
+    model = model.cuda()
+    opt = torch.optim.Adam(model.parameters())
+    
+    for input_nodes, edge_subgraph, blocks in dataloader:
+        blocks = [b.to(torch.device('cuda')) for b in blocks]
+        edge_subgraph = edge_subgraph.to(torch.device('cuda'))
+        input_features = blocks[0].srcdata['features']
+        edge_labels = edge_subgraph.edata['labels']
+        edge_predictions = model(edge_subgraph, blocks, input_features)
+        loss = compute_loss(edge_labels, edge_predictions)
+        opt.zero_grad()
+        loss.backward()
+        opt.step()
+
+For Heterogeneous Graphs
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A model that computes node representations on heterogeneous graphs can also be used to compute the incident node representations for edge classification/regression.
+
+.. code:: python
+
+    class StochasticTwoLayerRGCN(nn.Module):
+        def __init__(self, in_feat, hidden_feat, out_feat, rel_names):
+            super().__init__()
+            self.conv1 = dglnn.HeteroGraphConv({
+                    rel : dglnn.GraphConv(in_feat, hidden_feat, norm='right')
+                    for rel in rel_names
+                })
+            self.conv2 = dglnn.HeteroGraphConv({
+                    rel : dglnn.GraphConv(hidden_feat, out_feat, norm='right')
+                    for rel in rel_names
+                })
+    
+        def forward(self, blocks, x):
+            x = self.conv1(blocks[0], x)
+            x = self.conv2(blocks[1], x)
+            return x
+
+When predicting scores, the only implementation difference between homogeneous and heterogeneous graphs is that :meth:`~dgl.DGLHeteroGraph.apply_edges` is called once per edge type.
+
+.. code:: python
+
+    class ScorePredictor(nn.Module):
+        def __init__(self, num_classes, in_features):
+            super().__init__()
+            self.W = nn.Linear(2 * in_features, num_classes)
+    
+        def apply_edges(self, edges):
+            data = torch.cat([edges.src['x'], edges.dst['x']], dim=1)
+            return {'score': self.W(data)}
+    
+        def forward(self, edge_subgraph, x):
+            with edge_subgraph.local_scope():
+                edge_subgraph.ndata['x'] = x
+                for etype in edge_subgraph.canonical_etypes:
+                    edge_subgraph.apply_edges(self.apply_edges, etype=etype)
+                return edge_subgraph.edata['score']
+
+    class Model(nn.Module):
+        def __init__(self, in_features, hidden_features, out_features, num_classes,
+                     etypes):
+            super().__init__()
+            self.rgcn = StochasticTwoLayerRGCN(
+                in_features, hidden_features, out_features, etypes)
+            self.pred = ScorePredictor(num_classes, out_features)
+
+        def forward(self, edge_subgraph, blocks, x):
+            x = self.rgcn(blocks, x)
+            return self.pred(edge_subgraph, x)
+
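+The data loader definition is also very similar to the one for node classification. There are only two differences:
+
+- You use :class:`~dgl.dataloading.pytorch.EdgeDataLoader` instead of :class:`~dgl.dataloading.pytorch.NodeDataLoader`.
+- You pass a dictionary of edge types and edge ID tensors instead of a dictionary of node types and node ID tensors.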
+
+.. code:: python
+
+    sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
+    dataloader = dgl.dataloading.EdgeDataLoader(
+        g, train_eid_dict, sampler,
+        batch_size=1024,
+        shuffle=True,
+        drop_last=False,
+        num_workers=4)
+
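+Excluding reverse edges on a heterogeneous graph works slightly differently. On heterogeneous graphs, reverse edges usually have a different edge type from the edges themselves, which distinguishes “forward” and “backward” relationships. For example, ``follow`` and ``followed by`` are reverse relations of each other, as are ``purchase`` and ``purchased by``.
+
+If every edge of a type has a reverse edge with the same ID in another type, you can specify the mapping between edge types and their reverse types.
+Excluding the minibatch edges together with their reverse edges then looks like this: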
+
+.. code:: python
+
+    dataloader = dgl.dataloading.EdgeDataLoader(
+        g, train_eid_dict, sampler,
+    
+        # The following two arguments are specifically for excluding the minibatch
+        # edges and their reverse edges from the original graph for neighborhood
+        # sampling.
+        exclude='reverse_types',
+        reverse_etypes={'follow': 'followed by', 'followed by': 'follow',
+                        'purchase': 'purchased by', 'purchased by': 'purchase'},
+    
+        batch_size=1024,
+        shuffle=True,
+        drop_last=False,
+        num_workers=4)
+
+The training loop is again almost the same as for homogeneous graphs. The one exception is that ``compute_loss`` takes two dictionaries, one of labels and one of predictions,
+each keyed by edge type.
+
+.. code:: python
+
+    model = Model(in_features, hidden_features, out_features, num_classes, etypes)
+    model = model.cuda()
+    opt = torch.optim.Adam(model.parameters())
+    
+    for input_nodes, edge_subgraph, blocks in dataloader:
+        blocks = [b.to(torch.device('cuda')) for b in blocks]
+        edge_subgraph = edge_subgraph.to(torch.device('cuda'))
+        input_features = blocks[0].srcdata['features']
+        edge_labels = edge_subgraph.edata['labels']
+        edge_predictions = model(edge_subgraph, blocks, input_features)
+        loss = compute_loss(edge_labels, edge_predictions)
+        opt.zero_grad()
+        loss.backward()
+        opt.step()
+
+`GCMC <https://github.com/dmlc/dgl/tree/master/examples/pytorch/gcmc>`__ is an example of edge classification on a bipartite graph.
+
--- a/docs/source/guide_ko/minibatch-gpu-sampling.rst
+++ b/docs/source/guide_ko/minibatch-gpu-sampling.rst
+.. _guide_ko-minibatch-gpu-sampling:
+
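+6.7 Using GPU for Neighborhood Sampling
+-----------------------------------------
+
+Since version 0.7, DGL supports GPU-based neighborhood sampling, which is significantly faster than CPU-based neighborhood sampling. The best option is to put the graph in GPU memory and use GPU-based neighborhood sampling, provided two conditions hold. First, your graph and its features fit in GPU memory. Second, your model does not take too much GPU memory.
+
+For example, `OGB Products <https://ogb.stanford.edu/docs/nodeprop/#ogbn-products>`__ has 2.4M nodes and 61M edges, and each node has a 100-dimensional feature. The node features take less than 1GB of memory in total. The graph itself also takes less than about 1GB, since a graph's memory requirement depends on its number of edges. It is therefore possible to load the whole graph onto the GPU.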
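+
+The memory figures above can be reproduced with back-of-the-envelope arithmetic. The sketch below makes two assumptions: node features are float32, and the graph is stored in COO form as two int64 index arrays. DGL's actual footprint depends on several factors: which sparse formats (COO/CSR/CSC) it materializes, the index dtype, and whether both edge directions are stored. Treat the results as estimates. The helper names are hypothetical.
+
```python
def feature_bytes(num_nodes, feat_dim, bytes_per_value=4):
    """Dense node-feature matrix; float32 (4 bytes) by default."""
    return num_nodes * feat_dim * bytes_per_value

def coo_graph_bytes(num_edges, bytes_per_index=8):
    """Graph structure as COO: one (src, dst) index pair per edge; int64 by default."""
    return num_edges * 2 * bytes_per_index

# Rounded OGB Products figures quoted in the text above.
feat = feature_bytes(2_400_000, 100)     # 960,000,000 bytes
graph = coo_graph_bytes(61_000_000)      # 976,000,000 bytes
print(f"features: {feat / 1e9:.2f} GB, graph (COO, int64): {graph / 1e9:.3f} GB")
```
+
+Both terms grow linearly. The structure term scales with the number of edges, so the edge count largely determines the graph's memory footprint.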
+
+.. note::
+
+   This feature is experimental and still under development. Stay tuned for further updates.
+
+Using GPU-based Neighborhood Sampling in DGL Data Loaders
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To use GPU-based neighborhood sampling with a DGL data loader:
+
+* Put the graph on the GPU.
+* Set the ``num_workers`` argument to 0, because CUDA does not allow multiple processes to use the same context.
+* Set the ``device`` argument to a GPU device.
+
+All other arguments of :class:`~dgl.dataloading.pytorch.NodeDataLoader` are the same as in the other guides and tutorials.
+
+.. code:: python
+
+   g = g.to('cuda:0')
+   dataloader = dgl.dataloading.NodeDataLoader(
+       g,                                # The graph must be on GPU.
+       train_nid,
+       sampler,
+       device=torch.device('cuda:0'),    # The device argument must be GPU.
+       num_workers=0,                    # Number of workers must be 0.
+       batch_size=1000,
+       drop_last=False,
+       shuffle=True)
+
+GPU-based neighborhood sampling also works with a custom neighborhood sampler as long as two conditions hold: (1) the sampler is a subclass of :class:`~dgl.dataloading.BlockSampler`, and (2) the sampler runs entirely on the GPU.
+
+.. note::
+
+   Currently, :class:`~dgl.dataloading.pytorch.EdgeDataLoader` and heterogeneous graphs are not supported.
+
+Using GPU-based neighborhood sampling with DGL functions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following sampling functions support running on GPU:
+
+* :func:`dgl.sampling.sample_neighbors`
+
+  * Only uniform sampling is supported; non-uniform sampling can only run on CPU.
+
+Besides the functions above, :func:`dgl.to_block` can also run on GPU.
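+
+To make "uniform" concrete, here is a framework-free sketch of uniform neighbor sampling. It is not DGL's implementation; the function name ``sample_neighbors_uniform`` and its adjacency-dict input are hypothetical. Every in-neighbor of a seed node has the same chance of being picked, up to a fixed ``fanout``, and this sketch samples without replacement.
+
```python
import random


def sample_neighbors_uniform(in_neighbors, seeds, fanout, rng=random):
    """Pick up to `fanout` in-neighbors of each seed, each with equal probability.

    in_neighbors: dict mapping a node ID to the list of its in-neighbor IDs.
    Returns the sampled edges as (src, dst) pairs.
    """
    edges = []
    for dst in seeds:
        nbrs = in_neighbors.get(dst, [])
        if len(nbrs) <= fanout:
            picked = list(nbrs)                # few neighbors: keep them all
        else:
            picked = rng.sample(nbrs, fanout)  # uniform, without replacement
        edges.extend((src, dst) for src in picked)
    return edges


in_neighbors = {0: [1, 2, 3, 4], 1: [0], 2: [0, 1]}
print(sample_neighbors_uniform(in_neighbors, seeds=[0, 1], fanout=2,
                               rng=random.Random(0)))
```
+
+A non-uniform sampler would instead weight each neighbor (for example by an edge probability), which is the case that currently runs only on CPU.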
\ No newline at end of file
--- a/docs/source/guide_ko/minibatch-inference.rst
+++ b/docs/source/guide_ko/minibatch-inference.rst
+.. _guide_ko-minibatch-inference:
+
+6.6 Exact Offline Inference on Large Graphs
+----------------------------------------------
+
+:ref:`(English Version) <guide-minibatch-inference>`
+
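+Subgraph sampling and neighborhood sampling are both used to reduce the memory and time needed to train GNNs on GPUs. During inference, it is usually better to aggregate over all neighbors, to remove the randomness that sampling can introduce. However, full-graph forward propagation is often infeasible on GPU due to limited memory, and slow on CPU due to slow computation. This section introduces a way to perform full-graph forward propagation with limited GPU memory through minibatching and neighborhood sampling.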
+
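+The inference algorithm differs from the training algorithm, because the representations of all nodes must be computed layer by layer, starting from the first layer. Specifically, for a given layer, we need to compute the output representations of this layer for all nodes, in minibatches. As a result, the inference algorithm has an outer loop iterating over the layers and an inner loop iterating over minibatches of nodes. In contrast, the training algorithm has an outer loop iterating over minibatches of nodes and an inner loop iterating over layers for both neighborhood sampling and message passing.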
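+
+The loop order can be illustrated with a small framework-free sketch. It assumes scalar node features and a made-up layer (a node's own value plus the mean of its in-neighbors' values); ``toy_layer`` and ``layerwise_inference`` are hypothetical names, not DGL APIs. Because every minibatch at a given layer reads only the previous layer's outputs over full neighborhoods, the result matches a full-graph pass exactly.
+
```python
def toy_layer(h, in_neighbors, nodes):
    """One made-up GNN layer, evaluated only for `nodes`:
    out[v] = h[v] + mean(h[u] for u in in-neighbors of v)."""
    out = {}
    for v in nodes:
        nbrs = in_neighbors.get(v, [])
        agg = sum(h[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
        out[v] = h[v] + agg
    return out


def layerwise_inference(h, in_neighbors, num_layers, batch_size):
    """Outer loop over layers, inner loop over minibatches of ALL nodes."""
    all_nodes = sorted(h)
    for _ in range(num_layers):
        y = {}
        for i in range(0, len(all_nodes), batch_size):
            batch = all_nodes[i:i + batch_size]
            # Full neighborhoods, no sampling: exact layer outputs for this batch.
            y.update(toy_layer(h, in_neighbors, batch))
        h = y  # this layer's outputs are the next layer's inputs
    return h


in_neighbors = {0: [1, 2], 1: [2], 2: [0], 3: [0, 1, 2]}
h0 = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}

full = h0
for _ in range(2):  # reference: whole graph at once, layer by layer
    full = toy_layer(full, in_neighbors, sorted(full))

assert layerwise_inference(h0, in_neighbors, num_layers=2, batch_size=2) == full
```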
+
+The following animation shows how the computation proceeds (note that only the first three minibatches are drawn for each layer).
+
+.. figure:: https://data.dgl.ai/asset/image/guide_6_6_0.gif
+   :alt: Imgur
+
+Implementing offline inference
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Consider the two-layer GCN from Section 6.1 (:ref:`guide-minibatch-node-classification-model`). Offline inference is still implemented with :class:`~dgl.dataloading.neighbor.MultiLayerFullNeighborSampler`, but it samples only one layer at a time. Note that offline inference is implemented as a method of the GNN module, because the computation of a single layer depends on how messages are aggregated and combined.
+
+.. code:: python
+
+    class StochasticTwoLayerGCN(nn.Module):
+        def __init__(self, in_features, hidden_features, out_features):
+            super().__init__()
+            self.hidden_features = hidden_features
+            self.out_features = out_features
+            self.conv1 = dgl.nn.GraphConv(in_features, hidden_features)
+            self.conv2 = dgl.nn.GraphConv(hidden_features, out_features)
+            self.n_layers = 2
+    
+        def forward(self, blocks, x):
+            x_dst = x[:blocks[0].number_of_dst_nodes()]
+            x = F.relu(self.conv1(blocks[0], (x, x_dst)))
+            x_dst = x[:blocks[1].number_of_dst_nodes()]
+            x = F.relu(self.conv2(blocks[1], (x, x_dst)))
+            return x
+    
+        def inference(self, g, x, batch_size, device):
+            """
+            Offline inference with this module
+            """
+            # Compute representations layer by layer
+            for l, layer in enumerate([self.conv1, self.conv2]):
+                y = torch.zeros(g.number_of_nodes(),
+                                self.hidden_features
+                                if l != self.n_layers - 1
+                                else self.out_features)
+                sampler = dgl.dataloading.MultiLayerFullNeighborSampler(1)
+                dataloader = dgl.dataloading.NodeDataLoader(
+                    g, torch.arange(g.number_of_nodes()), sampler,
+                    batch_size=batch_size,
+                    shuffle=True,
+                    drop_last=False)
+                
+                # Within a layer, iterate over nodes in batches
+                for input_nodes, output_nodes, blocks in dataloader:
+                    block = blocks[0]
+    
+                    # Copy the features of necessary input nodes to GPU
+                    h = x[input_nodes].to(device)
+                    # Compute the output. This is the same computation as in
+                    # forward(), but for a single layer only.
+                    h_dst = h[:block.number_of_dst_nodes()]
+                    h = F.relu(layer(block, (h, h_dst)))
+                    # Copy the output back to CPU.
+                    y[output_nodes] = h.cpu()
+
+                x = y
+    
+            return y
+
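+Note that you usually do not need exact offline inference when computing evaluation metrics on the validation set for model selection. Exact inference requires computing the representation of every node at every layer, which is very costly, especially in the semi-supervised regime where there is a lot of unlabeled data. Neighborhood sampling works fine for model selection and validation.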
+
+For examples of offline inference, see `GraphSAGE <https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/train_sampling.py>`__ and
+`RGCN <https://github.com/dmlc/dgl/blob/master/examples/pytorch/rgcn-hetero/entity_classify_mb.py>`__.
--- a/docs/source/guide_ko/minibatch-link.rst
+++ b/docs/source/guide_ko/minibatch-link.rst
+.. _guide_ko-minibatch-link-classification-sampler:
+
+6.3 Training GNN for Link Prediction with Neighborhood Sampling
+--------------------------------------------------------------------
+
+:ref:`(English Version) <guide-minibatch-link-classification-sampler>`
+
+Define a neighborhood sampler and data loader with negative sampling
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can still use the same neighborhood sampler as the one used for node/edge classification.
+
+.. code:: python
+
+    sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
+
+DGL's :class:`~dgl.dataloading.pytorch.EdgeDataLoader` supports generating negative samples for link prediction. To use it, you need to
+provide a negative sampling function. :class:`~dgl.dataloading.negative_sampler.Uniform` is a function that performs uniform sampling: for each source node of an edge, it samples ``k`` negative destination nodes.
+
+The following code picks 5 negative destination nodes uniformly at random for each source node of an edge.
+
+.. code:: python
+
+    dataloader = dgl.dataloading.EdgeDataLoader(
+        g, train_seeds, sampler,
+        negative_sampler=dgl.dataloading.negative_sampler.Uniform(5),
+        batch_size=args.batch_size,
+        shuffle=True,
+        drop_last=False,
+        pin_memory=True,
+        num_workers=args.num_workers)
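+
+Conceptually, a uniform negative sampler pairs each source node of a minibatch edge with ``k`` destination nodes drawn uniformly from all nodes. The sketch below is a framework-free illustration of that idea, not DGL's implementation; ``uniform_negative_sample`` and its list-based inputs are hypothetical. It draws with replacement and, for simplicity, does not filter out pairs that happen to be real edges.
+
```python
import random


def uniform_negative_sample(edges, num_nodes, k, rng=random):
    """For each (src, dst) edge, emit k negative pairs (src, random node).

    Returns (neg_src, neg_dst); the k negatives of edge i sit at
    positions i*k .. i*k + k - 1.
    """
    neg_src, neg_dst = [], []
    for src, _ in edges:
        for _ in range(k):
            neg_src.append(src)                       # source is kept
            neg_dst.append(rng.randrange(num_nodes))  # destination is uniform
    return neg_src, neg_dst


edges = [(0, 1), (2, 3)]
neg_src, neg_dst = uniform_negative_sample(edges, num_nodes=10, k=5,
                                           rng=random.Random(42))
print(neg_src)  # [0, 0, 0, 0, 0, 2, 2, 2, 2, 2]
```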
+
+See :ref:`api-dataloading-negative-sampling` for the built-in negative samplers.
+
+You can also use your own negative sampler function, as long as it takes in the original graph ``g`` and the minibatch edge ID array ``eid``,
+and returns a pair of source ID array and destination ID array.
+
+The following example is a custom negative sampler that samples negative destination nodes according to a probability distribution proportional to a power of the node degree.
+
+.. code:: python
+
+    class NegativeSampler(object):
+        def __init__(self, g, k):
+            # caches the probability distribution
+            self.weights = g.in_degrees().float() ** 0.75
+            self.k = k
+    
+        def __call__(self, g, eids):
+            src, _ = g.find_edges(eids)
+            src = src.repeat_interleave(self.k)
+            dst = self.weights.multinomial(len(src), replacement=True)
+            return src, dst
+    
+    dataloader = dgl.dataloading.EdgeDataLoader(
+        g, train_seeds, sampler,
+        negative_sampler=NegativeSampler(g, 5),
+        batch_size=args.batch_size,
+        shuffle=True,
+        drop_last=False,
+        pin_memory=True,
+        num_workers=args.num_workers)
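+
+The sampling rule used by ``NegativeSampler`` above (weights ``in_degree ** 0.75``, sampled with replacement) can be mimicked without any framework, as in the sketch below. The helper names are hypothetical, and ``random.choices`` stands in for ``torch.multinomial``; nodes with in-degree 0 get weight 0 and are never drawn as negative destinations.
+
```python
import random
from collections import Counter


def degree_power_weights(edge_dsts, num_nodes, power=0.75):
    """Sampling weight of node v = (in-degree of v) ** power."""
    in_degree = Counter(edge_dsts)
    return [in_degree[v] ** power for v in range(num_nodes)]


def degree_negative_sample(batch_srcs, weights, k, rng=random):
    """Repeat each source k times (like repeat_interleave) and draw each
    destination with probability proportional to `weights`."""
    neg_src = [s for s in batch_srcs for _ in range(k)]
    neg_dst = rng.choices(range(len(weights)), weights=weights, k=len(neg_src))
    return neg_src, neg_dst


edges = [(0, 1), (0, 2), (1, 2), (2, 2)]
weights = degree_power_weights([d for _, d in edges], num_nodes=3)
neg_src, neg_dst = degree_negative_sample([0, 1], weights, k=3,
                                          rng=random.Random(0))
print(neg_src)  # [0, 0, 0, 1, 1, 1]
```
+
+The exponent 0.75 flattens the distribution, so high-degree nodes are favored but do not completely dominate the negatives.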
+
+Adapt your model for minibatch training
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
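+As described in :ref:`guide-training-link-prediction`, link prediction is trained by comparing the scores of edges (positive examples) against the scores of non-existent edges (negative examples). To compute the edge scores, you can reuse the node representation model from edge classification/regression.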
+
+.. code:: python
+
+    class StochasticTwoLayerGCN(nn.Module):
+        def __init__(self, in_features, hidden_features, out_features):
+            super().__init__()
+            self.conv1 = dgl.nn.GraphConv(in_features, hidden_features)
+            self.conv2 = dgl.nn.GraphConv(hidden_features, out_features)
+    
+        def forward(self, blocks, x):
+            x = F.relu(self.conv1(blocks[0], x))
+            x = F.relu(self.conv2(blocks[1], x))
+            return x
+
+Since score prediction only needs a scalar score for each edge, rather than a probability distribution, this example computes the score as the dot product of the representations of the two incident nodes.
+
+.. code:: python
+
+    class ScorePredictor(nn.Module):
+        def forward(self, edge_subgraph, x):
+            with edge_subgraph.local_scope():
+                edge_subgraph.ndata['x'] = x
+                edge_subgraph.apply_edges(dgl.function.u_dot_v('x', 'x', 'score'))
+                return edge_subgraph.edata['score']
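+
+For intuition, what ``u_dot_v('x', 'x', 'score')`` computes for each edge can be written out in plain Python, as below. ``u_dot_v_scores`` is a hypothetical helper that takes node representations as lists and the edges as parallel ``src``/``dst`` lists.
+
```python
def u_dot_v_scores(x, src, dst):
    """Score of edge i = dot product of x[src[i]] and x[dst[i]]."""
    return [sum(a * b for a, b in zip(x[u], x[v])) for u, v in zip(src, dst)]


x = [[1.0, 0.0], [0.5, 2.0], [3.0, 1.0]]  # toy node representations
print(u_dot_v_scores(x, src=[0, 1], dst=[2, 2]))  # [3.0, 3.5]
```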
+
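+When a negative sampler is provided, DGL's data loader generates three items per minibatch:
+
+- A positive graph containing all the edges sampled in the minibatch.
+- A negative graph containing all the non-existent edges generated by the negative sampler.
+- A list of *message flow graphs* (MFGs) generated by the neighborhood sampler.
+
+You can then define the link prediction model as follows, taking the three items as well as the input features: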
+
+.. code:: python
+
+    class Model(nn.Module):
+        def __init__(self, in_features, hidden_features, out_features):
+            super().__init__()
+            self.gcn = StochasticTwoLayerGCN(
+                in_features, hidden_features, out_features)
+            self.predictor = ScorePredictor()
+    
+        def forward(self, positive_graph, negative_graph, blocks, x):
+            x = self.gcn(blocks, x)
+            pos_score = self.predictor(positive_graph, x)
+            neg_score = self.predictor(negative_graph, x)
+            return pos_score, neg_score
+
+The training loop
+~~~~~~~~~~~~~~~~~~~~
+
+The training loop simply iterates over the data loader and feeds the graphs and the input features to the model defined above.
+
+.. code:: python
+
+    def compute_loss(pos_score, neg_score):
+        # an example hinge loss
+        n = pos_score.shape[0]
+        return (neg_score.view(n, -1) - pos_score.view(n, -1) + 1).clamp(min=0).mean()
+
+    model = Model(in_features, hidden_features, out_features)
+    model = model.cuda()
+    opt = torch.optim.Adam(model.parameters())
+    
+    for input_nodes, positive_graph, negative_graph, blocks in dataloader:
+        blocks = [b.to(torch.device('cuda')) for b in blocks]
+        positive_graph = positive_graph.to(torch.device('cuda'))
+        negative_graph = negative_graph.to(torch.device('cuda'))
+        input_features = blocks[0].srcdata['features']
+        pos_score, neg_score = model(positive_graph, negative_graph, blocks, input_features)
+        loss = compute_loss(pos_score, neg_score)
+        opt.zero_grad()
+        loss.backward()
+        opt.step()
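+
+To see what the ``view(n, -1)`` in ``compute_loss`` relies on, here is the same hinge loss written in plain Python. It assumes, as the negative samplers above produce, that the ``k`` negative scores of positive edge ``i`` are stored consecutively; ``hinge_loss`` is a hypothetical helper name.
+
```python
def hinge_loss(pos_score, neg_score, margin=1.0):
    """Mean over all (positive, negative) pairs of max(0, neg - pos + margin).

    The k negatives of positive edge i are assumed to be
    neg_score[i*k : (i+1)*k], which is the layout view(n, -1) expects.
    """
    n = len(pos_score)
    k = len(neg_score) // n
    terms = [max(0.0, neg_score[i * k + j] - pos_score[i] + margin)
             for i in range(n) for j in range(k)]
    return sum(terms) / len(terms)


print(hinge_loss([2.0, 0.5], [0.0, 1.5, 1.0, -1.0]))  # 0.5
```
+
+Pairs whose negative score is at least ``margin`` below the positive score contribute zero loss.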
+
+DGL provides `unsupervised learning GraphSAGE <https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/train_sampling_unsupervised.py>`__ as an example of link prediction on homogeneous graphs.
+
+For heterogeneous graphs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The models computing the node representations on heterogeneous graphs can also be used to compute the incident node
+representations for edge classification/regression.
+
+.. code:: python
+
+    class StochasticTwoLayerRGCN(nn.Module):
+        def __init__(self, in_feat, hidden_feat, out_feat, rel_names):
+            super().__init__()
+            self.conv1 = dglnn.HeteroGraphConv({
+                    rel : dglnn.GraphConv(in_feat, hidden_feat, norm='right')
+                    for rel in rel_names
+                })
+            self.conv2 = dglnn.HeteroGraphConv({
+                    rel : dglnn.GraphConv(hidden_feat, out_feat, norm='right')
+                    for rel in rel_names
+                })
+    
+        def forward(self, blocks, x):
+            x = self.conv1(blocks[0], x)
+            x = self.conv2(blocks[1], x)
+            return x
+
+The only implementation difference in score prediction between homogeneous and heterogeneous graphs is that you loop over the edge types
+when calling :meth:`dgl.DGLHeteroGraph.apply_edges`.
+
+.. code:: python
+
+    class ScorePredictor(nn.Module):
+        def forward(self, edge_subgraph, x):
+            with edge_subgraph.local_scope():
+                edge_subgraph.ndata['x'] = x
+                for etype in edge_subgraph.canonical_etypes:
+                    edge_subgraph.apply_edges(
+                        dgl.function.u_dot_v('x', 'x', 'score'), etype=etype)
+                return edge_subgraph.edata['score']
+
+    class Model(nn.Module):
+        def __init__(self, in_features, hidden_features, out_features, num_classes,
+                     etypes):
+            super().__init__()
+            self.rgcn = StochasticTwoLayerRGCN(
+                in_features, hidden_features, out_features, etypes)
+            self.pred = ScorePredictor()
+
+        def forward(self, positive_graph, negative_graph, blocks, x):
+            x = self.rgcn(blocks, x)
+            pos_score = self.pred(positive_graph, x)
+            neg_score = self.pred(negative_graph, x)
+            return pos_score, neg_score
+
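+The data loader definition is also very similar to that for node classification. The only differences are that you need a negative sampler, and that you supply a dictionary of edge types and edge ID tensors instead of a dictionary of node types and node ID tensors.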
+
+.. code:: python
+
+    sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
+    dataloader = dgl.dataloading.EdgeDataLoader(
+        g, train_eid_dict, sampler,
+        negative_sampler=dgl.dataloading.negative_sampler.Uniform(5),
+        batch_size=1024,
+        shuffle=True,
+        drop_last=False,
+        num_workers=4)
+
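+If you want to use your own negative sampling function, the function should take in the original graph and a dictionary of edge types and edge ID tensors, and return a dictionary of edge types and source-destination array pairs. An example is given below: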
+
+.. code:: python
+
+   class NegativeSampler(object):
+       def __init__(self, g, k):
+           # caches the probability distribution
+           self.weights = {
+               etype: g.in_degrees(etype=etype).float() ** 0.75
+               for etype in g.canonical_etypes}
+           self.k = k
+
+       def __call__(self, g, eids_dict):
+           result_dict = {}
+           for etype, eids in eids_dict.items():
+               src, _ = g.find_edges(eids, etype=etype)
+               src = src.repeat_interleave(self.k)
+               dst = self.weights[etype].multinomial(len(src), replacement=True)
+               result_dict[etype] = (src, dst)
+           return result_dict
+
+Then pass the dictionary of edge types and edge IDs, together with the negative sampler, to the data loader. For example, the code below iterates over all the edges of a heterogeneous graph.
+
+.. code:: python
+
+    train_eid_dict = {
+        etype: g.edges(etype=etype, form='eid')
+        for etype in g.canonical_etypes}
+
+    dataloader = dgl.dataloading.EdgeDataLoader(
+        g, train_eid_dict, sampler,
+        negative_sampler=NegativeSampler(g, 5),
+        batch_size=1024,
+        shuffle=True,
+        drop_last=False,
+        num_workers=4)
+
+The training loop is almost the same as that for homogeneous graphs, except that ``compute_loss`` takes two dictionaries of positive and negative scores keyed by edge type.
+
+.. code:: python
+
+    model = Model(in_features, hidden_features, out_features, num_classes, etypes)
+    model = model.cuda()
+    opt = torch.optim.Adam(model.parameters())
+    
+    for input_nodes, positive_graph, negative_graph, blocks in dataloader:
+        blocks = [b.to(torch.device('cuda')) for b in blocks]
+        positive_graph = positive_graph.to(torch.device('cuda'))
+        negative_graph = negative_graph.to(torch.device('cuda'))
+        input_features = blocks[0].srcdata['features']
+        pos_score, neg_score = model(positive_graph, negative_graph, blocks, input_features)
+        loss = compute_loss(pos_score, neg_score)
+        opt.zero_grad()
+        loss.backward()
+        opt.step()
+
+
+
--- a/docs/source/guide_ko/minibatch-nn.rst
+++ b/docs/source/guide_ko/minibatch-nn.rst
+.. _guide_ko-minibatch-custom-gnn-module:
+
+6.5 Implementing Custom GNN Modules for Minibatch Training
+----------------------------------------------------------------
+
+:ref:`(English Version) <guide-minibatch-custom-gnn-module>`
+
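+If you are familiar with writing custom GNN modules that update the whole graph, for homogeneous or heterogeneous graphs, you will find that the code for computing on MFGs is similar. The only difference is that the nodes are divided into input nodes and output nodes.
+
+Take a custom graph convolution module as an example. Note that this code only demonstrates how a custom GNN module works; it is not the most efficient implementation.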
+
+.. code:: python
+
+    class CustomGraphConv(nn.Module):
+        def __init__(self, in_feats, out_feats):
+            super().__init__()
+            self.W = nn.Linear(in_feats * 2, out_feats)
+    
+        def forward(self, g, h):
+            with g.local_scope():
+                g.ndata['h'] = h
+                g.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h_neigh'))
+                return self.W(torch.cat([g.ndata['h'], g.ndata['h_neigh']], 1))
+
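+If you have a custom message passing NN module for the full graph and want to make it work on MFGs, you only need to rewrite the forward function as follows. The corresponding full-graph statements are kept as comments, so you can compare them with the new code.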
+
+.. code:: python
+
+    class CustomGraphConv(nn.Module):
+        def __init__(self, in_feats, out_feats):
+            super().__init__()
+            self.W = nn.Linear(in_feats * 2, out_feats)
+    
+        # h is now a pair of feature tensors for input and output nodes, instead of
+        # a single feature tensor.
+        # def forward(self, g, h):
+        def forward(self, block, h):
+            # with g.local_scope():
+            with block.local_scope():
+                # g.ndata['h'] = h
+                h_src = h
+                h_dst = h[:block.number_of_dst_nodes()]
+                block.srcdata['h'] = h_src
+                block.dstdata['h'] = h_dst
+    
+                # g.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h_neigh'))
+                block.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h_neigh'))
+    
+                # return self.W(torch.cat([g.ndata['h'], g.ndata['h_neigh']], 1))
+                return self.W(torch.cat(
+                    [block.dstdata['h'], block.dstdata['h_neigh']], 1))
+
+In general, you need to do the following to make your own NN module work on MFGs:
+
+- Obtain the features of the output nodes from the input features by slicing the first few rows. The number of rows can be obtained via :meth:`block.number_of_dst_nodes <dgl.DGLHeteroGraph.number_of_dst_nodes>`.
+- If the original graph has only one node type, replace :attr:`g.ndata <dgl.DGLHeteroGraph.ndata>` with :attr:`block.srcdata <dgl.DGLHeteroGraph.srcdata>` for the features of input nodes, or with :attr:`block.dstdata <dgl.DGLHeteroGraph.dstdata>` for the features of output nodes.
+- If the original graph has multiple node types, replace :attr:`g.nodes <dgl.DGLHeteroGraph.nodes>` with :attr:`block.srcnodes <dgl.DGLHeteroGraph.srcnodes>` for the features of input nodes, or with :attr:`block.dstnodes <dgl.DGLHeteroGraph.dstnodes>` for the features of output nodes.
+- Replace :meth:`g.number_of_nodes <dgl.DGLHeteroGraph.number_of_nodes>` with :meth:`block.number_of_src_nodes <dgl.DGLHeteroGraph.number_of_src_nodes>` for the number of input nodes, or with :meth:`block.number_of_dst_nodes <dgl.DGLHeteroGraph.number_of_dst_nodes>` for the number of output nodes.
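+
+The first rule works because, in an MFG, the output nodes are listed first among the input nodes. The framework-free sketch below illustrates this with scalar features and a hand-written toy MFG; ``block_mean_conv`` and its ``(num_dst, edges, h)`` arguments are hypothetical stand-ins for a block and its features, not a DGL API.
+
```python
def block_mean_conv(num_dst, edges, h):
    """Toy MFG layer with scalar features.

    h:     features of ALL input (src) nodes, in the block's local order;
           the first `num_dst` input nodes are the output (dst) nodes.
    edges: (src_index, dst_index) pairs in local IDs.
    Returns (h_dst, h_neigh) for each output node.
    """
    h_dst = h[:num_dst]  # like h[:block.number_of_dst_nodes()]
    sums = [0.0] * num_dst
    counts = [0] * num_dst
    for s, d in edges:   # copy_u + mean, computed only for output nodes
        sums[d] += h[s]
        counts[d] += 1
    h_neigh = [sums[d] / counts[d] if counts[d] else 0.0 for d in range(num_dst)]
    # A real module would concatenate h_dst with h_neigh and apply a linear layer.
    return list(zip(h_dst, h_neigh))


# 4 input nodes (local IDs 0..3); the first 2 are also the output nodes.
h = [10.0, 20.0, 30.0, 40.0]
print(block_mean_conv(num_dst=2, edges=[(2, 0), (3, 0), (0, 1)], h=h))
# [(10.0, 35.0), (20.0, 10.0)]
```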
+
+Heterogeneous graphs
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Writing custom GNN modules for heterogeneous graphs is similar. For instance, consider the following module that works on the full graph.
+
+.. code:: python
+
+    class CustomHeteroGraphConv(nn.Module):
+        def __init__(self, g, in_feats, out_feats):
+            super().__init__()
+            # nn.ModuleDict only accepts string keys, so canonical edge
+            # types (tuples) are joined into a single string.
+            self.Ws = nn.ModuleDict()
+            for etype in g.canonical_etypes:
+                utype, _, vtype = etype
+                self.Ws['__'.join(etype)] = nn.Linear(in_feats[utype], out_feats[vtype])
+            self.Vs = nn.ModuleDict()
+            for ntype in g.ntypes:
+                self.Vs[ntype] = nn.Linear(in_feats[ntype], out_feats[ntype])
+    
+        def forward(self, g, h):
+            with g.local_scope():
+                for ntype in g.ntypes:
+                    g.nodes[ntype].data['h_dst'] = self.Vs[ntype](h[ntype])
+                    g.nodes[ntype].data['h_src'] = h[ntype]
+                for etype in g.canonical_etypes:
+                    utype, _, vtype = etype
+                    g.update_all(
+                        fn.copy_u('h_src', 'm'), fn.mean('m', 'h_neigh'),
+                        etype=etype)
+                    g.nodes[vtype].data['h_dst'] = g.nodes[vtype].data['h_dst'] + \
+                        self.Ws['__'.join(etype)](g.nodes[vtype].data['h_neigh'])
+                return {ntype: g.nodes[ntype].data['h_dst'] for ntype in g.ntypes}
+
+The rule in ``CustomHeteroGraphConv`` is to replace ``g.nodes`` with ``g.srcnodes`` or ``g.dstnodes``, depending on whether the features in question belong to input nodes or output nodes.
+
+.. code:: python
+
+    class CustomHeteroGraphConv(nn.Module):
+        def __init__(self, g, in_feats, out_feats):
+            super().__init__()
+            self.Ws = nn.ModuleDict()
+            for etype in g.canonical_etypes:
+                utype, _, vtype = etype
+                self.Ws['__'.join(etype)] = nn.Linear(in_feats[utype], out_feats[vtype])
+            self.Vs = nn.ModuleDict()
+            for ntype in g.ntypes:
+                self.Vs[ntype] = nn.Linear(in_feats[ntype], out_feats[ntype])
+    
+        def forward(self, g, h):
+            with g.local_scope():
+                for ntype in g.ntypes:
+                    # h[ntype] is now a (input features, output features) pair.
+                    h_src, h_dst = h[ntype]
+                    g.dstnodes[ntype].data['h_dst'] = self.Vs[ntype](h_dst)
+                    g.srcnodes[ntype].data['h_src'] = h_src
+                for etype in g.canonical_etypes:
+                    utype, _, vtype = etype
+                    g.update_all(
+                        fn.copy_u('h_src', 'm'), fn.mean('m', 'h_neigh'),
+                        etype=etype)
+                    g.dstnodes[vtype].data['h_dst'] = \
+                        g.dstnodes[vtype].data['h_dst'] + \
+                        self.Ws['__'.join(etype)](g.dstnodes[vtype].data['h_neigh'])
+                return {ntype: g.dstnodes[ntype].data['h_dst']
+                        for ntype in g.ntypes}
+
+Writing modules that work on homogeneous graphs, bipartite graphs, and MFGs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All message passing modules in DGL work on homogeneous graphs, unidirectional bipartite graphs (with two node types and one edge type), and MFGs with one edge type. Essentially, the input graph and features of a DGL built-in neural network module must satisfy one of the following cases:
+
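+- If the input features are a pair of tensors, the input graph must be a unidirectional bipartite graph.
+- If the input features are a single tensor and the input graph is an MFG, DGL automatically takes the first few rows of the input node features as the output node features.
+- If the input features are a single tensor and the input graph is not an MFG, the input graph must be homogeneous.
+
+The following is a simplified PyTorch implementation of :class:`dgl.nn.pytorch.SAGEConv` (MXNet and TensorFlow versions are also available). Normalization is removed and only mean aggregation is used.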
+
+.. code:: python
+
+    import torch
+    import torch.nn as nn
+    import torch.nn.functional as F
+    import dgl.function as fn
+
+    class SAGEConv(nn.Module):
+        def __init__(self, in_feats, out_feats):
+            super().__init__()
+            self.W = nn.Linear(in_feats * 2, out_feats)
+    
+        def forward(self, g, h):
+            if isinstance(h, tuple):
+                h_src, h_dst = h
+            elif g.is_block:
+                h_src = h
+                h_dst = h[:g.number_of_dst_nodes()]
+            else:
+                h_src = h_dst = h
+                 
+            g.srcdata['h'] = h_src
+            g.dstdata['h'] = h_dst
+            g.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h_neigh'))
+            return F.relu(
+                self.W(torch.cat([g.dstdata['h'], g.dstdata['h_neigh']], 1)))
+
+:ref:`guide-nn` covers in detail how :class:`dgl.nn.pytorch.SAGEConv` works on unidirectional bipartite graphs, homogeneous graphs, and MFGs.
+
+
--- a/docs/source/guide_ko/minibatch-node.rst
+++ b/docs/source/guide_ko/minibatch-node.rst
+.. _guide_ko-minibatch-node-classification-sampler:
+
+6.1 Training GNN for Node Classification with Neighborhood Sampling
+----------------------------------------------------------------------
+
+:ref:`(English Version) <guide-minibatch-node-classification-sampler>`
+
+To make your model train stochastically, you need to:
+
+- Define a neighborhood sampler.
+- Adapt your model for minibatch training.
+- Modify your training loop.
+
+The following subsections walk through these steps one by one.
+
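+Define a neighborhood sampler and data loader
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+DGL provides several neighborhood sampler classes that generate the computation dependencies needed for each layer, given the nodes you wish to compute.
+
+The simplest neighborhood sampler is :class:`~dgl.dataloading.neighbor.MultiLayerFullNeighborSampler`, which makes each node gather messages from all of its neighbors.
+
+To use a sampler provided by DGL, you also need to combine it with :class:`~dgl.dataloading.pytorch.NodeDataLoader`, which iterates over a set of nodes in minibatches.
+
+The following example code creates a PyTorch DataLoader that iterates over the training node ID array ``train_nids`` in batches and yields the list of generated MFGs (message flow graphs) for each batch; the MFGs are moved to GPU later, in the training loop.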
+
+.. code:: python
+
+    import dgl
+    import dgl.nn as dglnn
+    import torch
+    import torch.nn as nn
+    import torch.nn.functional as F
+    
+    sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
+    dataloader = dgl.dataloading.NodeDataLoader(
+        g, train_nids, sampler,
+        batch_size=1024,
+        shuffle=True,
+        drop_last=False,
+        num_workers=4)
+
+Iterating over the DataLoader yields a list of specially created graphs representing the computation dependencies of each layer. DGL calls them *message flow graphs* (MFGs).
+
+.. code:: python
+
+    input_nodes, output_nodes, blocks = next(iter(dataloader))
+    print(blocks)
+
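+The iterator generates three items at a time. ``input_nodes`` contains the nodes needed to compute the representations of ``output_nodes``. ``blocks`` describes, for each GNN layer, which node representations are needed as input, which node representations are computed as output, and how the representations of the input nodes propagate to the output nodes.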
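+
+How a full-neighbor sampler arrives at these three items can be sketched without DGL: start from the seed (output) nodes, add all their in-neighbors to get the previous layer's input nodes, and repeat once per layer, walking backwards. The sketch below is a hypothetical illustration (``full_neighbor_mfgs`` is not a DGL function) that represents each MFG as a pair ``(src_nodes, dst_nodes)`` with the destination nodes listed first among the source nodes.
+
```python
def full_neighbor_mfgs(in_neighbors, seeds, num_layers):
    """Build one (src_nodes, dst_nodes) MFG per layer, walking backwards
    from the seeds and taking ALL in-neighbors at every step.

    Returns (input_nodes, output_nodes, mfgs), with mfgs ordered from the
    first layer to the last, like the `blocks` list.
    """
    mfgs = []
    dst = list(seeds)
    for _ in range(num_layers):
        src = list(dst)  # destination nodes come first among source nodes
        seen = set(src)
        for v in dst:
            for u in in_neighbors.get(v, []):
                if u not in seen:
                    seen.add(u)
                    src.append(u)
        mfgs.insert(0, (src, dst))  # prepend: we are walking backwards
        dst = src                   # this layer's inputs are the outputs of the layer below
    return dst, list(seeds), mfgs


in_neighbors = {3: [2], 2: [1], 1: [0]}  # a chain 0 -> 1 -> 2 -> 3
input_nodes, output_nodes, mfgs = full_neighbor_mfgs(in_neighbors, [3], 2)
print(input_nodes)  # [3, 2, 1]
print(mfgs)         # [([3, 2, 1], [3, 2]), ([3, 2], [3])]
```
+
+Node 0 is not needed because two layers only reach two hops back. The source nodes of the first MFG are the input nodes and the destination nodes of the last MFG are the output nodes, which is why the training loop in this section reads inputs from ``blocks[0].srcdata`` and labels from ``blocks[-1].dstdata``.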
+
+.. note::
+
+   For the concept of message flow graphs, see the :doc:`Stochastic Training Tutorial <tutorials/large/L0_neighbor_sampling_overview>`.
+
+   The full list of supported built-in samplers can be found in the :ref:`neighborhood sampler API reference <api-dataloading-neighbor-sampling>`.
+
+   :ref:`guide-minibatch-customizing-neighborhood-sampler` explains how to write your own neighborhood sampler and gives a more detailed explanation of the MFG concept.
+
+
+.. _guide_ko-minibatch-node-classification-model:
+
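+Adapt your model for minibatch training
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If your message passing modules are all provided by DGL, the changes required to adapt your model to minibatch training are minimal. Take a multi-layer GCN as an example. The model implementation on the full graph looks like this: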
+
+.. code:: python
+
+    class TwoLayerGCN(nn.Module):
+        def __init__(self, in_features, hidden_features, out_features):
+            super().__init__()
+            self.conv1 = dglnn.GraphConv(in_features, hidden_features)
+            self.conv2 = dglnn.GraphConv(hidden_features, out_features)
+    
+        def forward(self, g, x):
+            x = F.relu(self.conv1(g, x))
+            x = F.relu(self.conv2(g, x))
+            return x
+
+All you need to change is to replace ``g`` with the ``blocks`` generated above.
+
+.. code:: python
+
+    class StochasticTwoLayerGCN(nn.Module):
+        def __init__(self, in_features, hidden_features, out_features):
+            super().__init__()
+            self.conv1 = dgl.nn.GraphConv(in_features, hidden_features)
+            self.conv2 = dgl.nn.GraphConv(hidden_features, out_features)
+    
+        def forward(self, blocks, x):
+            x = F.relu(self.conv1(blocks[0], x))
+            x = F.relu(self.conv2(blocks[1], x))
+            return x
+
+The DGL ``GraphConv`` modules above accept an element of the ``blocks`` generated by the data loader as an argument.
+
+:ref:`The API reference of each NN module <apinn>` tells whether the module supports an MFG as an argument.
+
+If you wish to use your own message passing module, see :ref:`guide-minibatch-custom-gnn-module`.
+
+The training loop
+~~~~~~~~~~~~~~~~~~~~
+
+The training loop simply iterates over the dataset with the customized batching iterator. In each iteration, which yields a list of MFGs, you:
+
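+1. Load the node features of the input nodes onto the GPU. The node features can be stored in either memory or external storage. Note that, unlike full-graph training, where the features of all nodes are loaded, only the features of the input nodes need to be loaded.
+
+   If the features are stored in ``g.ndata``, they can be loaded by accessing the features in ``blocks[0].srcdata``, i.e. the features of the source nodes of the first MFG. These are all the nodes needed to compute the final representations.
+
+2. Feed the list of MFGs and the input node features to the multi-layer GNN and get the outputs.
+
+3. Load the node labels of the output nodes onto the GPU. Similarly, the node labels can be stored in either memory or external storage. Again, unlike full-graph training, where the labels of all nodes are loaded, only the labels of the output nodes need to be loaded.
+
+   If the labels are stored in ``g.ndata``, they can be loaded by accessing the features in ``blocks[-1].dstdata``, i.e. the features of the destination nodes of the last MFG. These are the same as the nodes whose final representations you wish to compute.
+
+4. Compute the loss and backpropagate.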
+
+.. code:: python
+
+    model = StochasticTwoLayerGCN(in_features, hidden_features, out_features)
+    model = model.cuda()
+    opt = torch.optim.Adam(model.parameters())
+    
+    for input_nodes, output_nodes, blocks in dataloader:
+        blocks = [b.to(torch.device('cuda')) for b in blocks]
+        input_features = blocks[0].srcdata['features']
+        output_labels = blocks[-1].dstdata['label']
+        output_predictions = model(blocks, input_features)
+        loss = compute_loss(output_labels, output_predictions)
+        opt.zero_grad()
+        loss.backward()
+        opt.step()
+
+DGL provides an end-to-end stochastic training example, the `GraphSAGE
+implementation <https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/train_sampling.py>`__.
+
+For heterogeneous graphs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Training a graph neural network for node classification on a heterogeneous graph is similar.
+
+For instance, recall :ref:`how to train a 2-layer RGCN on full graph <guide-training-rgcn-node-classification>`. The code for an RGCN implementation with minibatch training looks very similar to it (self-loop, non-linearity and basis decomposition are removed for simplicity):
+
+.. code:: python
+
+    class StochasticTwoLayerRGCN(nn.Module):
+        def __init__(self, in_feat, hidden_feat, out_feat, rel_names):
+            super().__init__()
+            self.conv1 = dglnn.HeteroGraphConv({
+                    rel : dglnn.GraphConv(in_feat, hidden_feat, norm='right')
+                    for rel in rel_names
+                })
+            self.conv2 = dglnn.HeteroGraphConv({
+                    rel : dglnn.GraphConv(hidden_feat, out_feat, norm='right')
+                    for rel in rel_names
+                })
+    
+        def forward(self, blocks, x):
+            x = self.conv1(blocks[0], x)
+            x = self.conv2(blocks[1], x)
+            return x
+
+Some of the samplers provided by DGL also support heterogeneous graphs. For example, you can still use the provided :class:`~dgl.dataloading.neighbor.MultiLayerFullNeighborSampler` class and :class:`~dgl.dataloading.pytorch.NodeDataLoader` class for stochastic training. For full neighborhood sampling, the only difference is that you specify a dictionary of node types and node IDs for the training set.
+
+.. code:: python
+
+    sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
+    dataloader = dgl.dataloading.NodeDataLoader(
+        g, train_nid_dict, sampler,
+        batch_size=1024,
+        shuffle=True,
+        drop_last=False,
+        num_workers=4)
+
+The training loop is almost the same as that for homogeneous graphs, except that ``compute_loss`` takes two dictionaries keyed by node type: one for the labels and one for the predictions.
+
+.. code:: python
+
+    model = StochasticTwoLayerRGCN(in_features, hidden_features, out_features, etypes)
+    model = model.cuda()
+    opt = torch.optim.Adam(model.parameters())
+    
+    for input_nodes, output_nodes, blocks in dataloader:
+        blocks = [b.to(torch.device('cuda')) for b in blocks]
+        input_features = blocks[0].srcdata['features']  # a dict keyed by node type
+        output_labels = blocks[-1].dstdata['label']      # a dict keyed by node type
+        output_predictions = model(blocks, input_features)
+        loss = compute_loss(output_labels, output_predictions)
+        opt.zero_grad()
+        loss.backward()
+        opt.step()
+
+For an end-to-end stochastic training example, see the `RGCN
+implementation <https://github.com/dmlc/dgl/blob/master/examples/pytorch/rgcn-hetero/entity_classify_mb.py>`__.
+
+
--- a/docs/source/guide_ko/minibatch.rst
+++ b/docs/source/guide_ko/minibatch.rst
+.. _guide_ko-minibatch:
+
+Chapter 6: Stochastic Training on Large Graphs
+=================================================
+
+:ref:`(English Version) <guide-minibatch>`
+
+If we have a massive graph with, say, millions or even billions of nodes or edges, full-graph training as described in :ref:`guide-training` usually would not work. Consider an :math:`L`-layer graph convolutional network with hidden state size :math:`H` running on an :math:`N`-node graph. Storing the intermediate hidden states requires :math:`O(NLH)` memory, which easily exceeds the capacity of one GPU when :math:`N` is large.
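+
+To get a feel for the numbers, the sketch below evaluates :math:`N \times L \times H` values at 4 bytes each (``float32``) for a hypothetical graph with 100 million nodes, 3 layers, and a hidden size of 256. These sizes are illustrative assumptions, and the estimate ignores everything else training needs (input features, gradients, optimizer state).
+
```python
def hidden_state_bytes(num_nodes, num_layers, hidden_size, bytes_per_value=4):
    """Memory for N * L * H hidden values (float32 by default)."""
    return num_nodes * num_layers * hidden_size * bytes_per_value


# Hypothetical sizes: N = 100M nodes, L = 3 layers, H = 256.
gb = hidden_state_bytes(100_000_000, 3, 256) / 1e9
print(f"{gb:.1f} GB")  # 307.2 GB
```
+
+Even this partial count is far beyond the memory of a single GPU.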
+
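+This section describes stochastic minibatch training, which does not require fitting the features of all nodes onto the GPU.
+
+Overview of Neighborhood Sampling Approaches
+-----------------------------------------------
+
+Neighborhood sampling methods generally work as follows. At each gradient descent step, we select a minibatch of nodes whose final representations at the :math:`L`-th layer are to be computed. We then take all or some of their neighbors at the :math:`(L-1)`-th layer. This process continues until we reach the input. This iterative process builds the dependency graph starting from the output and working backwards to the input, as the figure below shows: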
+
+.. figure:: https://data.dgl.ai/asset/image/guide_6_0_0.png
+   :alt: Imgur
+
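+This saves the workload and computation resources needed to train a GNN on a large graph.
+
+DGL provides a few neighborhood samplers and a pipeline for training a GNN with neighborhood sampling, as well as ways to customize your sampling strategies.
+
+Roadmap
+----------
+
+The chapter starts with sections on training GNNs stochastically under different scenarios.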
+
+* :ref:`guide_ko-minibatch-node-classification-sampler`
+* :ref:`guide_ko-minibatch-edge-classification-sampler`
+* :ref:`guide_ko-minibatch-link-classification-sampler`
+
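+The remaining sections cover more advanced topics, for those who wish to develop new sampling algorithms or new GNN modules compatible with minibatch training, or to understand how evaluation and inference can be conducted in minibatches.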
+
+* :ref:`guide_ko-minibatch-customizing-neighborhood-sampler`
+* :ref:`guide_ko-minibatch-custom-gnn-module`
+* :ref:`guide_ko-minibatch-inference`
+
+Finally, the following section gives performance tips for implementing and using neighborhood sampling.
+
+* :ref:`guide_ko-minibatch-gpu-sampling`
+
+.. toctree::
+    :maxdepth: 1
+    :hidden:
+    :glob:
+
+    minibatch-node
+    minibatch-edge
+    minibatch-link
+    minibatch-custom-sampler
+    minibatch-nn
+    minibatch-inference
+    minibatch-gpu-sampling
--- a/docs/source/guide_ko/mixed_precision.rst
+++ b/docs/source/guide_ko/mixed_precision.rst
+.. _guide_ko-mixed_precision:
+
+Chapter 8: Mixed Precision Training
+======================================
+
+:ref:`(English Version) <guide-mixed_precision>`
+
+DGL is compatible with `PyTorch's automatic mixed precision package <https://pytorch.org/docs/stable/amp.html>`_ for mixed precision training, which saves both training time and GPU memory. To enable this feature, install PyTorch 1.6+ and Python 3.7+, and build DGL from source to get ``float16`` data type support. This feature is still in beta and is not available as a pre-built pip wheel.
+
+Installation
+------------
+
+First, download the DGL source code from GitHub and build the shared library with the ``USE_FP16=ON`` flag.
+
+.. code:: bash
+
+   git clone --recurse-submodules https://github.com/dmlc/dgl.git
+   cd dgl
+   mkdir build
+   cd build
+   cmake -DUSE_CUDA=ON -DUSE_FP16=ON ..
+   make -j
+
+Then install the Python binding.
+
+.. code:: bash
+
+   cd ../python
+   python setup.py install
+
+Message Passing with Half Precision
+-----------------------------------
+
+DGL built with fp16 support allows message passing on ``float16`` features with both UDFs (User Defined Functions) and built-in functions (e.g., ``dgl.function.sum``,
+``dgl.function.copy_u``).
+
+
+The following example shows how to use DGL's message passing APIs on half-precision features.
+
+    >>> import torch
+    >>> import dgl
+    >>> import dgl.function as fn
+    >>> g = dgl.rand_graph(30, 100).to(0)  # Create a graph on GPU w/ 30 nodes and 100 edges.
+    >>> g.ndata['h'] = torch.rand(30, 16).to(0).half()  # Create fp16 node features.
+    >>> g.edata['w'] = torch.rand(100, 1).to(0).half()  # Create fp16 edge features.
+    >>> # Use DGL's built-in functions for message passing on fp16 features.
+    >>> g.update_all(fn.u_mul_e('h', 'w', 'm'), fn.sum('m', 'x'))
+    >>> g.ndata['x'][0]
+    tensor([0.3391, 0.2208, 0.7163, 0.6655, 0.7031, 0.5854, 0.9404, 0.7720, 0.6562,
+            0.4028, 0.6943, 0.5908, 0.9307, 0.5962, 0.7827, 0.5034],
+           device='cuda:0', dtype=torch.float16)
+    >>> g.apply_edges(fn.u_dot_v('h', 'x', 'hx'))
+    >>> g.edata['hx'][0]
+    tensor([5.4570], device='cuda:0', dtype=torch.float16)
+    >>> # Use UDF(User Defined Functions) for message passing on fp16 features.
+    >>> def message(edges):
+    ...     return {'m': edges.src['h'] * edges.data['w']}
+    ...
+    >>> def reduce(nodes):
+    ...     return {'y': torch.sum(nodes.mailbox['m'], 1)}
+    ...
+    >>> def dot(edges):
+    ...     return {'hy': (edges.src['h'] * edges.dst['y']).sum(-1, keepdim=True)}
+    ...
+    >>> g.update_all(message, reduce)
+    >>> g.ndata['y'][0]
+    tensor([0.3394, 0.2209, 0.7168, 0.6655, 0.7026, 0.5854, 0.9404, 0.7720, 0.6562,
+            0.4028, 0.6943, 0.5908, 0.9307, 0.5967, 0.7827, 0.5039],
+           device='cuda:0', dtype=torch.float16)
+    >>> g.apply_edges(dot)
+    >>> g.edata['hy'][0]
+    tensor([5.4609], device='cuda:0', dtype=torch.float16)
+
+
+End-to-End Mixed Precision Training
+-----------------------------------
+
+DGL relies on PyTorch's AMP package for mixed precision training, so the usage is the same as in `PyTorch <https://pytorch.org/docs/stable/notes/amp_examples.html>`_.
+
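+Wrap the forward pass of the GNN model (including the loss computation) with ``torch.cuda.amp.autocast()``, and PyTorch automatically selects an appropriate data type for each op and tensor. Half precision tensors are memory efficient. Most operations on half precision tensors are also faster, because they use GPU Tensor Cores.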
+
+Small gradients in ``float16`` format suffer from underflow: they flush to zero. PyTorch provides a ``GradScaler`` module to address this. ``GradScaler`` works in three steps:
+
+- it multiplies the loss by a scale factor;
+- it runs the backward pass on the scaled loss;
+- it unscales the gradients before the optimizer updates the parameters.
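+
+The underflow problem and the scale/unscale idea can be reproduced numerically without a GPU. The sketch below uses the ``'e'`` format of Python's ``struct`` module to round values to IEEE 754 half precision. The gradient value and the scale factor ``2**16`` are hypothetical. The sketch only illustrates the principle behind loss scaling; ``GradScaler`` itself also adjusts the scale factor dynamically during training.
+
```python
import struct


def to_fp16(x):
    """Round a Python float to IEEE 754 half precision and convert it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


grad = 1e-8        # hypothetical tiny gradient value
scale = 2.0 ** 16  # hypothetical loss-scale factor

plain = to_fp16(grad)           # below fp16's smallest subnormal (about 6e-8): becomes 0.0
scaled = to_fp16(grad * scale)  # about 6.55e-4, which fp16 can represent
recovered = scaled / scale      # unscale in full (Python float) precision

print(plain, scaled, recovered)
```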
+
+The following script trains a 3-layer GAT on the Reddit dataset, which has 114 million edges. Note where the code differs depending on whether ``use_fp16`` is enabled or disabled.
+
+.. code::
+
+    import torch 
+    import torch.nn as nn
+    import torch.nn.functional as F
+    from torch.cuda.amp import autocast, GradScaler
+    import dgl
+    from dgl.data import RedditDataset
+    from dgl.nn import GATConv
+
+    use_fp16 = True
+
+
+    class GAT(nn.Module):
+        def __init__(self,
+                     in_feats,
+                     n_hidden,
+                     n_classes,
+                     heads):
+            super().__init__()
+            self.layers = nn.ModuleList()
+            self.layers.append(GATConv(in_feats, n_hidden, heads[0], activation=F.elu))
+            self.layers.append(GATConv(n_hidden * heads[0], n_hidden, heads[1], activation=F.elu))
+            self.layers.append(GATConv(n_hidden * heads[1], n_classes, heads[2], activation=F.elu))
+
+        def forward(self, g, h):
+            for l, layer in enumerate(self.layers):
+                h = layer(g, h)
+                if l != len(self.layers) - 1:
+                    h = h.flatten(1)
+                else:
+                    h = h.mean(1)
+            return h
+
+    # Data loading
+    data = RedditDataset()
+    device = torch.device(0)
+    g = data[0]
+    g = dgl.add_self_loop(g)
+    g = g.int().to(device)
+    train_mask = g.ndata['train_mask']
+    features = g.ndata['feat']
+    labels = g.ndata['label']
+    in_feats = features.shape[1]
+    n_hidden = 256
+    n_classes = data.num_classes
+    n_edges = g.number_of_edges()
+    heads = [1, 1, 1]
+    model = GAT(in_feats, n_hidden, n_classes, heads)
+    model = model.to(device)
+
+    # Create optimizer
+    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)
+    # Create gradient scaler
+    scaler = GradScaler()
+
+    for epoch in range(100):
+        model.train()
+        optimizer.zero_grad()
+
+        # Wrap forward pass with autocast
+        with autocast(enabled=use_fp16):
+            logits = model(g, features)
+            loss = F.cross_entropy(logits[train_mask], labels[train_mask])
+        
+        if use_fp16:
+            # Backprop w/ gradient scaling
+            scaler.scale(loss).backward()
+            scaler.step(optimizer)
+            scaler.update()
+        else:
+            loss.backward()
+            optimizer.step()
+
+        print('Epoch {} | Loss {}'.format(epoch, loss.item()))
+
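+Results on a machine with a single NVIDIA V100 (16GB) GPU:
+
+- Without fp16, training this model consumes 15.2GB of GPU memory.
+- With fp16 enabled, training consumes 12.8GB of GPU memory.
+- The loss converges to similar values in both settings.
+
+If you change the number of heads to ``[2, 2, 2]``, training without fp16 runs into GPU OOM (out-of-memory) issues. Training with fp16 still runs, using 15.7GB of GPU memory.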
+
+DGL keeps improving its half-precision support, and the performance of the compute kernels is not yet optimal. Stay tuned for future updates.
\ No newline at end of file
--- a/docs/source/guide_ko/nn-construction.rst
+++ b/docs/source/guide_ko/nn-construction.rst
+.. _guide_ko-nn-construction:
+
+3.1 DGL NN Module Construction Function
+---------------------------------------
+
+:ref:`(English Version) <guide-nn-construction>`
+
+The construction function performs the following steps:
+
+1. Set options.
+2. Register learnable parameters or submodules.
+3. Reset parameters.
+
+.. code::
+
+    import torch.nn as nn
+
+    from dgl.utils import expand_as_pair
+
+    class SAGEConv(nn.Module):
+        def __init__(self,
+                     in_feats,
+                     out_feats,
+                     aggregator_type,
+                     bias=True,
+                     norm=None,
+                     activation=None):
+            super(SAGEConv, self).__init__()
+
+            self._in_src_feats, self._in_dst_feats = expand_as_pair(in_feats)
+            self._out_feats = out_feats
+            self._aggre_type = aggregator_type
+            self.norm = norm
+            self.activation = activation
+
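+The construction function must specify the data dimensions. For a typical PyTorch module, these are the input, output and hidden dimensions. For graph neural networks, the input dimension can be split into the source node dimension and the destination node dimension.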
+
+Besides the data dimensions, a typical option for a graph neural network is the aggregation type (``self._aggre_type``). The aggregation type determines how messages arriving on different edges at a given destination node are aggregated. Commonly used aggregation types include ``mean``, ``sum``, ``max`` and ``min``. Some modules apply more complicated aggregations such as ``lstm``.
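+
+The sketch below makes the effect of the aggregation type concrete. It applies ``mean``, ``sum``, ``max`` and ``min`` elementwise to the messages one destination node receives. It is a plain-Python illustration with hypothetical message values; DGL itself performs these reductions with built-in functions such as ``fn.mean`` and ``fn.max``.
+
```python
AGGREGATORS = {
    "mean": lambda vals: sum(vals) / len(vals),
    "sum": sum,
    "max": max,
    "min": min,
}


def aggregate(messages, aggregator_type):
    """Reduce a list of equal-length message vectors elementwise."""
    if aggregator_type not in AGGREGATORS:
        raise KeyError(f"Aggregator type {aggregator_type} not supported.")
    reduce_fn = AGGREGATORS[aggregator_type]
    # zip(*messages) groups the k-th component of every message together.
    return [reduce_fn(list(component)) for component in zip(*messages)]


mailbox = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]  # messages from three hypothetical in-edges
for agg in ["mean", "sum", "max", "min"]:
    print(agg, aggregate(mailbox, agg))
```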
+
+Here, ``norm`` is a callable for feature normalization. The SAGEConv paper uses l2 normalization, :math:`h_v = h_v / \lVert h_v \rVert_2`.
+
+.. code::
+
+            # aggregator type: mean, pool, lstm, gcn
+            if aggregator_type not in ['mean', 'pool', 'lstm', 'gcn']:
+                raise KeyError('Aggregator type {} not supported.'.format(aggregator_type))
+            if aggregator_type == 'pool':
+                self.fc_pool = nn.Linear(self._in_src_feats, self._in_src_feats)
+            if aggregator_type == 'lstm':
+                self.lstm = nn.LSTM(self._in_src_feats, self._in_src_feats, batch_first=True)
+            if aggregator_type in ['mean', 'pool', 'lstm']:
+                self.fc_self = nn.Linear(self._in_dst_feats, out_feats, bias=bias)
+            self.fc_neigh = nn.Linear(self._in_src_feats, out_feats, bias=bias)
+            self.reset_parameters()
+
+Next, register the parameters and submodules. In SAGEConv, the submodules depend on the aggregation type. They are pure PyTorch nn modules such as ``nn.Linear`` and ``nn.LSTM``. At the end of the construction function, ``reset_parameters()`` initializes the weights.
+
+.. code::
+
+        def reset_parameters(self):
+            """Reinitialize learnable parameters."""
+            gain = nn.init.calculate_gain('relu')
+            if self._aggre_type == 'pool':
+                nn.init.xavier_uniform_(self.fc_pool.weight, gain=gain)
+            if self._aggre_type == 'lstm':
+                self.lstm.reset_parameters()
+            if self._aggre_type != 'gcn':
+                nn.init.xavier_uniform_(self.fc_self.weight, gain=gain)
+            nn.init.xavier_uniform_(self.fc_neigh.weight, gain=gain)
--- a/docs/source/guide_ko/nn-forward.rst
+++ b/docs/source/guide_ko/nn-forward.rst
+.. _guide_ko-nn-forward:
+
+3.2 DGL NN Module Forward Function
+----------------------------------
+
+:ref:`(English Version) <guide-nn-forward>`
+
+In an NN module, the ``forward()`` function does the actual message passing and computation. PyTorch NN modules usually take only tensors as parameters. DGL NN modules additionally take a :class:`dgl.DGLGraph`. The ``forward()`` function performs three steps:
+
+- Graph checking and graph type specification
+- Message passing
+- Feature update
+
+The rest of this section walks through the ``forward()`` function of SAGEConv in detail.
+
+Graph checking and graph type specification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code::
+
+        def forward(self, graph, feat):
+            with graph.local_scope():
+                # Specify graph type then expand input feature according to graph type
+                feat_src, feat_dst = expand_as_pair(feat, graph)
+
+``forward()`` needs to handle corner cases in the input that can produce invalid values during computation and message passing.
+
+The most typical check in graph conv modules like :class:`~dgl.nn.pytorch.conv.GraphConv` is that the input graph has no nodes with in-degree 0. For such a node, the ``mailbox`` is empty and the reduce function produces all-zero values, which can hurt model performance.
+
+The :class:`~dgl.nn.pytorch.conv.SAGEConv` module does not need this check. It concatenates the aggregated representation with the original node feature, so the output of ``forward()`` is not all zero.
+
+DGL NN modules should be reusable across different types of graph inputs:
+
+- homogeneous graphs;
+- heterogeneous graphs (:ref:`guide-graph-heterogeneous`);
+- subgraph blocks (:ref:`guide-minibatch`).
+
+The math formulas for SAGEConv are:
+
+.. math::
+
+   h_{\mathcal{N}(dst)}^{(l+1)} = \mathrm{aggregate}
+           \left(\{h_{src}^{(l)}, \forall src \in \mathcal{N}(dst) \}\right)
+
+.. math::
+
+    h_{dst}^{(l+1)} = \sigma \left(W \cdot \mathrm{concat}
+           (h_{dst}^{(l)}, h_{\mathcal{N}(dst)}^{(l+1)}) + b \right)
+
+.. math::
+
+    h_{dst}^{(l+1)} = \mathrm{norm}(h_{dst}^{(l+1)})
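+
+The three equations can be checked on a toy graph with the plain-Python sketch below, which uses the mean aggregator and l2 normalization. The graph, the features, the weights and the choice of ReLU for :math:`\sigma` are hypothetical. The sketch follows the formulas literally. DGL's ``SAGEConv`` splits :math:`W` into the two linear layers ``fc_self`` and ``fc_neigh``, which is an equivalent formulation.
+
```python
import math


def l2_normalize(h):
    """The optional norm step: h / ||h||_2 (returned unchanged if the norm is 0)."""
    n = math.sqrt(sum(x * x for x in h))
    return [x / n for x in h] if n > 0 else list(h)


def sage_mean_layer(in_neighbors, h, W, b, activation=lambda x: max(x, 0.0), norm=None):
    """One SAGEConv-style layer with the mean aggregator.

    in_neighbors: dict dst -> list of src node IDs; h: dict node -> feature list.
    W: list of output rows, each of length 2 * in_dim; b: list of biases, one per row.
    """
    out = {}
    for dst, srcs in in_neighbors.items():
        dim = len(h[dst])
        # Equation 1: mean of the neighbor features (zero vector if there are no in-edges).
        if srcs:
            h_neigh = [sum(h[s][k] for s in srcs) / len(srcs) for k in range(dim)]
        else:
            h_neigh = [0.0] * dim
        # Equation 2: sigma(W . concat(h_dst, h_neigh) + b); list `+` concatenates.
        z = h[dst] + h_neigh
        rst = [activation(sum(w * x for w, x in zip(row, z)) + bias) for row, bias in zip(W, b)]
        # Equation 3: optional normalization.
        out[dst] = norm(rst) if norm is not None else rst
    return out


graph = {0: [1, 2], 1: [0], 2: []}
feats = {0: [1.0], 1: [2.0], 2: [4.0]}
print(sage_mean_layer(graph, feats, W=[[1.0, 1.0]], b=[0.0]))
```
+
+Node 2 has no in-edges, yet its output is nonzero because its own feature is part of the concatenation. This is why SAGEConv can skip the 0-in-degree check.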
+
+Source node features (``feat_src``) and destination node features (``feat_dst``) must be specified according to the graph type. :meth:`~dgl.utils.expand_as_pair` expands ``feat`` into ``feat_src`` and ``feat_dst`` according to the graph type. It behaves as follows:
+
+.. code::
+
+    from collections.abc import Mapping
+
+    from dgl import backend as F
+
+    def expand_as_pair(input_, g=None):
+        if isinstance(input_, tuple):
+            # Bipartite graph case
+            return input_
+        elif g is not None and g.is_block:
+            # Subgraph block case
+            if isinstance(input_, Mapping):
+                input_dst = {
+                    k: F.narrow_row(v, 0, g.number_of_dst_nodes(k))
+                    for k, v in input_.items()}
+            else:
+                input_dst = F.narrow_row(input_, 0, g.number_of_dst_nodes())
+            return input_, input_dst
+        else:
+            # Homogeneous graph case
+            return input_, input_
+
+When training on a whole homogeneous graph, the source nodes and destination nodes are the same set: all the nodes of the graph.
+
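+A heterogeneous graph can be split into several bipartite graphs, one per relation. A relation is represented as ``(src_type, edge_type, dst_type)``. If the input feature ``feat`` is a tuple, the function treats the graph as bipartite. The first element of the tuple holds the source node features and the second holds the destination node features.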
+
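+In minibatch training, the computation is applied to a subgraph sampled from a set of destination nodes. DGL calls such a subgraph a ``block``. When a block is created, the destination nodes (``dst_nodes``) are placed at the front of the source node list. Therefore ``feat_dst`` can be obtained with the index ``[0:g.number_of_dst_nodes()]``.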
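+
+The three cases handled by ``expand_as_pair`` can be mimicked with plain Python lists, where a list of per-node rows stands in for a feature tensor. In this hypothetical sketch, ``num_dst_nodes`` plays the role of ``g.number_of_dst_nodes()`` for a block. A block places its destination nodes first, so slicing the first ``num_dst_nodes`` rows yields the destination features.
+
```python
def expand_as_pair_sketch(feat, num_dst_nodes=None):
    """Return (src_feat, dst_feat) following the three cases described above."""
    if isinstance(feat, tuple):
        # Bipartite graph: the caller already supplies (src, dst) features.
        return feat
    if num_dst_nodes is not None:
        # Block: the destination nodes are the first num_dst_nodes source nodes.
        return feat, feat[:num_dst_nodes]
    # Homogeneous graph: sources and destinations are the same set of nodes.
    return feat, feat


rows = [[0.1], [0.2], [0.3], [0.4]]                  # features of 4 source nodes
print(expand_as_pair_sketch(rows, num_dst_nodes=2))  # dst features = first 2 rows
```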
+
+Once ``feat_src`` and ``feat_dst`` are determined, the computation is the same for all three graph types.
+
+Message passing and reducing
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code::
+
+                import dgl.function as fn
+                import torch.nn.functional as F
+                from dgl.utils import check_eq_shape
+
+                # Destination node features, used by fc_self below.
+                h_self = feat_dst
+
+                if self._aggre_type == 'mean':
+                    graph.srcdata['h'] = feat_src
+                    graph.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'neigh'))
+                    h_neigh = graph.dstdata['neigh']
+                elif self._aggre_type == 'gcn':
+                    check_eq_shape(feat)
+                    graph.srcdata['h'] = feat_src
+                    graph.dstdata['h'] = feat_dst
+                    graph.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'neigh'))
+                    # divide in_degrees
+                    degs = graph.in_degrees().to(feat_dst)
+                    h_neigh = (graph.dstdata['neigh'] + graph.dstdata['h']) / (degs.unsqueeze(-1) + 1)
+                elif self._aggre_type == 'pool':
+                    graph.srcdata['h'] = F.relu(self.fc_pool(feat_src))
+                    graph.update_all(fn.copy_u('h', 'm'), fn.max('m', 'neigh'))
+                    h_neigh = graph.dstdata['neigh']
+                else:
+                    raise KeyError('Aggregator type {} not recognized.'.format(self._aggre_type))
+
+                # GraphSAGE GCN does not require fc_self.
+                if self._aggre_type == 'gcn':
+                    rst = self.fc_neigh(h_neigh)
+                else:
+                    rst = self.fc_self(h_self) + self.fc_neigh(h_neigh)
+
+This code performs the actual message passing and reduction, and this part differs from module to module. All the message passing here uses the :meth:`~dgl.DGLGraph.update_all` API with ``built-in`` message/reduce functions. This takes full advantage of DGL's performance optimizations described in :ref:`guide-message-passing-efficient`.
+
+Update feature after reduction for output
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code::
+
+                # activation
+                if self.activation is not None:
+                    rst = self.activation(rst)
+                # normalization
+                if self.norm is not None:
+                    rst = self.norm(rst)
+                return rst
+
+The last part of ``forward()`` updates the features after the ``reduce function``. Common update operations apply the activation function and normalization, according to the options set when the object was constructed.
+
--- a/docs/source/guide_ko/nn-heterograph.rst
+++ b/docs/source/guide_ko/nn-heterograph.rst
+.. _guide_ko-nn-heterograph:
+
+3.3 Heterogeneous GraphConv Module
+----------------------------------
+
+:ref:`(English Version) <guide-nn-heterograph>`
+
+:class:`~dgl.nn.pytorch.HeteroGraphConv` is a module-level encapsulation for running DGL NN modules on heterogeneous graphs. It follows the same logic as the message passing API :meth:`~dgl.DGLGraph.multi_update_all` and consists of:
+
+- A DGL NN module for each relation :math:`r`.
+- A reduction that merges the results of multiple relations on the same node.
+
+This can be formulated as:
+
+.. math::  h_{dst}^{(l+1)} = \underset{r\in\mathcal{R}, r_{dst}=dst}{AGG} (f_r(g_r, h_{r_{src}}^l, h_{r_{dst}}^l))
+
+where :math:`f_r` is the NN module for each relation :math:`r`, and :math:`AGG` is the aggregation function.
+
+HeteroGraphConv implementation logic:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code::
+
+    import torch.nn as nn
+
+    class HeteroGraphConv(nn.Module):
+        def __init__(self, mods, aggregate='sum'):
+            super(HeteroGraphConv, self).__init__()
+            self.mods = nn.ModuleDict(mods)
+            if isinstance(aggregate, str):
+                # An internal function to get common aggregation functions
+                self.agg_fn = get_aggregate_fn(aggregate)
+            else:
+                self.agg_fn = aggregate
+
+The heterograph convolution takes a dictionary ``mods`` that maps each relation to an NN module. It also sets the function that aggregates the results of multiple relations on the same node.
+
+.. code::
+
+    def forward(self, g, inputs, mod_args=None, mod_kwargs=None):
+        if mod_args is None:
+            mod_args = {}
+        if mod_kwargs is None:
+            mod_kwargs = {}
+        outputs = {nty : [] for nty in g.dsttypes}
+
+Besides the input graph and input tensors, ``forward()`` takes two additional parameters, ``mod_args`` and ``mod_kwargs``. They supply customized arguments to the NN modules in ``self.mods`` that are associated with each type of relation.
+
+An output dictionary holds the output tensors for each destination type ``nty``. The value for each ``nty`` is a list, because a node type may receive multiple outputs when several relations have ``nty`` as their destination type. ``HeteroGraphConv`` performs a further aggregation on these lists.
+
+.. code::
+
+        if g.is_block:
+            src_inputs = inputs
+            dst_inputs = {k: v[:g.number_of_dst_nodes(k)] for k, v in inputs.items()}
+        else:
+            src_inputs = dst_inputs = inputs
+
+        for stype, etype, dtype in g.canonical_etypes:
+            rel_graph = g[stype, etype, dtype]
+            if rel_graph.num_edges() == 0:
+                continue
+            if stype not in src_inputs or dtype not in dst_inputs:
+                continue
+            dstdata = self.mods[etype](
+                rel_graph,
+                (src_inputs[stype], dst_inputs[dtype]),
+                *mod_args.get(etype, ()),
+                **mod_kwargs.get(etype, {}))
+            outputs[dtype].append(dstdata)
+
+The input ``g`` can be a heterogeneous graph or a subgraph block of a heterogeneous graph. As with ordinary NN modules, ``forward()`` must handle each input graph type separately.
+
+Each relation is represented by its ``canonical_etype``, ``(stype, etype, dtype)``. Using the ``canonical_etype`` as the key extracts the bipartite graph ``rel_graph``. For this bipartite graph, the input features are the pair ``(src_inputs[stype], dst_inputs[dtype])``. The NN module for each relation is called and its output is saved.
+
+.. code::
+
+        rsts = {}
+        for nty, alist in outputs.items():
+            if len(alist) != 0:
+                rsts[nty] = self.agg_fn(alist, nty)
+        return rsts
+
+Finally, ``self.agg_fn`` aggregates the results of multiple relations on the same destination node type. Examples are in the API doc of :class:`~dgl.nn.pytorch.HeteroGraphConv`.
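+
+The per-relation dispatch and the cross-relation aggregation can be condensed into the plain-Python sketch below. It deliberately drops the graph structure. Each hypothetical per-relation "module" is just a function from (source features, destination features) to one output value per destination node. The aggregation is ``sum``, applied elementwise across relations that share a destination type.
+
```python
def hetero_conv_sketch(canonical_etypes, mods, inputs):
    """Run one module per relation, then sum the outputs per destination node type."""
    outputs = {}
    for stype, etype, dtype in canonical_etypes:
        if stype not in inputs or dtype not in inputs:
            continue  # skip relations whose node types have no input features
        dstdata = mods[etype](inputs[stype], inputs[dtype])
        outputs.setdefault(dtype, []).append(dstdata)
    # Aggregate across relations: elementwise sum over the list for each node type.
    return {nty: [sum(vals) for vals in zip(*alist)] for nty, alist in outputs.items()}


canonical_etypes = [
    ("user", "click", "item"),
    ("user", "dislike", "item"),
    ("item", "clicked-by", "user"),
]
mods = {
    "click": lambda src, dst: [sum(src)] * len(dst),
    "dislike": lambda src, dst: [-min(src)] * len(dst),
    "clicked-by": lambda src, dst: [max(src)] * len(dst),
}
inputs = {"user": [1.0, 2.0], "item": [10.0]}
print(hetero_conv_sketch(canonical_etypes, mods, inputs))
```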
--- a/docs/source/guide_ko/nn.rst
+++ b/docs/source/guide_ko/nn.rst
+.. _guide_ko-nn:
+
+Chapter 3: Building GNN Modules
+===============================
+
+:ref:`(English Version) <guide-nn>`
+
+DGL NN modules are the building blocks of GNN models. Depending on the backend DNN framework, an NN module inherits from `Pytorch’s NN Module <https://pytorch.org/docs/1.2.0/_modules/torch/nn/modules/module.html>`__, `MXNet Gluon’s NN Block <http://mxnet.incubator.apache.org/versions/1.6/api/python/docs/api/gluon/nn/index.html>`__, or `TensorFlow’s Keras Layer <https://www.tensorflow.org/api_docs/python/tf/keras/layers>`__. Parameter registration in the construction function and tensor operations in the forward function work exactly as in the backend framework. As a result, DGL code integrates seamlessly with backend framework code. The main difference lies in the message passing operations, which are unique to DGL.
+
+DGL has integrated many commonly used :ref:`apinn-pytorch-conv`, :ref:`apinn-pytorch-dense-conv`, :ref:`apinn-pytorch-pooling`, and :ref:`apinn-pytorch-util`. Contributions are welcome.
+
+This chapter uses :class:`~dgl.nn.pytorch.conv.SAGEConv` with the PyTorch backend as an example of how to build a custom DGL NN module.
+
+Roadmap
+-------
+
+* :ref:`guide_ko-nn-construction`
+* :ref:`guide_ko-nn-forward`
+* :ref:`guide_ko-nn-heterograph`
+
+.. toctree::
+    :maxdepth: 1
+    :hidden:
+    :glob:
+
+    nn-construction
+    nn-forward
+    nn-heterograph
--- a/docs/source/guide_ko/training-edge.rst
+++ b/docs/source/guide_ko/training-edge.rst
+.. _guide_ko-training-edge-classification:
+
+5.2 Edge Classification and Regression
+--------------------------------------
+
+:ref:`(English Version) <guide-training-edge-classification>`
+
+Sometimes you want to predict attributes of the edges of a graph. In that case, you build an *edge classification/regression* model.
+
+First, generate a random graph to use as an edge prediction example.
+
+.. code:: ipython3
+
+    import dgl
+    import numpy as np
+    import torch
+
+    src = np.random.randint(0, 100, 500)
+    dst = np.random.randint(0, 100, 500)
+    # make it symmetric
+    edge_pred_graph = dgl.graph((np.concatenate([src, dst]), np.concatenate([dst, src])))
+    # synthetic node and edge features, as well as edge labels
+    edge_pred_graph.ndata['feature'] = torch.randn(100, 10)
+    edge_pred_graph.edata['feature'] = torch.randn(1000, 10)
+    edge_pred_graph.edata['label'] = torch.randn(1000)
+    # synthetic train-validation-test splits
+    edge_pred_graph.edata['train_mask'] = torch.zeros(1000, dtype=torch.bool).bernoulli(0.6)
+
+Overview
+~~~~~~~~~
+
+The previous section showed how to do node classification with a multilayer GNN. The same technique computes a hidden representation for any node. Predictions on edges can then be derived from the representations of their incident nodes.
+
+The most common way to compute a prediction on an edge is a parameterized function of its incident node representations. Optionally, the function also uses the features of the edge itself.
+
+Model implementation differences from node classification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Assume the node representations are computed with the model from the previous section. You then only need one more component, which computes the edge predictions with the :meth:`~dgl.DGLGraph.apply_edges` method.
+
+For instance, to compute a score for each edge for edge regression, the following code takes the dot product of the incident node representations on each edge.
+
+.. code:: python
+
+    import dgl.function as fn
+    import torch.nn as nn
+
+    class DotProductPredictor(nn.Module):
+        def forward(self, graph, h):
+            # h contains the node representations computed from the GNN defined
+            # in the node classification section (Section 5.1).
+            with graph.local_scope():
+                graph.ndata['h'] = h
+                graph.apply_edges(fn.u_dot_v('h', 'h', 'score'))
+                return graph.edata['score']
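+
+The computation performed by ``fn.u_dot_v('h', 'h', 'score')`` can be spelled out for a tiny edge list in plain Python. In this sketch the edge list and node representations are hypothetical. Each edge's score is the dot product of its source and destination representations, in edge order.
+
```python
def u_dot_v(src, dst, h):
    """Score edge i as the dot product of h[src[i]] and h[dst[i]]."""
    return [sum(a * b for a, b in zip(h[u], h[v])) for u, v in zip(src, dst)]


h = [[1.0, 0.0], [0.5, 2.0], [3.0, 1.0]]  # hypothetical representations of nodes 0, 1, 2
src, dst = [0, 1, 2], [1, 2, 0]
print(u_dot_v(src, dst, h))
```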
+
+You can also write a prediction function that uses an MLP to predict a vector for each edge. The vector can then feed downstream tasks, e.g. as the logits of a categorical distribution.
+
+.. code:: python
+
+    import torch
+    import torch.nn as nn
+
+    class MLPPredictor(nn.Module):
+        def __init__(self, in_features, out_classes):
+            super().__init__()
+            self.W = nn.Linear(in_features * 2, out_classes)
+
+        def apply_edges(self, edges):
+            h_u = edges.src['h']
+            h_v = edges.dst['h']
+            score = self.W(torch.cat([h_u, h_v], 1))
+            return {'score': score}
+
+        def forward(self, graph, h):
+            # h contains the node representations computed from the GNN defined
+            # in the node classification section (Section 5.1).
+            with graph.local_scope():
+                graph.ndata['h'] = h
+                graph.apply_edges(self.apply_edges)
+                return graph.edata['score']
+
+Training loop
+~~~~~~~~~~~~~
+
+With a node representation model and an edge predictor model, you can write a full-graph training loop that computes predictions on all edges.
+
+The following example uses ``SAGE`` as the node representation model and ``DotProductPredictor`` as the edge predictor model.
+
+.. code:: python
+
+    class Model(nn.Module):
+        def __init__(self, in_features, hidden_features, out_features):
+            super().__init__()
+            self.sage = SAGE(in_features, hidden_features, out_features)
+            self.pred = DotProductPredictor()
+        def forward(self, g, x):
+            h = self.sage(g, x)
+            return self.pred(g, h)
+
+This example assumes that the training, validation and test edge sets are identified by binary masks on edges. It does not include early stopping or model saving.
+
+.. code:: python
+
+    node_features = edge_pred_graph.ndata['feature']
+    edge_label = edge_pred_graph.edata['label']
+    train_mask = edge_pred_graph.edata['train_mask']
+    model = Model(10, 20, 5)
+    opt = torch.optim.Adam(model.parameters())
+    for epoch in range(10):
+        pred = model(edge_pred_graph, node_features)
+        loss = ((pred[train_mask] - edge_label[train_mask]) ** 2).mean()
+        opt.zero_grad()
+        loss.backward()
+        opt.step()
+        print(loss.item())
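+
+The loss in the loop above, ``((pred[train_mask] - edge_label[train_mask]) ** 2).mean()``, is a mean squared error restricted to the training edges. The plain-Python sketch below spells out that computation with hypothetical predictions, labels and mask values. The sketch raises an error when the mask selects nothing.
+
```python
def masked_mse(pred, label, mask):
    """Mean squared error over the positions where mask is True."""
    pairs = [(p, y) for p, y, m in zip(pred, label, mask) if m]
    if not pairs:
        raise ValueError("mask selects no elements")
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


pred = [1.0, 2.0, 3.0, 0.5]
label = [1.0, 0.0, 0.0, 0.5]
mask = [True, True, False, False]
print(masked_mse(pred, label, mask))  # (0**2 + 2**2) / 2 = 2.0
```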
+
+.. _guide_ko-training-edge-classification-heterogeneous-graph:
+
+Heterogeneous graph
+~~~~~~~~~~~~~~~~~~~
+
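+Edge classification on heterogeneous graphs is not very different from edge classification on homogeneous graphs. To classify edges of one edge type, compute the node representations for all node types. Then compute predictions on that edge type with the :meth:`~dgl.DGLHeteroGraph.apply_edges` method.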
+
+For example, to make ``DotProductPredictor`` work on one edge type of a heterogeneous graph, specify the edge type in the ``apply_edges`` method.
+
+.. code:: python
+
+    class HeteroDotProductPredictor(nn.Module):
+        def forward(self, graph, h, etype):
+            # h contains the node representations for each node type computed from
+            # the GNN for heterogeneous graphs defined in the node classification
+            # section (Section 5.1).
+            with graph.local_scope():
+                graph.ndata['h'] = h   # assigns 'h' of all node types in one shot
+                graph.apply_edges(fn.u_dot_v('h', 'h', 'score'), etype=etype)
+                return graph.edges[etype].data['score']
+
+Similarly, you can write a ``HeteroMLPPredictor``.
+
+.. code:: python
+
+    class HeteroMLPPredictor(nn.Module):
+        def __init__(self, in_features, out_classes):
+            super().__init__()
+            self.W = nn.Linear(in_features * 2, out_classes)
+
+        def apply_edges(self, edges):
+            h_u = edges.src['h']
+            h_v = edges.dst['h']
+            score = self.W(torch.cat([h_u, h_v], 1))
+            return {'score': score}
+
+        def forward(self, graph, h, etype):
+            # h contains the node representations for each node type computed from
+            # the GNN for heterogeneous graphs defined in the node classification
+            # section (Section 5.1).
+            with graph.local_scope():
+                graph.ndata['h'] = h   # assigns 'h' of all node types in one shot
+                graph.apply_edges(self.apply_edges, etype=etype)
+                return graph.edges[etype].data['score']
+
+The end-to-end model that predicts a score for each edge of a single edge type is:
+
+.. code:: python
+
+    class Model(nn.Module):
+        def __init__(self, in_features, hidden_features, out_features, rel_names):
+            super().__init__()
+            self.sage = RGCN(in_features, hidden_features, out_features, rel_names)
+            self.pred = HeteroDotProductPredictor()
+        def forward(self, g, x, etype):
+            h = self.sage(g, x)
+            return self.pred(g, h, etype)
+
+To use the model, feed it a dictionary that maps node types to features.
+
+.. code:: python
+
+    model = Model(10, 20, 5, hetero_graph.etypes)
+    user_feats = hetero_graph.nodes['user'].data['feature']
+    item_feats = hetero_graph.nodes['item'].data['feature']
+    label = hetero_graph.edges['click'].data['label']
+    train_mask = hetero_graph.edges['click'].data['train_mask']
+    node_features = {'user': user_feats, 'item': item_feats}
+
+The training loop is almost the same as for homogeneous graphs. For example, predicting the edge labels on the edge type ``click`` looks like this:
+
+.. code:: python
+
+    opt = torch.optim.Adam(model.parameters())
+    for epoch in range(10):
+        pred = model(hetero_graph, node_features, 'click')
+        loss = ((pred[train_mask] - label[train_mask]) ** 2).mean()
+        opt.zero_grad()
+        loss.backward()
+        opt.step()
+        print(loss.item())
+
+
+Predicting edge types of edges on a heterogeneous graph
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Sometimes you want to predict which type an existing edge belongs to.
+
+For instance, in the :ref:`heterogeneous graph example <guide-training-heterogeneous-graph-example>`, the task is: given an edge between a user and an item, predict whether the user would ``click`` or ``dislike`` the item.
+
+This is a simplified version of rating prediction, which is common in recommendation.
+
+A heterogeneous graph convolution network can produce the node representations. For instance, the :ref:`RGCN defined previously <guide-training-rgcn-node-classification>` works here.
+
+To predict the type of an edge, repurpose the ``HeteroDotProductPredictor`` above. Pass it a separate graph with a single edge type that “merges” all the edge types to be predicted, and have it emit a score for each type on every edge.
+
+This example therefore needs a graph with the two node types ``user`` and ``item``. It also needs a single edge type that merges all the edge types from ``user`` to ``item``, i.e. ``click`` and ``dislike``. The following statement creates it:
+
+.. code:: python
+
+    dec_graph = hetero_graph['user', :, 'item']
+
+This returns a heterogeneous graph with node types ``user`` and ``item``. It has a single edge type that combines all the edge types connecting the two node types (i.e. ``click`` and ``dislike``).
+
+The statement above also stores the original edge types in a feature named ``dgl.ETYPE``, which can serve as the label:
+
+.. code:: python
+
+    edge_label = dec_graph.edata[dgl.ETYPE]
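+
+Conceptually, ``hetero_graph['user', :, 'item']`` concatenates the edges of every ``user``-to-``item`` edge type into one edge set and records, per edge, which type it came from. The plain-Python sketch below mimics that bookkeeping with hypothetical edge lists. It labels each edge with the position of its type in the ``etypes`` list it receives. This is an assumption of the sketch, not a statement about how DGL numbers ``dgl.ETYPE``.
+
```python
def merge_edge_types(edges_by_type, etypes):
    """Merge several (src, dst) edge lists into one, returning per-edge type labels."""
    src, dst, labels = [], [], []
    for type_id, etype in enumerate(etypes):
        s, d = edges_by_type[etype]
        src.extend(s)
        dst.extend(d)
        labels.extend([type_id] * len(s))  # label = index of the edge type in `etypes`
    return src, dst, labels


edges_by_type = {
    "click": ([0, 1], [2, 0]),  # hypothetical user -> item edges
    "dislike": ([2], [1]),
}
print(merge_edge_types(edges_by_type, ["click", "dislike"]))
```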
+
+With the graph above as the input to the edge type prediction module, the predictor can be written as follows.
+
+.. code:: python
+
+    class HeteroMLPPredictor(nn.Module):
+        def __init__(self, in_dims, n_classes):
+            super().__init__()
+            self.W = nn.Linear(in_dims * 2, n_classes)
+
+        def apply_edges(self, edges):
+            x = torch.cat([edges.src['h'], edges.dst['h']], 1)
+            y = self.W(x)
+            return {'score': y}
+
+        def forward(self, graph, h):
+            # h contains the node representations for each node type computed from
+            # the GNN for heterogeneous graphs defined in the node classification
+            # section (Section 5.1).
+            with graph.local_scope():
+                graph.ndata['h'] = h   # assigns 'h' of all node types in one shot
+                graph.apply_edges(self.apply_edges)
+                return graph.edata['score']
+
+The model that combines the node representation module and the edge type prediction module is:
+
+.. code:: python
+
+    class Model(nn.Module):
+        def __init__(self, in_features, hidden_features, out_features, rel_names):
+            super().__init__()
+            self.sage = RGCN(in_features, hidden_features, out_features, rel_names)
+            self.pred = HeteroMLPPredictor(out_features, len(rel_names))
+        def forward(self, g, x, dec_graph):
+            h = self.sage(g, x)
+            return self.pred(dec_graph, h)
+
+The training loop is then simply:
+
+.. code:: python
+
+    model = Model(10, 20, 5, hetero_graph.etypes)
+    user_feats = hetero_graph.nodes['user'].data['feature']
+    item_feats = hetero_graph.nodes['item'].data['feature']
+    node_features = {'user': user_feats, 'item': item_feats}
+
+    opt = torch.optim.Adam(model.parameters())
+    for epoch in range(10):
+        logits = model(hetero_graph, node_features, dec_graph)
+        loss = F.cross_entropy(logits, edge_label)
+        opt.zero_grad()
+        loss.backward()
+        opt.step()
+        print(loss.item())
+
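+DGL provides `Graph Convolutional Matrix Completion <https://github.com/dmlc/dgl/tree/master/examples/pytorch/gcmc>`__ as an example of rating prediction, a task formulated as predicting the types of edges on a heterogeneous graph. The node representation module in the `model implementation file <https://github.com/dmlc/dgl/tree/master/examples/pytorch/gcmc>`__ is called ``GCMCLayer``. Both the node representation module and the edge type prediction module are too complicated to explain here, so the details are omitted.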
\ No newline at end of file
--- a/docs/source/guide_ko/training-graph.rst
+++ b/docs/source/guide_ko/training-graph.rst
+.. _guide_ko-training-graph-classification:
+
+5.4 Graph Classification
+------------------------
+
+:ref:`(English Version) <guide-training-graph-classification>`
+
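+Sometimes the data consists of multiple graphs rather than one big graph, for example a list of different types of communities of people. If the friendships among people in each community are characterized as a graph, you get a list of graphs to classify. In this scenario, a graph classification model can identify the type of each community.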
+
+Overview
+~~~~~~~~~
+
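+The major difference from node classification and link prediction is that the prediction characterizes a property of the entire input graph. Message passing over nodes and edges works just as in the previous tasks, but a graph-level representation must also be computed.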
+
+The graph classification pipeline is as follows:
+
+.. figure:: https://data.dgl.ai/tutorial/batch/graph_classifier.png
+   :alt: Graph Classification Process
+
+   Graph classification process
+
+
+A common practice is, from left to right:
+
+- Prepare a batch of graphs.
+- Perform message passing on the batched graphs to update node/edge features.
+- Aggregate node/edge features into graph-level representations.
+- Classify graphs based on the graph-level representations.
+
+Batch of graphs
+^^^^^^^^^^^^^^^^^^
+
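+Graph classification usually trains on a large number of graphs, so feeding the model one graph at a time is very inefficient. Borrowing the idea of minibatch training from common deep learning practice, you can build a batch of graphs and use it in a single training iteration.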
+
+DGL can create a single batched graph from a list of graphs. The batched graph is simply one large graph in which each original small graph forms a separate, disconnected component.
+
+.. figure:: https://data.dgl.ai/tutorial/batch/batch.png
+   :alt: Batched Graph
+
+   Batched graph
+
+The following code calls :func:`dgl.batch` on a list of graphs. The batched graph is a single graph that also carries information about the original list.
+
+.. code:: python
+
+    import dgl
+    import torch as th
+
+    g1 = dgl.graph((th.tensor([0, 1, 2]), th.tensor([1, 2, 3])))
+    g2 = dgl.graph((th.tensor([0, 0, 0, 1]), th.tensor([0, 1, 2, 0])))
+
+    bg = dgl.batch([g1, g2])
+    bg
+    # Graph(num_nodes=7, num_edges=7,
+    #       ndata_schemes={}
+    #       edata_schemes={})
+    bg.batch_size
+    # 2
+    bg.batch_num_nodes()
+    # tensor([4, 3])
+    bg.batch_num_edges()
+    # tensor([3, 4])
+    bg.edges()
+    # (tensor([0, 1, 2, 4, 4, 4, 5]), tensor([1, 2, 3, 4, 5, 6, 4]))
+
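+The node relabeling that :func:`dgl.batch` performs can be sketched in plain Python. This sketch is only an illustration. It assumes each graph is given as a ``(src, dst, num_nodes)`` triple, it is not DGL's implementation, and ``batch_edge_lists`` is a hypothetical helper. Each graph's node IDs are shifted by the total number of nodes in the graphs before it. This reproduces the ``bg.edges()``, ``bg.batch_num_nodes()`` and ``bg.batch_num_edges()`` values shown above.
+
```python
# Illustrative sketch (not DGL's implementation): batch graphs given as
# (src_list, dst_list, num_nodes) triples by offsetting each graph's node IDs.
def batch_edge_lists(graphs):
    """Merge (src, dst, num_nodes) triples into one disjoint graph.

    Returns the merged src/dst lists plus the per-graph node and edge
    counts, i.e. the information a batched graph keeps about its parts.
    """
    src_all, dst_all = [], []
    batch_num_nodes, batch_num_edges = [], []
    offset = 0  # number of nodes contributed by the previous graphs
    for src, dst, num_nodes in graphs:
        src_all.extend(u + offset for u in src)
        dst_all.extend(v + offset for v in dst)
        batch_num_nodes.append(num_nodes)
        batch_num_edges.append(len(src))
        offset += num_nodes
    return src_all, dst_all, batch_num_nodes, batch_num_edges


# The same two graphs as in the dgl.batch example above.
g1 = ([0, 1, 2], [1, 2, 3], 4)
g2 = ([0, 0, 0, 1], [0, 1, 2, 0], 3)
src, dst, n_nodes, n_edges = batch_edge_lists([g1, g2])
print(src)      # [0, 1, 2, 4, 4, 4, 5]
print(dst)      # [1, 2, 3, 4, 5, 6, 4]
print(n_nodes)  # [4, 3]
print(n_edges)  # [3, 4]
```
+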
+Note that most DGL transformation functions discard the batch information. To preserve it, call :func:`dgl.DGLGraph.set_batch_num_nodes` and :func:`dgl.DGLGraph.set_batch_num_edges` on the transformed graph.
+
+Graph Readout
+^^^^^^^^^^^^^
+
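+Every graph has a unique structure, as well as its own node and edge features. To make a single prediction, one usually aggregates and summarizes this possibly rich information. This type of operation is called *readout*. Common readout operations include summation, average, maximum, or minimum over all node or edge features.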
+
+Given a graph :math:`g`, the average node feature readout is defined as
+
+.. math:: h_g = \frac{1}{|\mathcal{V}|}\sum_{v\in \mathcal{V}}h_v
+
+where :math:`h_g` is the representation of :math:`g`, :math:`\mathcal{V}` is the set of nodes in :math:`g`, and :math:`h_v` is the feature of node :math:`v`.
+
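+The following is a plain-Python illustration of this formula, not DGL code. The hypothetical helper ``mean_readout`` averages a list of :math:`D`-dimensional node features element-wise to obtain :math:`h_g` for a single graph.
+
```python
def mean_readout(node_feats):
    """Compute h_g = (1/|V|) * sum_v h_v for one graph.

    node_feats: list of |V| feature vectors, each a list of D floats.
    """
    num_nodes = len(node_feats)
    if num_nodes == 0:
        raise ValueError("mean readout is undefined for a graph with no nodes")
    dim = len(node_feats[0])
    # Sum the features dimension by dimension, then divide by |V|.
    return [sum(h[d] for h in node_feats) / num_nodes for d in range(dim)]


h_v = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]  # three nodes, D = 2
print(mean_readout(h_v))  # [2.0, 2.0]
```
+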
+DGL provides built-in support for common readout operations. For example, :func:`dgl.mean_nodes` implements the above readout operation.
+
+Once :math:`h_g` is available, one can pass it through an MLP layer to obtain the classification output.
+
+Writing Neural Network Model
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The input to the model is the batched graph with node and edge features.
+
+Computation with a Batched Graph
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
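+First, the graphs in a batched graph are entirely separated, i.e., there are no edges between any two of them. Because of this property, every message passing function produces the same result it would produce on each graph separately (i.e., the graphs do not interfere with each other).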
+
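+A small plain-Python check of this property follows. The hypothetical helper ``sum_aggregate`` performs one round of message passing in which every node sums the scalar features of its in-neighbors. It only mimics, in spirit, ``update_all`` with a copy message and a sum reducer, and it is not DGL code. The two toy graphs match ``g1`` and ``g2`` from the readout example in this section.
+
```python
def sum_aggregate(src, dst, feats):
    """One message-passing step: each node sums the scalar features
    of its in-neighbors (nodes with no in-edges get 0.0)."""
    out = [0.0] * len(feats)
    for u, v in zip(src, dst):
        out[v] += feats[u]
    return out


# Two small graphs, processed separately.
g1_src, g1_dst, g1_h = [0, 1], [1, 0], [1.0, 2.0]
g2_src, g2_dst, g2_h = [0, 1], [1, 2], [1.0, 2.0, 3.0]
separate = sum_aggregate(g1_src, g1_dst, g1_h) + sum_aggregate(g2_src, g2_dst, g2_h)

# The same graphs batched: g2's node IDs are shifted by g1's node count (2).
offset = len(g1_h)
bg_src = g1_src + [u + offset for u in g2_src]
bg_dst = g1_dst + [v + offset for v in g2_dst]
batched = sum_aggregate(bg_src, bg_dst, g1_h + g2_h)

print(separate)             # [2.0, 1.0, 0.0, 1.0, 2.0]
print(batched == separate)  # True
```
+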
+Second, readout functions on a batched graph are applied to each graph separately. If the batch size is :math:`B` and the feature to be aggregated has dimension :math:`D`, the readout result has shape :math:`(B, D)`.
+
+.. code:: python
+
+    import dgl
+    import torch
+
+    g1 = dgl.graph(([0, 1], [1, 0]))
+    g1.ndata['h'] = torch.tensor([1., 2.])
+    g2 = dgl.graph(([0, 1], [1, 2]))
+    g2.ndata['h'] = torch.tensor([1., 2., 3.])
+
+    dgl.readout_nodes(g1, 'h')
+    # tensor([3.])  # 1 + 2
+
+    bg = dgl.batch([g1, g2])
+    dgl.readout_nodes(bg, 'h')
+    # tensor([3., 6.])  # [1 + 2, 1 + 2 + 3]
+
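+A per-graph readout on a batched graph amounts to a segmented reduction over the concatenated node features. The segment lengths are the per-graph node counts, which ``batch_num_nodes()`` returns. The plain-Python sketch below uses a hypothetical helper ``segment_readout`` and reproduces the sum readout above for scalar features. It illustrates the idea and is not how :func:`dgl.readout_nodes` is implemented.
+
```python
def segment_readout(feats, batch_num_nodes, op="sum"):
    """Reduce concatenated scalar node features graph by graph.

    feats: node features of the batched graph, concatenated in graph order.
    batch_num_nodes: number of nodes in each graph of the batch.
    """
    if sum(batch_num_nodes) != len(feats):
        raise ValueError("batch_num_nodes does not match the number of features")
    reducers = {
        "sum": sum,
        "mean": lambda xs: sum(xs) / len(xs),
        "max": max,
        "min": min,
    }
    reduce_fn = reducers[op]
    results, start = [], 0
    for n in batch_num_nodes:
        # Slice out this graph's nodes and reduce them to one value.
        results.append(reduce_fn(feats[start:start + n]))
        start += n
    return results


bg_h = [1.0, 2.0, 1.0, 2.0, 3.0]  # g1 has 2 nodes, g2 has 3 nodes
print(segment_readout(bg_h, [2, 3]))          # [3.0, 6.0]
print(segment_readout(bg_h, [2, 3], "mean"))  # [1.5, 2.0]
```
+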
+Finally, each node/edge feature of a batched graph is obtained by concatenating the corresponding features of all graphs in order.
+
+.. code:: python
+
+    bg.ndata['h']
+    # tensor([1., 2., 1., 2., 3.])
+
+Model Definition
+^^^^^^^^^^^^^^^^
+
+With the above computation rules in mind, one can define a model as follows.
+
+.. code:: python
+
+    import dgl
+    import dgl.nn.pytorch as dglnn
+    import torch.nn as nn
+    import torch.nn.functional as F
+
+    class Classifier(nn.Module):
+        def __init__(self, in_dim, hidden_dim, n_classes):
+            super(Classifier, self).__init__()
+            self.conv1 = dglnn.GraphConv(in_dim, hidden_dim)
+            self.conv2 = dglnn.GraphConv(hidden_dim, hidden_dim)
+            self.classify = nn.Linear(hidden_dim, n_classes)
+
+        def forward(self, g, h):
+            # Apply graph convolution and activation.
+            h = F.relu(self.conv1(g, h))
+            h = F.relu(self.conv2(g, h))
+            with g.local_scope():
+                g.ndata['h'] = h
+                # Calculate graph representation by average readout.
+                hg = dgl.mean_nodes(g, 'h')
+                return self.classify(hg)
+
+Training Loop
+~~~~~~~~~~~~~
+
+Data Loading
+^^^^^^^^^^^^
+
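+Once the model is defined, training can start. Graph classification deals with many relatively small graphs rather than one big graph. One can therefore train efficiently on stochastic mini-batches of graphs without complicated graph sampling algorithms.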
+
+Let's use the graph classification dataset introduced in :ref:`guide-data-pipeline`.
+
+.. code:: python
+
+    import dgl.data
+    dataset = dgl.data.GINDataset('MUTAG', False)
+
+Each item in a graph classification dataset is a pair of a graph and its label. To speed up data loading, one can use ``GraphDataLoader`` to iterate over the dataset of graphs in mini-batches.
+
+.. code:: python
+
+    from dgl.dataloading import GraphDataLoader
+    dataloader = GraphDataLoader(
+        dataset,
+        batch_size=1024,
+        drop_last=False,
+        shuffle=True)
+
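+The plain-Python sketch below shows what ``batch_size``, ``shuffle`` and ``drop_last`` control by splitting dataset indices into the mini-batches of one epoch. The helper ``minibatch_indices`` and its ``seed`` argument are hypothetical and only approximate the behavior one can expect; this is not ``GraphDataLoader``'s implementation. In the actual loader, the graphs selected for a mini-batch are combined into one batched graph together with their labels.
+
```python
import random


def minibatch_indices(num_items, batch_size, shuffle=True, drop_last=False, seed=None):
    """Split dataset indices into the mini-batches of one epoch."""
    indices = list(range(num_items))
    if shuffle:
        # Visit the graphs in a random order.
        random.Random(seed).shuffle(indices)
    batches = [indices[i:i + batch_size] for i in range(0, num_items, batch_size)]
    if drop_last and batches and len(batches[-1]) < batch_size:
        # Discard the final, smaller batch.
        batches.pop()
    return batches


batches = minibatch_indices(5, 2, shuffle=True, seed=0)
print([len(b) for b in batches])                             # [2, 2, 1]
print(len(minibatch_indices(5, 2, drop_last=True, seed=0)))  # 2
```
+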
+The training loop then simply iterates over the dataloader and updates the model.
+
+.. code:: python
+
+    import torch
+    import torch.nn.functional as F
+
+    # Only an example, 7 is the input feature size
+    model = Classifier(7, 20, 5)
+    opt = torch.optim.Adam(model.parameters())
+    for epoch in range(20):
+        for batched_graph, labels in dataloader:
+            feats = batched_graph.ndata['attr']
+            logits = model(batched_graph, feats)
+            loss = F.cross_entropy(logits, labels)
+            opt.zero_grad()
+            loss.backward()
+            opt.step()
+
+For an end-to-end example of graph classification, see `DGL's GIN example <https://github.com/dmlc/dgl/tree/master/examples/pytorch/gin>`__. The training loop is inside the function ``train`` in `main.py <https://github.com/dmlc/dgl/blob/master/examples/pytorch/gin/main.py>`__. The model implementation is in `gin.py <https://github.com/dmlc/dgl/blob/master/examples/pytorch/gin/gin.py>`__. It uses components such as :class:`dgl.nn.pytorch.GINConv` (also available for MXNet and Tensorflow), graph convolution layers, and batch normalization.
+
+Heterogeneous Graph
+~~~~~~~~~~~~~~~~~~~
+
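+Graph classification on heterogeneous graphs differs slightly from graph classification on homogeneous graphs. It needs graph convolution modules that are compatible with heterogeneous graphs, and the readout function must also aggregate over nodes of different types.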
+
+The following example sums up the per-node-type averages of the node representations.
+
+.. code:: python
+
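+    import dgl
+    import dgl.nn.pytorch as dglnn
+    import torch.nn as nn
+    import torch.nn.functional as F
+
+    class RGCN(nn.Module):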
+        def __init__(self, in_feats, hid_feats, out_feats, rel_names):
+            super().__init__()
+
+            self.conv1 = dglnn.HeteroGraphConv({
+                rel: dglnn.GraphConv(in_feats, hid_feats)
+                for rel in rel_names}, aggregate='sum')
+            self.conv2 = dglnn.HeteroGraphConv({
+                rel: dglnn.GraphConv(hid_feats, out_feats)
+                for rel in rel_names}, aggregate='sum')
+
+        def forward(self, graph, inputs):
+            # inputs: dict mapping each node type to its input node features.
+            h = self.conv1(graph, inputs)
+            h = {k: F.relu(v) for k, v in h.items()}
+            h = self.conv2(graph, h)
+            return h
+
+    class HeteroClassifier(nn.Module):
+        def __init__(self, in_dim, hidden_dim, n_classes, rel_names):
+            super().__init__()
+
+            self.rgcn = RGCN(in_dim, hidden_dim, hidden_dim, rel_names)
+            self.classify = nn.Linear(hidden_dim, n_classes)
+
+        def forward(self, g):
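+            # With several node types, g.ndata['feat'] is a dict keyed by node type.
+            h = g.ndata['feat']
+            # HeteroGraphConv only returns features for node types that are the
+            # destination of some relation; the readout below assumes every node
+            # type in g receives messages.
+            h = self.rgcn(g, h)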
+            with g.local_scope():
+                g.ndata['h'] = h
+                # Calculate graph representation by average readout.
+                hg = 0
+                for ntype in g.ntypes:
+                    hg = hg + dgl.mean_nodes(g, 'h', ntype=ntype)
+                return self.classify(hg)
+
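+The readout above adds up one mean per node type, so all node types must end up with representations of the same size. For a single graph with scalar features, the idea can be sketched in plain Python as follows. The helper ``hetero_mean_sum_readout`` and the node types ``'user'`` and ``'item'`` are hypothetical.
+
```python
def hetero_mean_sum_readout(feats_by_ntype):
    """Sum, over node types, of the mean node feature of each type.

    feats_by_ntype: dict mapping a node type name to a list of scalar features.
    """
    hg = 0.0
    for ntype, feats in feats_by_ntype.items():
        if not feats:
            raise ValueError(f"node type {ntype!r} has no nodes to average")
        # Average within the node type, then accumulate across types.
        hg += sum(feats) / len(feats)
    return hg


# Hypothetical node types: 'user' has 2 nodes, 'item' has 3 nodes.
h = {"user": [1.0, 3.0], "item": [2.0, 4.0, 6.0]}
print(hetero_mean_sum_readout(h))  # 6.0
```
+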
+The rest of the code is the same as for homogeneous graphs.
+
+.. code:: python
+
+    import torch
+    import torch.nn.functional as F
+
+    # etypes is the list of edge types as strings.
+    model = HeteroClassifier(10, 20, 5, etypes)
+    opt = torch.optim.Adam(model.parameters())
+    for epoch in range(20):
+        for batched_graph, labels in dataloader:
+            logits = model(batched_graph)
+            loss = F.cross_entropy(logits, labels)
+            opt.zero_grad()
+            loss.backward()
+            opt.step()