<!--- Copyright (c) 2025, NVIDIA CORPORATION.
SPDX-License-Identifier: BSD-3-Clause -->

# Data Flow

![energon data flow](../images/data_flow.png)

How the data flows through the task encoder methods shown above is explained step by step in [](task_encoder).

(flavors_general)=
## Dataset Flavors

The datasets are organized into "flavors": each modality returned by a dataset is a "flavor".
A modality can, for example, be a {py:class}`CaptioningSample <megatron.energon.CaptioningSample>` or a
{py:class}`VQASample <megatron.energon.VQASample>`. The dataset class combines the source data format
with the sample format that is produced during iteration. For example, the {py:class}`CaptioningWebdataset <megatron.energon.CaptioningWebdataset>`
combines the webdataset loader with the {py:class}`CaptioningSample <megatron.energon.CaptioningSample>`.
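
The idea of pairing a source format with a sample format can be sketched in plain Python. This is only an illustration of the concept and does not use the energon API: `ToyCaptioningSample`, `toy_raw_loader`, `ToyFlavoredDataset` and `to_captioning` are hypothetical names, and the `jpg`/`txt` record keys are an assumption about what a raw record might look like.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")


@dataclass
class ToyCaptioningSample:
    """Hypothetical stand-in for a captioning sample (not the energon class)."""

    image: bytes
    caption: str


def toy_raw_loader() -> Iterator[Dict[str, Any]]:
    """Hypothetical source loader: yields raw records keyed by file extension."""
    yield {"jpg": b"\xff\xd8", "txt": "a cat on a sofa"}
    yield {"jpg": b"\xff\xd8", "txt": "a dog in a park"}


class ToyFlavoredDataset(Generic[T]):
    """Pairs a source of raw records with a function that builds one sample type."""

    def __init__(
        self,
        raw: Iterable[Dict[str, Any]],
        to_sample: Callable[[Dict[str, Any]], T],
    ) -> None:
        self._raw = raw
        self._to_sample = to_sample

    def __iter__(self) -> Iterator[T]:
        # Every raw record is converted into the sample type of this flavor.
        for record in self._raw:
            yield self._to_sample(record)


def to_captioning(record: Dict[str, Any]) -> ToyCaptioningSample:
    """Map one raw record to the captioning flavor."""
    return ToyCaptioningSample(image=record["jpg"], caption=record["txt"])


toy_dataset = ToyFlavoredDataset(toy_raw_loader(), to_captioning)
samples = list(toy_dataset)
```

In this sketch, swapping `to_captioning` for a different conversion function would yield a different flavor from the same source loader.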

For all sample types, see [](sect-sample-types).