This folder provides several examples accelerated by Colossal-AI. The `tutorial` folder is for everyone to quickly try out the different features in Colossal-AI. Other folders such as `images` and `language` include a wide range of deep learning tasks and applications. The `community` folder aims to create a collaborative platform for developers to contribute exotic features built on top of Colossal-AI.

You can find applications such as Chatbot, AIGC and Biomedicine in the [Applications](https://github.com/hpcaitech/ColossalAI/tree/main/applications) directory.
## Folder Structure
...
Therefore, it is essential for the example contributors to know how to integrate their examples into the testing workflow. Simply follow the steps below:
1. Create a script called `test_ci.sh` in your example folder.
2. Configure your testing parameters, such as the number of steps and the batch size, in `test_ci.sh`. Keep these parameters small so that each example takes only a few minutes.
3. Export your dataset path with the prefix `/data` and make sure you have a copy of the dataset in the `/data/scratch/examples-data` directory on the CI machine. Community contributors can contact us via Slack to request that a dataset be downloaded onto the CI machine.
4. Implement the logic, such as dependency setup and example execution.
## Community Dependency
We are happy to introduce the following community repositories that are powered by Colossal-AI:

| Example | Description | Code Example | Colab | Author |
| ------- | ----------- | ------------ | ----- | ------ |

Community-driven Examples is an initiative that allows users to share their own examples with the Colossal-AI community, fostering a sense of community and making it easy for others to access and benefit from shared work. The primary goal of community-driven examples is to have a community-maintained collection of diverse and exotic functionalities built on top of the Colossal-AI package.

If a community example doesn't work as expected, you can [open an issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose) and @ the author to report it.

You are also welcome to [open an issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose) to share your insights and needs.
## How to get involved
To join our community-driven initiative, please visit the [Colossal-AI examples](https://github.com/hpcaitech/ColossalAI/tree/main/examples), review the provided information, and explore the codebase.
To contribute, create a new issue outlining your proposed feature or enhancement, and our team will review it and provide feedback. If you are confident enough, you can also submit a PR directly. We look forward to collaborating with you on this exciting project!
# Basic MNIST Example with optional FP8 via TransformerEngine
[TransformerEngine](https://github.com/NVIDIA/TransformerEngine) is a library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
Thanks to NVIDIA for contributing this tutorial.
```bash
python main.py
python main.py --use-te   # Linear layers from TransformerEngine
python main.py --use-fp8  # FP8 + TransformerEngine for Linear layers
```
> We are working to integrate it with Colossal-AI and will finish it soon.
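For a rough picture of what these flags do, the sketch below shows the usual TransformerEngine pattern: `te.Linear` as a drop-in replacement for `torch.nn.Linear`, and `te.fp8_autocast` to enable FP8. This is an illustrative sketch, not the example's actual `main.py`, and FP8 execution requires a Hopper (or newer) GPU.

```python
# Illustrative sketch of the TransformerEngine pattern (not the actual main.py).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# --use-te: te.Linear is a drop-in replacement for torch.nn.Linear.
layer = te.Linear(784, 256, bias=True).cuda()

# --use-fp8: wrap the forward pass in fp8_autocast with an FP8 scaling recipe.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)
x = torch.randn(32, 784, device="cuda")
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(x)
out.sum().backward()  # the backward pass also runs through TE kernels
```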
This folder is used to preprocess the Chinese corpus with Whole Word Masking. You can ...
<span id='Split Sentence'/>
### 2.1. Split Sentences & Split Data into Multiple Shards
Firstly, each file has multiple documents, and each document contains multiple sentences. Sentences are split on punctuation marks such as `。!`. **Secondly, the data is split into multiple shards based on the server hardware (CPU, CPU memory, hard disk) and the corpus size.** Each shard contains a part of the corpus, and the model needs to train on all the shards in one epoch.
In this example, a 200GB corpus is split into 100 shards, each about 2GB. The shard size is memory-dependent: it takes into account the number of servers, the memory used by the tokenizer, and the memory used by multi-process training to read the shards (n-way data parallelism requires n \* shard_size memory). **To sum up, data preprocessing and model pretraining require fighting with hardware, not just the GPU.**
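A minimal sketch of this stage is shown below. The file names and helpers are hypothetical and the real scripts in this folder are more involved, but the core idea is the same: cut documents into sentences at end-of-sentence punctuation, then distribute documents evenly across a fixed number of shard files.

```python
import re
from pathlib import Path

SENT_END = re.compile(r'([。!?!?])')  # full- and half-width end-of-sentence punctuation

def split_sentences(document: str) -> list:
    """Cut a document into sentences, keeping the trailing punctuation."""
    parts = SENT_END.split(document)
    return [text + punct for text, punct in zip(parts[0::2], parts[1::2]) if text]

def write_shards(documents, shard_num: int, out_dir: str) -> None:
    """Distribute documents round-robin into shard_num files.
    For a 200GB corpus and shard_num=100, each shard is roughly 2GB;
    n-way data parallel training reads n shards at once, so the hosts
    need about n * shard_size of memory."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    shards = [open(out / f'shard_{i}.txt', 'w', encoding='utf-8') for i in range(shard_num)]
    try:
        for idx, doc in enumerate(documents):
            shard = shards[idx % shard_num]
            shard.write('\n'.join(split_sentences(doc)) + '\n\n')  # blank line between documents
    finally:
        for shard in shards:
            shard.close()
```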
Input a sentence, return a processed sentence: in order to support Chinese whole word masking, characters that are split off from a word are marked with a special mark ("#"), so that the subsequent processing modules know which characters belong to the same word.
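For illustration, assuming a word segmenter such as `jieba` (an assumption; the example's actual tokenizer may differ), the marking step could look like this:

```python
import jieba  # assumed segmenter; the example's actual tokenizer may differ

def mark_whole_words(sentence: str) -> str:
    """Prefix every non-initial character of a segmented word with '#'
    so that a later masking step can treat the word as one unit."""
    marked = []
    for word in jieba.cut(sentence):
        marked.append(word[0] + ''.join('#' + ch for ch in word[1:]))
    return ''.join(marked)

# e.g. mark_whole_words('我们使用整词掩码') might yield '我#们使#用整#词掩#码'
# (exact output depends on the segmentation)
```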
Get the initial training instances: split the whole segment into multiple parts according to `max_sequence_length`, and return them as multiple processed instances.
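In simplified form (the names are illustrative, and reserving two slots for BERT-style `[CLS]`/`[SEP]` tokens is an assumption), the instance construction reduces to chunking the token sequence:

```python
def build_instances(tokens, max_sequence_length):
    """Split one segment's tokens into instances of at most
    max_sequence_length, reserving two slots for [CLS] and [SEP]
    (a BERT-style assumption)."""
    budget = max_sequence_length - 2
    return [['[CLS]'] + tokens[i:i + budget] + ['[SEP]']
            for i in range(0, len(tokens), budget)]
```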
parser.add_argument('--max_predictions_per_seq', type=int, default=80, help='maximum number of masked tokens per sequence')
parser.add_argument('--input_path', type=str, required=True, help='input path of the shards with split sentences')
parser.add_argument('--output_path', type=str, required=True, help='output path of the h5 files containing token ids')
parser.add_argument('--backend', type=str, default='python', help='backend for masking tokens: python, c++, or numpy')
parser.add_argument('--dupe_factor', type=int, default=1, help='how many times the preprocessor repeats to create the input from the same article/document')
parser.add_argument('--worker', type=int, default=32, help='number of worker processes')
parser.add_argument('--server_num', type=int, default=10, help='number of servers')
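The `--worker` and `--server_num` flags suggest a static partition of shards across servers and local processes. The sketch below is a hypothetical illustration of that division, not the actual script logic:

```python
from multiprocessing import Pool
from pathlib import Path

def process_shard(shard_path: str) -> None:
    """Hypothetical per-shard job: tokenize, apply whole word masking,
    and write an h5 file of token ids (details omitted)."""
    print(f'processing {shard_path}')

def run(input_path: str, server_id: int, server_num: int, worker: int) -> None:
    # Each server takes every server_num-th shard, then fans its share
    # out over `worker` local processes.
    shards = sorted(str(p) for p in Path(input_path).glob('shard_*'))
    with Pool(processes=worker) as pool:
        pool.map(process_shard, shards[server_id::server_num])
```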