This folder provides several examples accelerated by Colossal-AI.

Folders such as `images` and `language` include a wide range of deep learning tasks and applications. The `tutorial` folder is for everyone to quickly try out the different features in Colossal-AI. The `community` folder aims to create a collaborative platform for developers to contribute exotic features built on top of Colossal-AI.

You can find applications such as Chatbot, AIGC and Biomedicine in the [Applications](https://github.com/hpcaitech/ColossalAI/tree/main/applications) directory.
## Folder Structure
Therefore, it is essential for the example contributors to know how to integrate an example into the CI:
2. Configure your testing parameters, such as the number of steps and the batch size, in `test_ci.sh`. Keep these parameters small so that each example takes only a few minutes.
3. Export your dataset path with the prefix `/data` and make sure you have a copy of the dataset in the `/data/scratch/examples-data` directory on the CI machine. Community contributors can contact us via Slack to request that the dataset be downloaded onto the CI machine.
4. Implement the logic, such as dependency setup and example execution.
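The steps above can be sketched as a minimal `test_ci.sh`. Everything below is an illustrative assumption, not the real script: the parameter values, the dataset name, and the commented-out `train.py` invocation are all hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sketch of a test_ci.sh -- every name below is illustrative.
# Keep the parameters small so the example finishes within a few minutes on CI.
NUM_STEPS=2
BATCH_SIZE=4

# The dataset path must carry the /data prefix used on the CI machine.
export DATA_PATH=/data/scratch/examples-data/my-dataset   # hypothetical dataset

# Dependency setup would normally go here, e.g.:
#   pip install -r requirements.txt

# Example execution with the reduced parameters (train.py is hypothetical):
#   torchrun --nproc_per_node 4 train.py --steps "$NUM_STEPS" --batch_size "$BATCH_SIZE"
echo "steps=$NUM_STEPS batch_size=$BATCH_SIZE data=$DATA_PATH"
```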
## Community Dependency
We are happy to introduce the following community repositories that are powered by Colossal-AI:
Community-driven Examples is an initiative that allows users to share their own examples with the Colossal-AI community, fostering a sense of community and making it easy for others to access and benefit from shared work. The primary goal of community-driven examples is to have a community-maintained collection of diverse and exotic functionalities built on top of the Colossal-AI package.
If a community example doesn't work as expected, you can [open an issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose) and @ the author to report it.
| Example | Description | Code Example | Colab | Author |
| ------- | ----------- | ------------ | ----- | ------ |
Feel free to [open an issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose) to share your insights and needs.
## How to get involved
To join our community-driven initiative, please visit the [Colossal-AI examples](https://github.com/hpcaitech/ColossalAI/tree/main/examples), review the provided information, and explore the codebase.
To contribute, create a new issue outlining your proposed feature or enhancement, and our team will review it and provide feedback. If you are confident, you can also submit a PR directly. We look forward to collaborating with you on this exciting project!
Takes a sentence and returns the processed sentence: to support Chinese whole-word masking, the characters of a segmented word are marked with a special symbol ("#") so that subsequent processing modules know which characters belong to the same word.
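A minimal sketch of this marking step, assuming the input has already been word-segmented; `mark_whole_words` is a hypothetical helper name, not the function in the actual script.

```python
def mark_whole_words(segmented_words):
    """Flatten segmented words into characters, prefixing every
    non-initial character of a multi-character word with '#'
    so later modules can tell which characters form one word."""
    marked = []
    for word in segmented_words:
        for i, ch in enumerate(word):
            marked.append(ch if i == 0 else "#" + ch)
    return marked

# Segmented input: two two-character words and one single-character word.
print(mark_whole_words(["模型", "训练", "快"]))
# ['模', '#型', '训', '#练', '快']
```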
Gets the initial training instances: splits the whole segment into multiple parts according to `max_sequence_length` and returns them as multiple processed instances.
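The splitting logic can be sketched as follows; `split_instances` is a hypothetical helper, and real preprocessing would additionally reserve room for special tokens such as `[CLS]` and `[SEP]`.

```python
def split_instances(tokens, max_sequence_length):
    """Split one long token segment into multiple training instances,
    each at most max_sequence_length tokens long."""
    return [tokens[i:i + max_sequence_length]
            for i in range(0, len(tokens), max_sequence_length)]

# A 10-token segment with max_sequence_length=4 yields three instances.
print(split_instances(list(range(10)), 4))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```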
parser.add_argument('--max_predictions_per_seq',type=int,default=80,help='number of shards, e.g., 10, 50, or 100')
parser.add_argument('--max_predictions_per_seq',
type=int,
default=80,
help='number of shards, e.g., 10, 50, or 100')
parser.add_argument('--input_path',
                    type=str,
                    required=True,
                    help='input path of the shards containing split sentences')
parser.add_argument('--output_path',
                    type=str,
                    required=True,
                    help='output path of the h5 file containing token ids')
parser.add_argument('--backend',
                    type=str,
                    default='python',
                    help="backend for the token-masking step: 'python', 'c++', or 'numpy'")
parser.add_argument('--dupe_factor',
                    type=int,
                    default=1,
                    help='specifies how many times the preprocessor repeats to create the input from the same article/document')
parser.add_argument('--worker',
                    type=int,
                    default=32,
                    help='number of worker processes')
parser.add_argument('--server_num',
                    type=int,
                    default=10,
                    help='number of servers')
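Putting the arguments together, a self-contained sketch of the parser and a sample invocation (the sample paths are illustrative, and only the required arguments need to be supplied on the command line):

```python
import argparse

# Reconstruction of the preprocessing script's argument parser.
parser = argparse.ArgumentParser()
parser.add_argument('--max_predictions_per_seq', type=int, default=80)
parser.add_argument('--input_path', type=str, required=True)
parser.add_argument('--output_path', type=str, required=True)
parser.add_argument('--backend', type=str, default='python')
parser.add_argument('--dupe_factor', type=int, default=1)
parser.add_argument('--worker', type=int, default=32)
parser.add_argument('--server_num', type=int, default=10)

# Parse a sample command line; 'shards/' and 'out.h5' are hypothetical paths.
args = parser.parse_args(['--input_path', 'shards/', '--output_path', 'out.h5'])
print(args.backend, args.dupe_factor, args.worker)
# python 1 32
```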