Unverified Commit b9a8dff7 authored by digger-yu, committed by GitHub

[doc] Fix typo under colossalai and doc (#3618)

* Fixed several spelling errors under colossalai

* Fix the spelling errors in the colossalai and docs directories

* Cautiously changed the spelling errors under the example folder

* Update runtime_preparation_pass.py

revert autograft to autograd

* Update search_chunk.py

utile to until

* Update check_installation.py

change misteach to mismatch in line 91

* Update 1D_tensor_parallel.md

revert to perceptron

* Update 2D_tensor_parallel.md

revert to perceptron in line 73

* Update 2p5D_tensor_parallel.md

revert to perceptron in line 71

* Update 3D_tensor_parallel.md

revert to perceptron in line 80

* Update README.md

revert to resnet in line 42

* Update reorder_graph.py

revert to indice in line 7

* Update p2p.py

revert to megatron in line 94

* Update initialize.py

revert to torchrun in line 198

* Update routers.py

change to detailed in line 63

* Update routers.py

change to detailed in line 146

* Update README.md

revert to random number in line 402
parent e1b0a78a
@@ -28,7 +28,7 @@ gradient_accumulation = <int>
 ## Hands-on Practice
 We provide a [runnable example](https://github.com/hpcaitech/ColossalAI-Examples/tree/main/features/gradient_accumulation)
-to demonstrate gradient accumulation. In this example, we set the gradinet accumulation size to be 4. You can run the script using this command:
+to demonstrate gradient accumulation. In this example, we set the gradient accumulation size to be 4. You can run the script using this command:
 ```shell
 python -m torch.distributed.launch --nproc_per_node 1 --master_addr localhost --master_port 29500 run_resnet_cifar10_with_engine.py
...
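For readers following the hunk above, the `gradient_accumulation = <int>` attribute shown in the hunk context is set in the ColossalAI config file. A minimal sketch of such a config is below; the batch size and epoch count are illustrative assumptions, and only the accumulation size of 4 comes from the example referenced in the diff.

```python
# config.py -- illustrative sketch; BATCH_SIZE and NUM_EPOCHS are assumptions
BATCH_SIZE = 128
NUM_EPOCHS = 200

# accumulate gradients from 4 micro-batches before each optimizer step,
# matching the accumulation size used in the runnable example above
gradient_accumulation = 4
```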
@@ -101,7 +101,7 @@ you can use `colossalai.amp.convert_to_amp`.
 ```python
 from colossalai.amp import AMP_TYPE
-# exmaple of using torch amp
+# example of using torch amp
 model, optimizer, criterion = colossalai.amp.convert_to_amp(model,
 optimizer,
 criterion,
@@ -220,7 +220,7 @@ The default parameters of Naive AMP:
 - initial_scale(int): initial scale of gradient scaler
 - growth_factor(int): the growth rate of loss scale
 - backoff_factor(float): the decrease rate of loss scale
-- hysterisis(int): delay shift in dynamic loss scaling
+- hysteresis(int): delay shift in dynamic loss scaling
 - max_scale(int): maximum loss scale allowed
 - verbose(bool): if set to `True`, will print debug info
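To complement the parameter list above, here is a sketch of how these Naive AMP defaults could be overridden in a ColossalAI config; the `fp16 = dict(...)` style is assumed from the `AMP_TYPE` import shown earlier in this diff, and the numeric values are illustrative rather than recommended settings.

```python
# config.py -- illustrative sketch; values are assumptions, not tuned recommendations
from colossalai.amp import AMP_TYPE

fp16 = dict(
    mode=AMP_TYPE.NAIVE,
    initial_scale=2**15,   # initial scale of the gradient scaler
    growth_factor=2,       # growth rate of the loss scale
    backoff_factor=0.5,    # decrease rate of the loss scale
    hysteresis=2,          # delay shift in dynamic loss scaling
    max_scale=2**32,       # maximum loss scale allowed
    verbose=False,         # print debug info when True
)
```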
@@ -292,7 +292,7 @@ colossalai.launch_from_torch(config=args.config)
 ### Step 4. Create training components
 Build your model, optimizer, loss function, lr scheduler and dataloaders. Note that the root path of the dataset is
-obtained from the environment varialbe `DATA`. You may `export DATA=/path/to/data` or change `Path(os.environ['DATA'])`
+obtained from the environment variable `DATA`. You may `export DATA=/path/to/data` or change `Path(os.environ['DATA'])`
 to a path on your machine. Data will be automatically downloaded to the root path.
 ```python
@@ -326,7 +326,7 @@ to a path on your machine. Data will be automatically downloaded to the root pat
 # build loss
 criterion = torch.nn.CrossEntropyLoss()
-# lr_scheduelr
+# lr_scheduler
 lr_scheduler = LinearWarmupLR(optimizer, warmup_steps=50, total_steps=gpc.config.NUM_EPOCHS)
 ```
...
@@ -57,7 +57,7 @@ It's compatible with all parallel methods in ColossalAI.
 Let's start from two simple examples -- training GPT with different methods. These examples relies on `transformers`.
-We should install denpendencies first:
+We should install dependencies first:
 ```shell
 pip install psutil transformers
@@ -99,7 +99,7 @@ class GPTLMLoss(nn.Module):
 shift_labels.view(-1))
 ```
-And we define some utility functions, which generates random data, computes the number of paramters of a model and get memory usage of current process:
+And we define some utility functions, which generates random data, computes the number of parameters of a model and get memory usage of current process:
 ```python
 def get_data(batch_size: int, seq_len: int,
@@ -251,7 +251,7 @@ Time: 3.691 s
 Mem usage: 5298.344 MB
 ```
-NVME offload saves about 294 MB memory. Note that enabling `pin_memory` of Gemini can accelerate training but increase memory usage. So this result also meets our expectation. If we disable `pin_memory`, we can aslo observe a memory usage drop about 900 MB.
+NVME offload saves about 294 MB memory. Note that enabling `pin_memory` of Gemini can accelerate training but increase memory usage. So this result also meets our expectation. If we disable `pin_memory`, we can also observe a memory usage drop about 900 MB.
 ## API Reference
...
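For context on the memory figures in the hunk above, here is a hedged sketch of turning on NVMe offload for the optimizer states; the `HybridAdam` arguments `nvme_offload_fraction` and `nvme_offload_dir` are assumptions about the API and should be verified against the installed ColossalAI version.

```python
# Illustrative sketch; argument names are assumptions, check your ColossalAI version.
import torch
from colossalai.nn.optimizer import HybridAdam

model = torch.nn.Linear(1024, 1024)  # placeholder model standing in for the GPT model
optimizer = HybridAdam(
    model.parameters(),
    lr=1e-3,
    nvme_offload_fraction=1.0,       # fraction of optimizer states placed on NVMe
    nvme_offload_dir="./nvme_data",  # directory on the NVMe device
)
```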
@@ -32,11 +32,11 @@ and the first and second momentum estimates) are partitioned across the processe
 3. **Shard Parameter**: The 16-bit model parameters are partitioned across the processes of a data parallel group.
-4. **[Gemini](../advanced_tutorials/meet_gemini.md)**: Dynamic heterogeneous memory space manager for paramters, gradients and optimizer states.
+4. **[Gemini](../advanced_tutorials/meet_gemini.md)**: Dynamic heterogeneous memory space manager for parameters, gradients and optimizer states.
 Besides, this article will introduce the Zero Redundancy Optimizer with chunk-based memory management.
-When using ZeRO, we distributed the model by sharding the parameters. The advantage of this method is that the memory of each node is load balanced. But this approach has two significiant disadvantages. First, during communication, a temporary memory buffer needs to be allocated and released afterwards, leading to the memory fragmentation problem. Secondly, using tensor as the granularity for communication will cause the network bandwidth underutilized. Generally, the longer the transmitted message length, the higher the bandwidth utilization.
+When using ZeRO, we distributed the model by sharding the parameters. The advantage of this method is that the memory of each node is load balanced. But this approach has two significant disadvantages. First, during communication, a temporary memory buffer needs to be allocated and released afterwards, leading to the memory fragmentation problem. Secondly, using tensor as the granularity for communication will cause the network bandwidth underutilized. Generally, the longer the transmitted message length, the higher the bandwidth utilization.
 Using the Chunk mechanism introduced in ColossalAI v0.1.8, we can improve the efficiency of ZeRO. We store a continuous set of parameters in initialization order into a Chunk (a chunk is a continuous memory space), and each Chunk has the same size. Organizing memory in chunks can lead to efficient use of network bandwidth between PCI-e and GPU-GPU, reduce the number of communications, and avoid potential memory fragmentation.
...
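To make the Chunk idea above concrete, here is a toy sketch in plain PyTorch (not the library's actual ChunkManager) that packs parameters into equal-sized flat buffers in initialization order; the chunk size and model below are placeholders chosen for illustration.

```python
# Toy illustration of chunk-based grouping; not ColossalAI's implementation.
import torch

def pack_into_chunks(params, chunk_numel):
    """Flatten tensors into fixed-size buffers in the order they are given."""
    chunks, current, used = [], torch.zeros(chunk_numel), 0
    for p in params:
        n = p.numel()
        assert n <= chunk_numel, "toy sketch: a tensor larger than one chunk is not handled"
        if used + n > chunk_numel:          # current chunk is full, start a new one
            chunks.append(current)
            current, used = torch.zeros(chunk_numel), 0
        current[used:used + n] = p.detach().reshape(-1)
        used += n
    chunks.append(current)
    return chunks  # each chunk can be moved or communicated as one large message

# Example: parameters of a small model packed into 1 MiB (262144 float32) chunks.
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.Linear(512, 512))
chunks = pack_into_chunks(model.parameters(), chunk_numel=262_144)
print(f"{len(chunks)} chunks of {chunks[0].numel()} elements each")
```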
@@ -13,7 +13,7 @@ from datasets import load_dataset
 def make_multi_folder_data(paths, caption_files=None, **kwargs):
 """Make a concat dataset from multiple folders
-Don't suport captions yet
+Don't support captions yet
 If paths is a list, that's ok, if it's a Dict interpret it as:
 k=folder v=n_times to repeat that
 """
...
@@ -40,7 +40,7 @@ class DataLoaderX(DataLoader):
 # A custom data loader class that inherits from DataLoader
 def __iter__(self):
 # Overriding the __iter__ method of DataLoader to return a BackgroundGenerator
-#This is to enable data laoding in the background to improve training performance
+#This is to enable data loading in the background to improve training performance
 return BackgroundGenerator(super().__iter__())
@@ -60,7 +60,7 @@ def get_parser(**parser_kwargs):
 # Create an ArgumentParser object with specifies kwargs
 parser = argparse.ArgumentParser(**parser_kwargs)
-# Add vairous command line arguments with their default balues and descriptions
+# Add various command line arguments with their default values and descriptions
 parser.add_argument(
 "-n",
 "--name",
@@ -162,7 +162,7 @@ def get_parser(**parser_kwargs):
 # A function that returns the non-default arguments between two objects
 def nondefault_trainer_args(opt):
-# create an argument parsser
+# create an argument parser
 parser = argparse.ArgumentParser()
 # add pytorch lightning trainer default arguments
 parser = Trainer.add_argparse_args(parser)
@@ -203,7 +203,7 @@ def worker_init_fn(_):
 else:
 return np.random.seed(np.random.get_state()[1][0] + worker_id)
-#Provide functionality for creating data loadedrs based on provided dataset configurations
+#Provide functionality for creating data loaders based on provided dataset configurations
 class DataModuleFromConfig(pl.LightningDataModule):
 def __init__(self,
@@ -255,7 +255,7 @@ class DataModuleFromConfig(pl.LightningDataModule):
 def _train_dataloader(self):
 #Check if the train dataset is iterable
 is_iterable_dataset = isinstance(self.datasets['train'], Txt2ImgIterableBaseDataset)
-#Set the worker initialization function of the dataset isiterable or use_worker_init_fn is True
+#Set the worker initialization function of the dataset is iterable or use_worker_init_fn is True
 if is_iterable_dataset or self.use_worker_init_fn:
 init_fn = worker_init_fn
 else:
@@ -310,7 +310,7 @@ class DataModuleFromConfig(pl.LightningDataModule):
 class SetupCallback(Callback):
-# I nitialize the callback with the necessary parameters
+# Initialize the callback with the necessary parameters
 def __init__(self, resume, now, logdir, ckptdir, cfgdir, config, lightning_config):
 super().__init__()
@@ -371,7 +371,7 @@ class SetupCallback(Callback):
 # trainer.save_checkpoint(ckpt_path)
-# PyTorch Lightning callback for ogging images during training and validation of a deep learning model
+# PyTorch Lightning callback for logging images during training and validation of a deep learning model
 class ImageLogger(Callback):
 def __init__(self,
@@ -379,10 +379,10 @@ class ImageLogger(Callback):
 max_images, # Maximum number of images to log
 clamp=True, # Whether to clamp pixel values to [-1,1]
 increase_log_steps=True, # Whether to increase frequency of log steps exponentially
-rescale=True, # Whetehr to rescale pixel values to [0,1]
+rescale=True, # Whether to rescale pixel values to [0,1]
 disabled=False, # Whether to disable logging
-log_on_batch_idx=False, # Whether to log on baych index instead of global step
-log_first_step=False, # Whetehr to log on the first step
+log_on_batch_idx=False, # Whether to log on batch index instead of global step
+log_first_step=False, # Whether to log on the first step
 log_images_kwargs=None): # Additional keyword arguments to pass to log_images method
 super().__init__()
 self.rescale = rescale
@@ -593,7 +593,7 @@ if __name__ == "__main__":
 parser = Trainer.add_argparse_args(parser)
 opt, unknown = parser.parse_known_args()
-# Veirfy the arguments are both specified
+# Verify the arguments are both specified
 if opt.name and opt.resume:
 raise ValueError("-n/--name and -r/--resume cannot be specified both."
 "If you want to resume training in a new log folder, "
@@ -646,7 +646,7 @@ if __name__ == "__main__":
 # Sets the seed for the random number generator to ensure reproducibility
 seed_everything(opt.seed)
-# Intinalize and save configuratioon using teh OmegaConf library.
+# Initialize and save configuration using teh OmegaConf library.
 try:
 # init and save configs
 configs = [OmegaConf.load(cfg) for cfg in opt.base]
...
@@ -61,7 +61,7 @@ torchrun --nproc_per_node 2 train_dreambooth_colossalai.py \
 - `INSTANCE_DIR` refers to personalized path to instance images, you might need to insert information here.
 - `OUTPUT_DIR` refers to local path to save the trained model, you might need to find a path with enough space.
 - `resolution` refers to the corresponding resolution number of your target model. Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.
-- `placement` refers to the training strategy supported by Colossal AI, defult = 'cuda', which refers to loading all the parameters into cuda memory. On the other hand, 'cpu' refers to 'cpu offload' strategy while 'auto' enables 'Gemini', both featured by Colossal AI.
+- `placement` refers to the training strategy supported by Colossal AI, default = 'cuda', which refers to loading all the parameters into cuda memory. On the other hand, 'cpu' refers to 'cpu offload' strategy while 'auto' enables 'Gemini', both featured by Colossal AI.
 ### Training with prior-preservation loss
...
@@ -40,7 +40,7 @@ We provide two stable solutions.
 One utilizes the Gemini to implement hybrid parallel strategies of Gemini, DDP/ZeRO, and Tensor Parallelism for a huggingface GPT model.
 The other one use [Titans](https://github.com/hpcaitech/Titans), a distributed executed model zoo maintained by ColossalAI,to implement the hybrid parallel strategies of TP + ZeRO + PP.
-We recommend using Gemini to qucikly run your model in a distributed manner.
+We recommend using Gemini to quickly run your model in a distributed manner.
 It doesn't require significant changes to the model structures, therefore you can apply it on a new model easily.
 And use Titans as an advanced weapon to pursue a more extreme performance.
 Titans has included the some typical models, such as Vit and GPT.
...
@@ -27,7 +27,7 @@ pip install transformers
 ## Dataset
-For simplicity, the input data is randonly generated here.
+For simplicity, the input data is randomly generated here.
 ## Training
...
@@ -34,7 +34,7 @@ conda install -c conda-forge coin-or-cbc
 ## Dataset
-For simplicity, the input data is randonly generated here.
+For simplicity, the input data is randomly generated here.
 ## Training
...
@@ -27,7 +27,7 @@ pip install transformers
 ## Dataset
-For simplicity, the input data is randonly generated here.
+For simplicity, the input data is randomly generated here.
 ## Training
...
@@ -163,7 +163,7 @@ def main():
 else:
 init_dev = get_current_device()
-# shard init prameters
+# shard init parameters
 if args.shardinit:
 logger.info("Sharding initialization !", ranks=[0])
 else:
@@ -192,7 +192,7 @@ def main():
 config=config,
 local_files_only=False)
-# enable graident checkpointing
+# enable gradient checkpointing
 model.gradient_checkpointing_enable()
 numel = sum([p.numel() for p in model.parameters()])
...
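Since the last hunk enables gradient checkpointing on a Hugging Face model, a brief standalone sketch of that call follows; `gpt2` is only a placeholder checkpoint, not necessarily the model used in this example.

```python
# Illustrative sketch; "gpt2" is a placeholder checkpoint.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
model.gradient_checkpointing_enable()  # recompute activations in backward to save memory
numel = sum(p.numel() for p in model.parameters())
print(f"model parameters: {numel / 1e6:.1f}M")
```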