---
title: FAQ
description: Frequently asked questions
---

### General

**Q: The trainer stopped and hasn't progressed in several minutes.**

> A: This is usually an issue with the GPUs communicating with each other. See the [NCCL doc](nccl.qmd).

**Q: Exitcode -9**

> A: This usually happens when you run out of system RAM.

**Q: Exitcode -7 while using deepspeed**

> A: Try upgrading deepspeed with: `pip install -U deepspeed`

**Q: AttributeError: 'DummyOptim' object has no attribute 'step'**

**Q: ModuleNotFoundError: No module named 'mpi4py' using single GPU with deepspeed**

> A: You may be using deepspeed with a single GPU. Please remove the `deepspeed:` section from the YAML file or the `--deepspeed` CLI flag.

**Q: The code is stuck on saving preprocessed datasets.**

> A: This is usually an issue with the GPU. It can be resolved by setting the environment variable `CUDA_VISIBLE_DEVICES=0`. If you are on RunPod, this is usually a pod issue; starting a new pod should take care of it.

**Q: Received a mismatch error between the checkpoint's `torch.Size` and the model's when merging or loading adapters.**

> A: This is likely due to a vocab size mismatch. By default, Axolotl expands the model's embeddings if the tokenizer has more tokens than the model. Please use the `axolotl merge-lora` command to merge the adapters instead of using your own scripts.
> On the other hand, if the model has more tokens than the tokenizer, Axolotl does not shrink the model's embeddings unless `shrink_embeddings: true` is set in the config.

**Q: How do I call Axolotl from custom Python scripts?**

> A: Since Axolotl is just Python, please see `src/axolotl/cli/main.py` for how each command is called.

**Q: How do I know the value to use for `fsdp_transformer_layer_cls_to_wrap`?**

> A: This is the class name of the transformer layer to wrap with FSDP. For example, for `LlamaForCausalLM`, the value is `LlamaDecoderLayer`.
To find this for a specific model, check the model's `PreTrainedModel` definition and look for the `_no_split_modules` variable in the `modeling_.py` file within the `transformers` library.

**Q: ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token**

> A: This is because the tokenizer does not have a padding token. Please add one via:
>
> ```yaml
> special_tokens:
>   # str. If you're not sure, set to same as `eos_token`.
>   pad_token: "..."
> ```

### Chat templates

**Q: `jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'content' / 'role' / ____`**

> A: This means that the property mapping for the stated attribute does not exist when building the `chat_template` prompt. For example, for `no attribute 'content'`, please check that you have added the correct mapping for `content` under `message_property_mappings`.

**Q: `Empty template generated for turn ___`**

> A: The `content` is empty for that turn.

**Q: `Could not find content start/end boundary for turn __`**

> A: The specific turn's start/end could not be detected. Please ensure you have set the `eos_token` to match your `chat_template`. Otherwise, this could be a `chat_template` which doesn't use proper boundaries for each turn (like system). On the rare occurrence, make sure your content is not `[[dummy_message]]`. Please let us know if this happens.

**Q: `Content end boundary is before start boundary for turn ___`**

> A: This is an edge case which should not occur. Please create an Issue if it happens.

**Q: `Content end boundary is the same as start boundary for turn ___. This is likely an empty turn.`**

> A: This is likely an empty turn.

**Q: The EOS token is incorrectly being masked or not being masked / `EOS token __ not found in chat template`.**

> A: There can be two reasons:
>
> 1. There is a mismatch between `tokenizer.eos_token` and the EOS token in the template.
>    Please make sure to set `eos_token: ` under `special_tokens: ` to the same EOS token as in the template.
> 2. The EOS token is not in the template. Please check that your template is correct. For example, the `phi_35` template does not use its dedicated EOS token `<|endoftext|>` at the end.

**Q: "`chat_template` choice is `tokenizer_default` but tokenizer's `chat_template` is null. Please add a `chat_template` in tokenizer config"**

> A: This is because the tokenizer does not have a chat template. Please add one in the tokenizer config. See [chat_template](dataset-formats/conversation.qmd#chat-template) for more details.

**Q: The EOT token(s) are incorrectly being masked or not being masked / `EOT token __ not found in chat template`.**

> A: There can be two reasons:
>
> 1. The EOT token is different from the EOS token and was not specified under `eot_tokens: `. Please set `eot_tokens: ` to the same EOT token(s) as in the template.
> 2. There is more than one EOT token per turn in the template. Please raise an issue with examples, as we recognize this as an edge case.

**Q: `EOT token encoding failed. Please check if the token is valid and can be encoded.`**

> A: There could be an issue with the tokenizer or unicode encoding. Please raise an issue with examples of the EOT token and tokenizer causing the problem.

**Q: `EOT token __ is encoded as multiple tokens.`**

> A: The EOT token is encoded as multiple tokens, which can cause unexpected behavior. Please add it under `tokens: ` or (recommended) override unused added_tokens via `added_tokens_overrides: `.

**Q: `Conflict between train_on_eos and train_on_eot. eos_token is in eot_tokens and train_on_eos != train_on_eot`**

> A: The EOS token is in `eot_tokens: ` while there is a mismatch between `train_on_eos: ` and `train_on_eot: `, which will cause one to override the other. Please ensure that `train_on_eos: ` and `train_on_eot: ` are the same, or remove the EOS token from `eot_tokens: `.
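As a sketch, a config fragment that keeps the two settings aligned (the token value is illustrative; use the actual EOS/EOT token from your template):

```yaml
special_tokens:
  eos_token: "<|im_end|>"
eot_tokens:
  - "<|im_end|>"
# eos_token appears in eot_tokens, so these two must agree
train_on_eos: turn
train_on_eot: turn
```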
**Q: What happens if `eot_tokens: ` is not provided?**

> A: If `eot_tokens: ` is not provided, the default behavior is the same as before: EOS tokens used to delimit turns are masked/unmasked depending on whether the turn is trainable.
> Internally, `eot_tokens` falls back to `tokenizer.eos_token` and `train_on_eot` falls back to `train_on_eos` (which defaults to `turn`). This transition helps clarify the naming and behavior of EOT/EOS tokens.

**Q: `Data processing error: CAS service error`**

> A: Try disabling XET with `export HF_HUB_DISABLE_XET=1`

**Q: `torch._inductor.exc.LoweringException: NoValidChoicesError: No choices to select, please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice.`**

> A: Depending on the version of torch, you may need to include this in your YAML:
>
> ```yaml
> flex_attn_compile_kwargs:
>   dynamic: false
>   mode: max-autotune-no-cudagraphs
> ```