---
title: "Inference and Merging"
format:
  html:
    toc: true
    toc-depth: 3
    number-sections: true
execute:
  enabled: false
---

This guide covers how to use your trained models for inference, including model loading, interactive testing, merging adapters, and common troubleshooting steps.

## Quick Start {#sec-quickstart}

::: {.callout-tip}
Use the same config for inference and merging that you used for training.
:::

### Basic Inference {#sec-basic}

::: {.panel-tabset}

## LoRA Models

```{.bash}
axolotl inference your_config.yml --lora-model-dir="./lora-output-dir"
```

## Full Fine-tuned Models

```{.bash}
axolotl inference your_config.yml --base-model="./completed-model"
```

:::

## Advanced Usage {#sec-advanced}

### Gradio Interface {#sec-gradio}

Launch an interactive web interface:

```{.bash}
axolotl inference your_config.yml --gradio
```

### File-based Prompts {#sec-file-prompts}

Process prompts from a text file:

```{.bash}
cat /tmp/prompt.txt | axolotl inference your_config.yml \
  --base-model="./completed-model" --prompter=None
```

### Memory Optimization {#sec-memory}

For large models or limited memory:

```{.bash}
axolotl inference your_config.yml --load-in-8bit=True
```

## Merging LoRA Weights {#sec-merging}

Merge LoRA adapters with the base model:

```{.bash}
axolotl merge-lora your_config.yml --lora-model-dir="./completed-model"
```

### Memory Management for Merging {#sec-memory-management}

::: {.panel-tabset}

## Configuration Options

```{.yaml}
gpu_memory_limit: 20GiB  # Adjust based on your GPU
lora_on_cpu: true        # Process on CPU if needed
```

## Force CPU Merging

```{.bash}
CUDA_VISIBLE_DEVICES="" axolotl merge-lora ...
```

:::

## Tokenization {#sec-tokenization}

### Common Issues {#sec-tokenization-issues}

::: {.callout-warning}
Tokenization mismatches between training and inference are a common source of problems.
:::

To debug:

1. Check training tokenization:

   ```{.bash}
   axolotl preprocess your_config.yml --debug
   ```

2. Verify inference tokenization by decoding the tokens just before they are passed to the model
3. Compare token IDs between training and inference

### Special Tokens {#sec-special-tokens}

Configure special tokens in your YAML:

```{.yaml}
special_tokens:
  bos_token: ""
  eos_token: ""
  unk_token: ""
tokens:
  - "<|im_start|>"
  - "<|im_end|>"
```

## Troubleshooting {#sec-troubleshooting}

### Common Problems {#sec-common-problems}

::: {.panel-tabset}

## Memory Issues

- Use 8-bit loading
- Reduce batch sizes
- Try CPU offloading

## Token Issues

- Verify special tokens
- Check tokenizer settings
- Compare training and inference preprocessing

## Performance Issues

- Verify model loading
- Check prompt formatting
- Check temperature/sampling settings

:::

For more details, see our [debugging guide](debugging.qmd).
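
When working through the tokenization debugging steps above, comparing token IDs by eye gets tedious for long prompts. The following is a minimal sketch of one way to locate the first position where two ID sequences diverge. The helper name `first_token_mismatch` and the sample IDs are hypothetical and are not part of axolotl; in practice you would paste in the IDs from the `--debug` preprocessing output and from your own inference-time tokenization.

```python
from itertools import zip_longest


def first_token_mismatch(train_ids, infer_ids):
    """Return (index, train_id, infer_id) for the first differing position.

    Returns None when both sequences are identical. If one sequence is
    shorter, the missing side is reported as None.
    """
    for index, (train_id, infer_id) in enumerate(zip_longest(train_ids, infer_ids)):
        if train_id != infer_id:
            return index, train_id, infer_id
    return None


# Hypothetical token IDs for illustration only.
train_ids = [1, 733, 16289, 28793, 22557, 2]
infer_ids = [1, 733, 16289, 28793, 22557]

# Here the inference sequence is missing the final token.
print(first_token_mismatch(train_ids, infer_ids))
```

A mismatch at the very start or end of the sequence often points at special-token handling, while a mismatch in the middle more likely indicates a difference in prompt formatting.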