Update README.md

Commit fe5cd1fc authored Mar 25, 2026 by SAC_ac1ua3v7iw

Showing with 108 additions and 2 deletions

README.md README.md +108 -2

--- a/README.md
+++ b/README.md
-# liger-kernel
+<a name="readme-top"></a>
+# Liger Kernel: Efficient Triton Kernels for LLM Training
+## Key Features
+- **Ease of use:** Simply patch your Hugging Face model with one line of code, or compose your own model using our Liger Kernel modules.
+- **Time and memory efficient:** In the same spirit as Flash-Attn, but for layers like **RMSNorm**, **RoPE**, **SwiGLU**, and **CrossEntropy**! Increases multi-GPU training throughput by 20% and reduces memory usage by 60% with **kernel fusion**, **in-place replacement**, and **chunking** techniques.
+- **Exact:** Computation is exact—no approximations! Both forward and backward passes are implemented with rigorous unit tests and undergo convergence testing against training runs without Liger Kernel to ensure accuracy.
+- **Lightweight:** Liger Kernel has minimal dependencies, requiring only Torch and Triton—no extra libraries needed! Say goodbye to dependency headaches!
+- **Multi-GPU supported:** Compatible with multi-GPU setups (PyTorch FSDP, DeepSpeed, DDP, etc.).
+- **Trainer framework integration:** [Axolotl](https://github.com/axolotl-ai-cloud/axolotl), [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), [SFTTrainer](https://github.com/huggingface/trl/releases/tag/v0.10.1), [Hugging Face Trainer](https://github.com/huggingface/transformers/pull/32860), [SWIFT](https://github.com/modelscope/ms-swift), [oumi](https://github.com/oumi-ai/oumi/tree/main)
+## Installation
+### Dependencies
+- `torch >= 2.1.2`
+- `triton >= 2.3.0`
+### Optional Dependencies
+- `transformers >= 4.x`: Required if you plan to use the transformers model patching APIs. The specific model you are working with dictates the minimum version of transformers.
+> **Note:**
+> Our kernels inherit the full spectrum of hardware compatibility offered by [Triton](https://github.com/triton-lang/triton).
+To install from source:
+```bash
+git clone https://github.com/linkedin/Liger-Kernel.git
+cd Liger-Kernel
+# Install Default Dependencies
+# Setup.py will detect whether you are using AMD or NVIDIA
+pip install -e .
+# Setup Development Dependencies
+pip install -e ".[dev]"
+# NOTE -> For AMD users only
+pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3/
+```
+## Getting Started
+There are several ways to apply Liger kernels, depending on the level of customization required.
+### 1. Use AutoLigerKernelForCausalLM
+Using `AutoLigerKernelForCausalLM` is the simplest approach, as you don't have to import a model-specific patching API. If the model type is supported, the modeling code is automatically patched using the default settings.
+```python
+from liger_kernel.transformers import AutoLigerKernelForCausalLM
+# This AutoModel wrapper class automatically monkey-patches the
+# model with the optimized Liger kernels if the model is supported.
+model = AutoLigerKernelForCausalLM.from_pretrained("path/to/some/model")
+```
+### 2. Apply Model-Specific Patching APIs
+Using the [patching APIs](#patching), you can patch Hugging Face models to use the optimized Liger kernels.
+```python
+import transformers
+from liger_kernel.transformers import apply_liger_kernel_to_llama
+# 1a. Adding this line automatically monkey-patches the model with the optimized Liger kernels
+apply_liger_kernel_to_llama()
+# 1b. You could alternatively specify exactly which kernels are applied
+apply_liger_kernel_to_llama(
+  rope=True,
+  swiglu=True,
+  cross_entropy=True,
+  fused_linear_cross_entropy=False,
+  rms_norm=False
+)
+# 2. Instantiate patched model
+model = transformers.AutoModelForCausalLM.from_pretrained("path/to/llama/model")
+```
+### 3. Compose Your Own Model
+You can take individual [kernels](https://github.com/linkedin/Liger-Kernel?tab=readme-ov-file#model-kernels) to compose your models.
+```python
+from liger_kernel.transformers import LigerFusedLinearCrossEntropyLoss
+import torch.nn as nn
+import torch
+model = nn.Linear(128, 256).cuda()
+# fuses linear + cross entropy layers together and performs chunk-by-chunk computation to reduce memory
+loss_fn = LigerFusedLinearCrossEntropyLoss()
+input = torch.randn(4, 128, requires_grad=True, device="cuda")
+target = torch.randint(256, (4, ), device="cuda")
+loss = loss_fn(model.weight, input, target)
+loss.backward()
+```
-An Efficient Triton Kernels for LLM Training
\ No newline at end of file
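
The third example in the diff describes fusing the linear projection with the cross-entropy loss and computing it chunk by chunk. The following is a plain-PyTorch sketch of that idea, not Liger Kernel's Triton implementation: the helper `chunked_linear_cross_entropy` and its `chunk_size` parameter are hypothetical names introduced here for illustration. It only checks that a chunked forward pass yields the same loss as the unchunked reference. It makes no claim about memory savings, which would depend on how gradients are handled per chunk.

```python
import torch
import torch.nn.functional as F


def chunked_linear_cross_entropy(weight, hidden, target, chunk_size=2):
    """Mean cross-entropy of (hidden @ weight.T), computed chunk by chunk.

    Hypothetical helper for illustration only; not part of Liger Kernel.
    """
    total = hidden.new_zeros(())
    for start in range(0, hidden.shape[0], chunk_size):
        h = hidden[start:start + chunk_size]
        t = target[start:start + chunk_size]
        # Only a (chunk_size x vocab) block of logits exists at a time.
        logits = h @ weight.T
        total = total + F.cross_entropy(logits, t, reduction="sum")
    # Divide the summed loss by the number of rows to get the mean.
    return total / hidden.shape[0]


torch.manual_seed(0)
# Same shapes as the README example (128 features, 256 classes), on CPU.
weight = torch.randn(256, 128, dtype=torch.float64)
hidden = torch.randn(4, 128, dtype=torch.float64)
target = torch.randint(256, (4,))

# Unchunked reference: materializes the full (4 x 256) logits tensor.
reference = F.cross_entropy(hidden @ weight.T, target)
chunked = chunked_linear_cross_entropy(weight, hidden, target)
```

Comparing `reference` and `chunked` with `torch.allclose` is one way to sanity-check the "exact" claim at small scale; the chunk size only changes how much of the logits tensor exists at once, not the result.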