# 🦙 Tutorial: LLaMA 4 Multi-Concurrency Support with KTransformers (Balance Serve Backend)
## 📌 Overview
We are pleased to announce that **KTransformers** now provides **experimental support for LLaMA 4 models** through the powerful `balance_serve` backend introduced in **v0.2.4**. This update is available under the dedicated development branch: [`support-llama4`](https://github.com/kvcache-ai/ktransformers/tree/support-llama4), specifically targeting the newly released **Meta LLaMA 4** model architecture.
⚠️ This support is currently **not available on the main branch** due to dependencies on newer versions of `transformers`, and **compatibility limitations with inference of currently supported models**. Work is underway to integrate this into the mainline once broader stability and compatibility are validated.
💡 **If you already have an environment based on the main branch**, it is **strongly recommended to create a new environment** to avoid potential dependency conflicts.
------
## 🔗 Model & Resource Links
- 🔥 Official LLaMA 4 Release: https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct
(Note: LLaMA 4 models are served through the Meta repository. Make sure to **agree to terms** before downloading.)
curl -X POST http://localhost:10002/v1/chat/completions \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "hello"}
],
"model": "Llama4",
"temperature": 0.3,
"top_p": 1.0,
"stream": true
}'
```
------
## 📌 Limitations
- ✅ **Only `balance_serve` backend is supported** for LLaMA 4 models in this version.
- ⚠️ Requires **`transformers==4.51.0`** or newer. Due to potential compatibility issues with older toolchains, we have **not merged this branch to main yet**.
- ❌ Multimodal models are not supported yet in this version. Support will be added in future releases.