# MLX-LM

:::{attention}
To be updated for Qwen3.
:::

[mlx-lm](https://github.com/ml-explore/mlx-examples/tree/main/llms) helps you run LLMs locally on Apple Silicon. It is available on macOS. It already supports Qwen models, and we also provide checkpoints that you can use with it directly.

## Prerequisites

The easiest way to get started is to install the `mlx-lm` package:

- with `pip`:

  ```bash
  pip install mlx-lm
  ```

- with `conda`:

  ```bash
  conda install -c conda-forge mlx-lm
  ```

## Running with Qwen MLX Files

We provide model checkpoints compatible with `mlx-lm` in our Hugging Face organization; to find them, search for repository names containing `-MLX`. The following code snippet uses `apply_chat_template` to show how to load the tokenizer and model and how to generate text:

```python
from mlx_lm import load, generate

model, tokenizer = load('Qwen/Qwen2.5-7B-Instruct-MLX', tokenizer_config={"eos_token": "<|im_end|>"})

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

response = generate(model, tokenizer, prompt=text, verbose=True, top_p=0.8, temp=0.7, repetition_penalty=1.05, max_tokens=512)
```

## Make Your Own MLX Files

You can make MLX files with a single command:

```bash
mlx_lm.convert --hf-path Qwen/Qwen2.5-7B-Instruct --mlx-path mlx/Qwen2.5-7B-Instruct/ -q
```

where

- `--hf-path`: the model name on the Hugging Face Hub or a local path
- `--mlx-path`: the output path for the converted files
- `-q`: enable quantization
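
Once converted, the local MLX files can be loaded in the same way as the hosted checkpoints. Below is a minimal sketch, assuming the output path `mlx/Qwen2.5-7B-Instruct/` produced by the command above (any other `--mlx-path` works the same way):

```python
from mlx_lm import load, generate

# Load the locally converted checkpoint; the path matches --mlx-path above.
model, tokenizer = load('mlx/Qwen2.5-7B-Instruct/')

messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate with default sampling settings; see the snippet above for tuning options.
response = generate(model, tokenizer, prompt=text, verbose=True, max_tokens=512)
```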