# codellama

## Introduction

[codellama](https://github.com/facebookresearch/codellama) features enhanced coding capabilities. It can generate code and natural language about code, from both code and natural language prompts (e.g., “Write me a function that outputs the fibonacci sequence”). It can also be used for code completion and debugging. It supports many of the most popular programming languages used today, including Python, C++, Java, PHP, Typescript (Javascript), C#, Bash and more.

There are three sizes (7b, 13b, 34b) as well as three flavours (base model, Python fine-tuned, and instruction tuned) released on [HuggingFace](https://huggingface.co/codellama).

| Base Model                                                                      | Python                                                                                        | Instruct                                                                                          |
| ------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
| [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf)   | [codellama/CodeLlama-7b-Python-hf](https://huggingface.co/codellama/CodeLlama-7b-Python-hf)   | [codellama/CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf)   |
| [codellama/CodeLlama-13b-hf](https://huggingface.co/codellama/CodeLlama-13b-hf) | [codellama/CodeLlama-13b-Python-hf](https://huggingface.co/codellama/CodeLlama-13b-Python-hf) | [codellama/CodeLlama-13b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf) |
| [codellama/CodeLlama-34b-hf](https://huggingface.co/codellama/CodeLlama-34b-hf) | [codellama/CodeLlama-34b-Python-hf](https://huggingface.co/codellama/CodeLlama-34b-Python-hf) | [codellama/CodeLlama-34b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-34b-Instruct-hf) |

The correspondence between the models and their capabilities is:

| models     | code completion | infilling         | instructions / chat | python specialist |
| ---------- | --------------- | ----------------- | ------------------- | ----------------- |
| Base Model | Y               | Y(7B,13B), N(34B) | N                   | N                 |
| Python     | Y               | N                 | N                   | Y                 |
| Instruct   | Y               | Y(7B,13B), N(34B) | Y                   | N                 |

## Inference

Based on the above table, download the model that meets your requirements. Execute the following command to interact with the model in the console:

```shell
# install lmdeploy
python3 -m pip install lmdeploy[all]

# convert weight layout
lmdeploy convert codellama /the/path/of/codellama/model
```
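If you have not fetched the weights yet, one way to get them (a minimal sketch, assuming the `huggingface_hub` package is installed) is `snapshot_download`:

```python
from huggingface_hub import snapshot_download

# download the 7b instruct flavour from the HuggingFace hub;
# the returned local path can be passed to `lmdeploy convert codellama`
local_path = snapshot_download(repo_id="codellama/CodeLlama-7b-Instruct-hf")
print(local_path)
```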

Then, you can communicate with codellama in the console by following the instructions in the next sections.

**Note**:

- the minimum required version of `transformers` is **v4.33.0**
- lmdeploy supports pasting code blocks into the console, but you have to press enter, type "!!", and press enter again to end the prompt, as illustrated below. The way of inputting prompts for the other supported models remains unchanged, i.e., pressing enter twice.
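For illustration, a pasted block might look like this in the console (the content is arbitrary; the trailing "!!" line terminates the prompt):

```
def greet(name):
    print(f"Hello, {name}!")
!!
```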

### Completion

```shell
lmdeploy chat turbomind ./workspace --cap completion
```
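For instance, you can paste an unfinished snippet as the prompt and let the model continue it. The prefix below is purely illustrative:

```python
import socket

def ping_exponential_backoff(host: str):
```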

### Infilling

```shell
lmdeploy chat turbomind ./workspace --cap infilling
```

The input code is supposed to have a special placeholder `<FILL>`. For example,

```
def remove_non_ascii(s: str) -> str:
    """ <FILL>
    return result
```

The code snippet generated by `turbomind.chat` is the piece to be filled in at `<FILL>`.
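For example, if the model generates a short docstring and body for `<FILL>`, the assembled function might read (illustrative output; actual generations vary):

```python
def remove_non_ascii(s: str) -> str:
    """Remove non-ASCII characters from a string."""
    result = "".join(c for c in s if ord(c) < 128)
    return result
```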

### Chat

```shell
lmdeploy chat turbomind ./workspace --cap chat --meta-instruct "Provide answers in Python"
```

The `--meta-instruct` instruction can be changed to another programming language, as long as codellama supports it.

### Python specialist

```shell
lmdeploy chat turbomind ./workspace --cap python
```

The Python fine-tuned model is highly recommended when the 'python specialist' capability is required.

## Quantization

TBD

## Serving

**The LMDeploy server only supports the `chat` capability**. The rest will be supported soon.

Launch inference server by:

```shell
# --tp: the number of GPUs used in tensor parallelism
lmdeploy serve api_server ./workspace --server-name ${server_ip} --server-port ${server_port} --tp 1
```

Then, you can communicate with it via the command line:

```shell
# api_server_url is the URL printed by api_server, e.g. http://localhost:23333
lmdeploy serve api_client api_server_url
```
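Alternatively, you can call the server programmatically. Below is a minimal Python sketch, assuming your lmdeploy version exposes an OpenAI-compatible `/v1/chat/completions` endpoint (check the RESTful API guide referenced at the end of this document for the exact routes of your version):

```python
import requests

api_server_url = "http://localhost:23333"  # printed when api_server starts

# assumed OpenAI-compatible endpoint; verify against the RESTful API guide
response = requests.post(
    f"{api_server_url}/v1/chat/completions",
    json={
        "model": "codellama",
        "messages": [
            {"role": "user", "content": "Write me a function that outputs the fibonacci sequence"}
        ],
    },
)
print(response.json())
```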

or through the web UI after launching gradio:

```shell
# api_server_url is the URL printed by api_server, e.g. http://localhost:23333
# server_ip and server_port here are for the gradio ui
# example: lmdeploy serve gradio http://localhost:23333 --server-name localhost --server-port 6006
lmdeploy serve gradio api_server_url --server-name ${gradio_ui_ip} --server-port ${gradio_ui_port}
```

For detailed information about the RESTful API, refer to the [guide](../serving/api_server.md).