Commit 0da00be5 authored by drbh, committed by GitHub

feat: add ie update to message docs (#1523)



update messages api docs and add Hugging Face Inference Endpoints
integrations section/instructions

---------
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
parent 3ab578b4
......@@ -4,6 +4,15 @@ Text Generation Inference (TGI) now supports the Messages API, which is fully co
> **Note:** The Messages API is supported from TGI version 1.4.0 and above. Ensure you are using a compatible version to access this feature.
#### Table of Contents
- [Making a Request](#making-a-request)
- [Streaming](#streaming)
- [Synchronous](#synchronous)
- [Hugging Face Inference Endpoints](#hugging-face-inference-endpoints)
- [Cloud Providers](#cloud-providers)
- [Amazon SageMaker](#amazon-sagemaker)
## Making a Request
You can make a request to TGI's Messages API using `curl`. Here's an example:
......@@ -81,6 +90,38 @@ chat_completion = client.chat.completions.create(
print(chat_completion)
```
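The synchronous example above prints the whole response object. As a minimal sketch, assuming the response follows the OpenAI chat completion shape (`choices[0].message.content`), the assistant's text can be pulled out as shown below. The helper name `reply_text` and the stand-in response object are hypothetical and exist only so the snippet runs without a server:

```python
from types import SimpleNamespace


def reply_text(chat_completion):
    """Return the assistant's text from a non-streaming chat completion."""
    return chat_completion.choices[0].message.content


# Hypothetical stand-in that mimics the response shape; a real call to
# client.chat.completions.create(...) would return an object with the
# same attribute path.
fake_completion = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(role="assistant", content="Hello!")
        )
    ]
)

print(reply_text(fake_completion))
```

With a real client, `reply_text(chat_completion)` would be called on the object returned by `client.chat.completions.create(...)` when `stream=False`.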
## Hugging Face Inference Endpoints
The Messages API is integrated with [Inference Endpoints](https://huggingface.co/inference-endpoints/dedicated).
Any endpoint that runs "Text Generation Inference" with an LLM that has a chat template can now be used with it. Below is an example of how to use Inference Endpoints (IE) with TGI using OpenAI's Python client library:
> **Note:** Make sure to replace `base_url` with your endpoint URL and to include `v1/` at the end of the URL. The `api_key` should be replaced with your Hugging Face API key.
```python
from openai import OpenAI

# init the client but point it to TGI
client = OpenAI(
    # replace with your endpoint url, make sure to include "v1/" at the end
    base_url="https://vlzz10eq3fol3429.us-east-1.aws.endpoints.huggingface.cloud/v1/",
    # replace with your API key
    api_key="hf_XXX"
)

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "system", "content": "You are a helpful assistant." },
        {"role": "user", "content": "What is deep learning?"}
    ],
    stream=True
)

# iterate and print stream
for message in chat_completion:
    print(message.choices[0].delta.content, end="")
```
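Two details of the example above are easy to get wrong: the `v1/` suffix on `base_url`, and stream chunks whose `delta.content` may be empty or `None` (for instance a final chunk, depending on the server). The following is a minimal sketch, not part of TGI or the OpenAI client. The helper names `ensure_v1_suffix`, `collect_stream`, and `fake_chunk`, and the `example.invalid` URL, are assumptions made for illustration:

```python
from types import SimpleNamespace


def ensure_v1_suffix(endpoint_url: str) -> str:
    """Return the endpoint URL with exactly one trailing 'v1/' segment."""
    url = endpoint_url.rstrip("/")
    if not url.endswith("/v1"):
        url += "/v1"
    return url + "/"


def collect_stream(chunks) -> str:
    """Join the text deltas of a streamed chat completion, skipping empty ones."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            # Some servers may emit chunks without choices; ignore them.
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in mimicking the shape of one streamed chunk."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


base_url = ensure_v1_suffix("https://example.invalid")
streamed = collect_stream([fake_chunk("Deep"), fake_chunk(" learning"), fake_chunk(None)])
print(base_url)
print(streamed)
```

With a real endpoint, `ensure_v1_suffix(...)` would wrap the URL passed as `base_url`, and `collect_stream(chat_completion)` would replace the printing loop when the full text is needed as a single string.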
## Cloud Providers
TGI can be deployed on various cloud providers for scalable and robust text generation. One such provider is Amazon SageMaker, which has recently added support for TGI. Here's how you can deploy TGI on Amazon SageMaker:
......@@ -114,7 +155,7 @@ hub = {
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.4.0"),
    env=hub,
    role=role,
)
# deploy model to SageMaker Inference
......@@ -123,7 +164,7 @@ predictor = huggingface_model.deploy(
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)
# send request
predictor.predict({
"messages": [
......