Commit da6f4092 (unverified), authored by AlexHe99, committed by GitHub
Update deploying_with_k8s.rst (#10922)

parent 25ebed2f
@@ -162,7 +162,7 @@ To test the deployment, run the following ``curl`` command:
 curl http://mistral-7b.default.svc.cluster.local/v1/completions \
 -H "Content-Type: application/json" \
 -d '{
-"model": "facebook/opt-125m",
+"model": "mistralai/Mistral-7B-Instruct-v0.3",
 "prompt": "San Francisco is a",
 "max_tokens": 7,
 "temperature": 0
@@ -172,4 +172,4 @@ If the service is correctly deployed, you should receive a response from the vLL
 Conclusion
 ----------
 Deploying vLLM with Kubernetes allows for efficient scaling and management of ML models leveraging GPU resources. By following the steps outlined above, you should be able to set up and test a vLLM deployment within your Kubernetes cluster. If you encounter any issues or have suggestions, please feel free to contribute to the documentation.
\ No newline at end of file
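
The change replaces the ``model`` value in the test request with the Mistral model name, presumably so that it matches the model the ``mistral-7b`` service actually serves. As a rough illustration, the same request can be built from Python with only the standard library. This is a minimal sketch, not part of the commit: the function names ``build_completion_request`` and ``send`` are hypothetical, and the URL, model name, and payload fields are copied from the ``curl`` command in the diff.

```python
import json
import urllib.request

# Values copied from the curl command in the diff; adjust them for your cluster.
SERVICE_URL = "http://mistral-7b.default.svc.cluster.local/v1/completions"
MODEL = "mistralai/Mistral-7B-Instruct-v0.3"


def build_completion_request(url=SERVICE_URL, model=MODEL):
    """Build the same POST request that the curl command sends (hypothetical helper)."""
    payload = {
        "model": model,
        "prompt": "San Francisco is a",
        "max_tokens": 7,
        "temperature": 0,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the parsed JSON body.

    Only works from a machine that can resolve the cluster-internal
    service name, for example a pod inside the same cluster.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example (requires access to the service):
#     print(send(build_completion_request()))
```

Building the request separately from sending it makes it easy to inspect the payload, including the ``model`` field this commit corrects, without needing network access to the cluster.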