Commit 81a882ad authored by jixx

add tgi2.4.0

parent 9822d7f6
......@@ -55,7 +55,9 @@ Options:
## QUANTIZE
```shell
--quantize <QUANTIZE>
Whether you want the model to be quantized
Quantization method to use for the model. It is not necessary to specify this option for pre-quantized models, since the quantization method is read from the model configuration.
Marlin kernels will be used automatically for GPTQ/AWQ models.
[env: QUANTIZE=]
......@@ -87,6 +89,15 @@ Options:
[env: DTYPE=]
[possible values: float16, bfloat16]
```
## KV_CACHE_DTYPE
```shell
--kv-cache-dtype <KV_CACHE_DTYPE>
Specify the dtype for the key-value cache. When this option is not provided, the dtype of the model is used (typically `float16` or `bfloat16`). Currently the only supported values are `fp8_e4m3fn` and `fp8_e5m2` on CUDA
[env: KV_CACHE_DTYPE=]
[possible values: fp8_e4m3fn, fp8_e5m2]
```
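Both options can be combined when serving a pre-quantized checkpoint. A minimal sketch, with a placeholder AWQ model id; since the model is pre-quantized, `--quantize` can be omitted and Marlin kernels are picked automatically:
```shell
# Sketch: serve a pre-quantized AWQ model (placeholder id) with an FP8 KV cache.
# The quantization method is read from the model configuration.
text-generation-launcher \
    --model-id TheBloke/Llama-2-7B-AWQ \
    --kv-cache-dtype fp8_e5m2
```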
## TRUST_REMOTE_CODE
```shell
......@@ -349,6 +360,12 @@ Options:
--cors-allow-origin <CORS_ALLOW_ORIGIN>
[env: CORS_ALLOW_ORIGIN=]
```
## API_KEY
```shell
--api-key <API_KEY>
[env: API_KEY=]
```
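This option has no inline description; presumably it protects the HTTP endpoints by requiring the key on incoming requests. A hedged sketch, assuming the key is sent as a Bearer token and the server listens on `localhost:8080` (both placeholders):
```shell
# Start the server with a (placeholder) API key.
text-generation-launcher \
    --model-id meta-llama/Meta-Llama-3.1-8B-Instruct \
    --api-key my-secret-key \
    --port 8080

# Authenticate requests with the same key (assumed Bearer-token format).
curl http://localhost:8080/v1/chat/completions \
    -H "Authorization: Bearer my-secret-key" \
    -H "Content-Type: application/json" \
    -d '{"model":"tgi","messages":[{"role":"user","content":"Hello"}],"max_tokens":16}'
```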
## WATERMARK_GAMMA
```shell
......@@ -424,6 +441,20 @@ Options:
[env: LORA_ADAPTERS=]
```
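The launcher takes a comma-separated list of adapter ids, which is how the integration tests in this commit pass them (`",".join(lora_adapters)`). A sketch with hypothetical base-model and adapter ids:
```shell
# Hypothetical base model and adapter ids; adapters are comma-separated.
text-generation-launcher \
    --model-id meta-llama/Meta-Llama-3.1-8B-Instruct \
    --lora-adapters predibase/customer_support,predibase/dbpedia
```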
## USAGE_STATS
```shell
--usage-stats <USAGE_STATS>
Control if anonymous usage stats are collected. Options are "on", "off" and "no-stack". Default is "on"
[env: USAGE_STATS=]
[default: on]
Possible values:
- on: Default option, usage statistics are collected anonymously
- off: Disables all collection of usage statistics
- no-stack: Doesn't send the error stack trace or error type, but allows sending a crash event
```
## HELP
```shell
......
# Metrics
TGI exposes multiple metrics that can be collected via the `/metrics` Prometheus endpoint.
These metrics can be used to monitor the performance of TGI, autoscale deployments, and help identify bottlenecks.
The following metrics are exposed:
| Metric Name | Description | Type | Unit |
|--------------------------------------------|------------------------------------------------------------------------------------------|-----------|---------|
| `tgi_batch_current_max_tokens` | Maximum tokens for the current batch | Gauge | Count |
| `tgi_batch_current_size` | Current batch size | Gauge | Count |
| `tgi_batch_decode_duration` | Time spent decoding a batch per method (prefill or decode) | Histogram | Seconds |
| `tgi_batch_filter_duration` | Time spent filtering batches and sending generated tokens per method (prefill or decode) | Histogram | Seconds |
| `tgi_batch_forward_duration` | Batch forward duration per method (prefill or decode) | Histogram | Seconds |
| `tgi_batch_inference_count` | Inference calls per method (prefill or decode) | Counter | Count |
| `tgi_batch_inference_duration` | Batch inference duration | Histogram | Seconds |
| `tgi_batch_inference_success` | Number of successful inference calls per method (prefill or decode) | Counter | Count |
| `tgi_batch_next_size` | Batch size of the next batch | Histogram | Count |
| `tgi_queue_size` | Current queue size | Gauge | Count |
| `tgi_request_count` | Total number of requests | Counter | Count |
| `tgi_request_duration` | Total time spent processing the request (e2e latency) | Histogram | Seconds |
| `tgi_request_generated_tokens` | Generated tokens per request | Histogram | Count |
| `tgi_request_inference_duration` | Request inference duration | Histogram | Seconds |
| `tgi_request_input_length` | Input token length per request | Histogram | Count |
| `tgi_request_max_new_tokens` | Maximum new tokens per request | Histogram | Count |
| `tgi_request_mean_time_per_token_duration` | Mean time per token per request (inter-token latency) | Histogram | Seconds |
| `tgi_request_queue_duration` | Time spent in the queue per request | Histogram | Seconds |
| `tgi_request_skipped_tokens` | Speculated tokens per request | Histogram | Count |
| `tgi_request_success`                       | Number of successful requests                                                              | Counter   | Count   |
| `tgi_request_validation_duration` | Time spent validating the request | Histogram | Seconds |
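Since the endpoint serves standard Prometheus text format, a quick sanity check is to scrape it directly (host and port are placeholders for your deployment):
```shell
# Fetch all metrics and keep only the end-to-end latency histogram.
curl -s http://localhost:8080/metrics | grep tgi_request_duration
```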
# Supported Models and Hardware
# Supported Models
Text Generation Inference enables serving optimized models on specific hardware for the highest performance. The following sections list which models and hardware are supported.
## Supported Models
Text Generation Inference enables serving optimized models. The following sections list which models (VLMs & LLMs) are supported.
- [Deepseek V2](https://huggingface.co/deepseek-ai/DeepSeek-V2)
- [Idefics 2](https://huggingface.co/HuggingFaceM4/idefics2-8b) (Multimodal)
- [Llava Next (1.6)](https://huggingface.co/llava-hf/llava-v1.6-vicuna-13b-hf) (Multimodal)
- [Llama](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
- [Llama](https://huggingface.co/collections/meta-llama/llama-31-669fc079a0c406a149a5738f)
- [Phi 3](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
- [Granite](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct)
- [Gemma](https://huggingface.co/google/gemma-7b)
- [Gemma2](https://huggingface.co/google/gemma2-9b)
- [PaliGemma](https://huggingface.co/google/paligemma-3b-pt-224)
- [Gemma2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315)
- [Cohere](https://huggingface.co/CohereForAI/c4ai-command-r-plus)
- [Dbrx](https://huggingface.co/databricks/dbrx-instruct)
- [Mamba](https://huggingface.co/state-spaces/mamba-2.8b-slimpj)
- [Mistral](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
- [Mistral](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)
- [Mixtral](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1)
- [Gpt Bigcode](https://huggingface.co/bigcode/gpt_bigcode-santacoder)
- [Phi](https://huggingface.co/microsoft/phi-1_5)
- [PhiMoe](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct)
- [Baichuan](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat)
- [Falcon](https://huggingface.co/tiiuae/falcon-7b-instruct)
- [StarCoder 2](https://huggingface.co/bigcode/starcoder2-15b-instruct-v0.1)
......@@ -30,7 +32,10 @@ Text Generation Inference enables serving optimized models on specific hardware
- [Mpt](https://huggingface.co/mosaicml/mpt-7b-instruct)
- [Gpt2](https://huggingface.co/openai-community/gpt2)
- [Gpt Neox](https://huggingface.co/EleutherAI/gpt-neox-20b)
- [Gptj](https://huggingface.co/EleutherAI/gpt-j-6b)
- [Idefics](https://huggingface.co/HuggingFaceM4/idefics-9b) (Multimodal)
- [Mllama](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) (Multimodal)
If the above list lacks the model you would like to serve, depending on the model's pipeline type, you can try to initialize and serve the model anyway to see how well it performs, but performance isn't guaranteed for non-optimized models:
......
# Collection of Usage Statistics
Text Generation Inference collects anonymous usage statistics to help us improve the service. The collected data is used to improve TGI and to understand what causes failures. The data is collected transparently and any sensitive information is omitted.
Data is sent twice, once on server startup and once when the server stops. Usage statistics are only enabled when TGI is running in Docker, to avoid collecting data when TGI runs directly on the host machine.
## What data is collected
The code that collects the data is available [here](https://github.com/huggingface/text-generation-inference/blob/main/router/src/usage_stats.rs).
As of release 2.1.2, this is an example of the data collected:
- From the TGI configuration:
```json
{
"event_type": "start",
"disable_grammar_support": false,
"max_batch_prefill_tokens": 4096,
"max_batch_size": null,
"max_batch_total_tokens": null,
"max_best_of": 2,
"max_client_batch_size": 4,
"max_concurrent_requests": 128,
"max_input_tokens": 1024,
"max_stop_sequences": 4,
"max_top_n_tokens": 5,
"max_total_tokens": 2048,
"max_waiting_tokens": 20,
"model_config": {
"model_type": "Bloom"
},
"revision": null,
"tokenizer_class": "BloomTokenizerFast",
"validation_workers": 2,
"waiting_served_ratio": 1.2,
"docker_label": "latest",
"git_sha": "cfc118704880453d29bcbe4fbbd91dda501cf5fe",
"nvidia_env": {
"name": "NVIDIA A10G",
"pci_bus_id": "00000000:00:1E.0",
"driver_version": "535.183.01",
"pstate": "P8",
"pcie_link_gen_max": "4",
"pcie_link_gen_current": "1",
"temperature_gpu": "31",
"utilization_gpu": "0 %",
"utilization_memory": "0 %",
"memory_total": "23028 MiB",
"memory_free": "22515 MiB",
"memory_used": "0 MiB",
"reset_status_reset_required": "No",
"reset_status_drain_and_reset_recommended": "No",
"compute_cap": "8.6",
"ecc_errors_corrected_volatile_total": "0",
"mig_mode_current": "[N/A]",
"power_draw_instant": "10.86 W",
"power_limit": "300.00 W"
},
"system_env": {
"cpu_count": 16,
"cpu_type": "AMD EPYC 7R32",
"total_memory": 66681196544,
"architecture": "x86_64",
"platform": "linux-unix-x86_64"
}
}
```
## How to opt-out
By passing the `--usage-stats` flag to the text-generation-launcher you can control how much usage data is collected.
`--usage-stats=no-stack` will not emit stack traces from errors or the error types, but will continue to send start and stop events.
`--usage-stats=off` will disable collection completely.
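For example (the model id is a placeholder):
```shell
# Disable usage-statistics collection entirely.
text-generation-launcher \
    --model-id meta-llama/Meta-Llama-3.1-8B-Instruct \
    --usage-stats=off
```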
{
"nodes": {
"cachix": {
"inputs": {
"devenv": [
"crate2nix"
],
"flake-compat": [
"crate2nix"
],
"nixpkgs": "nixpkgs",
"pre-commit-hooks": [
"crate2nix"
]
},
"locked": {
"lastModified": 1709700175,
"narHash": "sha256-A0/6ZjLmT9qdYzKHmevnEIC7G+GiZ4UCr8v0poRPzds=",
"owner": "cachix",
"repo": "cachix",
"rev": "be97b37989f11b724197b5f4c7ffd78f12c8c4bf",
"type": "github"
},
"original": {
"owner": "cachix",
"ref": "latest",
"repo": "cachix",
"type": "github"
}
},
"cachix_2": {
"inputs": {
"devenv": [
"crate2nix",
"crate2nix_stable"
],
"flake-compat": [
"crate2nix",
"crate2nix_stable"
],
"nixpkgs": "nixpkgs_2",
"pre-commit-hooks": [
"crate2nix",
"crate2nix_stable"
]
},
"locked": {
"lastModified": 1716549461,
"narHash": "sha256-lHy5kgx6J8uD+16SO47dPrbob98sh+W1tf4ceSqPVK4=",
"owner": "cachix",
"repo": "cachix",
"rev": "e2bb269fb8c0828d5d4d2d7b8d09ea85abcacbd4",
"type": "github"
},
"original": {
"owner": "cachix",
"ref": "latest",
"repo": "cachix",
"type": "github"
}
},
"cachix_3": {
"inputs": {
"devenv": [
"crate2nix",
"crate2nix_stable",
"crate2nix_stable"
],
"flake-compat": [
"crate2nix",
"crate2nix_stable",
"crate2nix_stable"
],
"nixpkgs": "nixpkgs_3",
"pre-commit-hooks": [
"crate2nix",
"crate2nix_stable",
"crate2nix_stable"
]
},
"locked": {
"lastModified": 1716549461,
"narHash": "sha256-lHy5kgx6J8uD+16SO47dPrbob98sh+W1tf4ceSqPVK4=",
"owner": "cachix",
"repo": "cachix",
"rev": "e2bb269fb8c0828d5d4d2d7b8d09ea85abcacbd4",
"type": "github"
},
"original": {
"owner": "cachix",
"ref": "latest",
"repo": "cachix",
"type": "github"
}
},
"crate2nix": {
"inputs": {
"cachix": "cachix",
"crate2nix_stable": "crate2nix_stable",
"devshell": "devshell_3",
"flake-compat": "flake-compat_3",
"flake-parts": "flake-parts_3",
"nix-test-runner": "nix-test-runner_3",
"nixpkgs": [
"tgi-nix",
"nixpkgs"
],
"pre-commit-hooks": "pre-commit-hooks_3"
},
"locked": {
"lastModified": 1723311214,
"narHash": "sha256-xdGZQBEa1AC2us/sY3igS/CucWY6jErXsAvCFRhB2LI=",
"owner": "nix-community",
"repo": "crate2nix",
"rev": "236f6addfd452a48be805819e3216af79e988fd5",
"type": "github"
},
"original": {
"owner": "nix-community",
"repo": "crate2nix",
"type": "github"
}
},
"crate2nix_stable": {
"inputs": {
"cachix": "cachix_2",
"crate2nix_stable": "crate2nix_stable_2",
"devshell": "devshell_2",
"flake-compat": "flake-compat_2",
"flake-parts": "flake-parts_2",
"nix-test-runner": "nix-test-runner_2",
"nixpkgs": "nixpkgs_5",
"pre-commit-hooks": "pre-commit-hooks_2"
},
"locked": {
"lastModified": 1719760004,
"narHash": "sha256-esWhRnt7FhiYq0CcIxw9pvH+ybOQmWBfHYMtleaMhBE=",
"owner": "nix-community",
"repo": "crate2nix",
"rev": "1dee214bb20855fa3e1e7bb98d28922ddaff8c57",
"type": "github"
},
"original": {
"owner": "nix-community",
"ref": "0.14.1",
"repo": "crate2nix",
"type": "github"
}
},
"crate2nix_stable_2": {
"inputs": {
"cachix": "cachix_3",
"crate2nix_stable": "crate2nix_stable_3",
"devshell": "devshell",
"flake-compat": "flake-compat",
"flake-parts": "flake-parts",
"nix-test-runner": "nix-test-runner",
"nixpkgs": "nixpkgs_4",
"pre-commit-hooks": "pre-commit-hooks"
},
"locked": {
"lastModified": 1712821484,
"narHash": "sha256-rGT3CW64cJS9nlnWPFWSc1iEa3dNZecVVuPVGzcsHe8=",
"owner": "nix-community",
"repo": "crate2nix",
"rev": "42883afcad3823fa5811e967fb7bff54bc3c9d6d",
"type": "github"
},
"original": {
"owner": "nix-community",
"ref": "0.14.0",
"repo": "crate2nix",
"type": "github"
}
},
"crate2nix_stable_3": {
"inputs": {
"flake-utils": "flake-utils"
},
"locked": {
"lastModified": 1702842982,
"narHash": "sha256-A9AowkHIjsy1a4LuiPiVP88FMxyCWK41flZEZOUuwQM=",
"owner": "nix-community",
"repo": "crate2nix",
"rev": "75ac2973affa6b9b4f661a7b592cba6e4f51d426",
"type": "github"
},
"original": {
"owner": "nix-community",
"ref": "0.12.0",
"repo": "crate2nix",
"type": "github"
}
},
"devshell": {
"inputs": {
"flake-utils": "flake-utils_2",
"nixpkgs": [
"crate2nix",
"crate2nix_stable",
"crate2nix_stable",
"nixpkgs"
]
},
"locked": {
"lastModified": 1717408969,
"narHash": "sha256-Q0OEFqe35fZbbRPPRdrjTUUChKVhhWXz3T9ZSKmaoVY=",
"owner": "numtide",
"repo": "devshell",
"rev": "1ebbe68d57457c8cae98145410b164b5477761f4",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "devshell",
"type": "github"
}
},
"devshell_2": {
"inputs": {
"flake-utils": "flake-utils_3",
"nixpkgs": [
"crate2nix",
"crate2nix_stable",
"nixpkgs"
]
},
"locked": {
"lastModified": 1717408969,
"narHash": "sha256-Q0OEFqe35fZbbRPPRdrjTUUChKVhhWXz3T9ZSKmaoVY=",
"owner": "numtide",
"repo": "devshell",
"rev": "1ebbe68d57457c8cae98145410b164b5477761f4",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "devshell",
"type": "github"
}
},
"devshell_3": {
"inputs": {
"flake-utils": "flake-utils_4",
"nixpkgs": [
"crate2nix",
"nixpkgs"
]
},
"locked": {
"lastModified": 1711099426,
"narHash": "sha256-HzpgM/wc3aqpnHJJ2oDqPBkNsqWbW0WfWUO8lKu8nGk=",
"owner": "numtide",
"repo": "devshell",
"rev": "2d45b54ca4a183f2fdcf4b19c895b64fbf620ee8",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "devshell",
"type": "github"
}
},
"flake-compat": {
"locked": {
"lastModified": 1696426674,
"narHash": "sha256-kvjfFW7WAETZlt09AgDn1MrtKzP7t90Vf7vypd3OL1U=",
"rev": "0f9255e01c2351cc7d116c072cb317785dd33b33",
"revCount": 57,
"type": "tarball",
"url": "https://api.flakehub.com/f/pinned/edolstra/flake-compat/1.0.1/018afb31-abd1-7bff-a5e4-cff7e18efb7a/source.tar.gz"
},
"original": {
"type": "tarball",
"url": "https://flakehub.com/f/edolstra/flake-compat/1.tar.gz"
}
},
"flake-compat_2": {
"locked": {
"lastModified": 1696426674,
"narHash": "sha256-kvjfFW7WAETZlt09AgDn1MrtKzP7t90Vf7vypd3OL1U=",
"rev": "0f9255e01c2351cc7d116c072cb317785dd33b33",
"revCount": 57,
"type": "tarball",
"url": "https://api.flakehub.com/f/pinned/edolstra/flake-compat/1.0.1/018afb31-abd1-7bff-a5e4-cff7e18efb7a/source.tar.gz"
},
"original": {
"type": "tarball",
"url": "https://flakehub.com/f/edolstra/flake-compat/1.tar.gz"
}
},
"flake-compat_3": {
"locked": {
"lastModified": 1696426674,
"narHash": "sha256-kvjfFW7WAETZlt09AgDn1MrtKzP7t90Vf7vypd3OL1U=",
"rev": "0f9255e01c2351cc7d116c072cb317785dd33b33",
"revCount": 57,
"type": "tarball",
"url": "https://api.flakehub.com/f/pinned/edolstra/flake-compat/1.0.1/018afb31-abd1-7bff-a5e4-cff7e18efb7a/source.tar.gz"
},
"original": {
"type": "tarball",
"url": "https://flakehub.com/f/edolstra/flake-compat/1.tar.gz"
}
},
"flake-compat_4": {
"locked": {
"lastModified": 1696426674,
"narHash": "sha256-kvjfFW7WAETZlt09AgDn1MrtKzP7t90Vf7vypd3OL1U=",
"owner": "edolstra",
"repo": "flake-compat",
"rev": "0f9255e01c2351cc7d116c072cb317785dd33b33",
"type": "github"
},
"original": {
"owner": "edolstra",
"repo": "flake-compat",
"type": "github"
}
},
"flake-parts": {
"inputs": {
"nixpkgs-lib": [
"crate2nix",
"crate2nix_stable",
"crate2nix_stable",
"nixpkgs"
]
},
"locked": {
"lastModified": 1719745305,
"narHash": "sha256-xwgjVUpqSviudEkpQnioeez1Uo2wzrsMaJKJClh+Bls=",
"owner": "hercules-ci",
"repo": "flake-parts",
"rev": "c3c5ecc05edc7dafba779c6c1a61cd08ac6583e9",
"type": "github"
},
"original": {
"owner": "hercules-ci",
"repo": "flake-parts",
"type": "github"
}
},
"flake-parts_2": {
"inputs": {
"nixpkgs-lib": [
"crate2nix",
"crate2nix_stable",
"nixpkgs"
]
},
"locked": {
"lastModified": 1719745305,
"narHash": "sha256-xwgjVUpqSviudEkpQnioeez1Uo2wzrsMaJKJClh+Bls=",
"owner": "hercules-ci",
"repo": "flake-parts",
"rev": "c3c5ecc05edc7dafba779c6c1a61cd08ac6583e9",
"type": "github"
},
"original": {
"owner": "hercules-ci",
"repo": "flake-parts",
"type": "github"
}
},
"flake-parts_3": {
"inputs": {
"nixpkgs-lib": [
"crate2nix",
"nixpkgs"
]
},
"locked": {
"lastModified": 1712014858,
"narHash": "sha256-sB4SWl2lX95bExY2gMFG5HIzvva5AVMJd4Igm+GpZNw=",
"owner": "hercules-ci",
"repo": "flake-parts",
"rev": "9126214d0a59633752a136528f5f3b9aa8565b7d",
"type": "github"
},
"original": {
"owner": "hercules-ci",
"repo": "flake-parts",
"type": "github"
}
},
"flake-utils": {
"inputs": {
"systems": "systems"
},
"locked": {
"lastModified": 1694529238,
"narHash": "sha256-zsNZZGTGnMOf9YpHKJqMSsa0dXbfmxeoJ7xHlrt+xmY=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "ff7b65b44d01cf9ba6a71320833626af21126384",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"type": "github"
}
},
"flake-utils_2": {
"inputs": {
"systems": "systems_2"
},
"locked": {
"lastModified": 1701680307,
"narHash": "sha256-kAuep2h5ajznlPMD9rnQyffWG8EM/C73lejGofXvdM8=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "4022d587cbbfd70fe950c1e2083a02621806a725",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"type": "github"
}
},
"flake-utils_3": {
"inputs": {
"systems": "systems_3"
},
"locked": {
"lastModified": 1701680307,
"narHash": "sha256-kAuep2h5ajznlPMD9rnQyffWG8EM/C73lejGofXvdM8=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "4022d587cbbfd70fe950c1e2083a02621806a725",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"type": "github"
}
},
"flake-utils_4": {
"inputs": {
"systems": "systems_4"
},
"locked": {
"lastModified": 1701680307,
"narHash": "sha256-kAuep2h5ajznlPMD9rnQyffWG8EM/C73lejGofXvdM8=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "4022d587cbbfd70fe950c1e2083a02621806a725",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"type": "github"
}
},
"flake-utils_5": {
"inputs": {
"systems": "systems_5"
},
"locked": {
"lastModified": 1710146030,
"narHash": "sha256-SZ5L6eA7HJ/nmkzGG7/ISclqe6oZdOZTNoesiInkXPQ=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "b1d9ab70662946ef0850d488da1c9019f3a9752a",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"type": "github"
}
},
"flake-utils_6": {
"inputs": {
"systems": "systems_6"
},
"locked": {
"lastModified": 1726560853,
"narHash": "sha256-X6rJYSESBVr3hBoH0WbKE5KvhPU5bloyZ2L4K60/fPQ=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "c1dfcf08411b08f6b8615f7d8971a2bfa81d5e8a",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"type": "github"
}
},
"flake-utils_7": {
"inputs": {
"systems": "systems_7"
},
"locked": {
"lastModified": 1726560853,
"narHash": "sha256-X6rJYSESBVr3hBoH0WbKE5KvhPU5bloyZ2L4K60/fPQ=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "c1dfcf08411b08f6b8615f7d8971a2bfa81d5e8a",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"type": "github"
}
},
"gitignore": {
"inputs": {
"nixpkgs": [
"crate2nix",
"crate2nix_stable",
"crate2nix_stable",
"pre-commit-hooks",
"nixpkgs"
]
},
"locked": {
"lastModified": 1709087332,
"narHash": "sha256-HG2cCnktfHsKV0s4XW83gU3F57gaTljL9KNSuG6bnQs=",
"owner": "hercules-ci",
"repo": "gitignore.nix",
"rev": "637db329424fd7e46cf4185293b9cc8c88c95394",
"type": "github"
},
"original": {
"owner": "hercules-ci",
"repo": "gitignore.nix",
"type": "github"
}
},
"gitignore_2": {
"inputs": {
"nixpkgs": [
"crate2nix",
"crate2nix_stable",
"pre-commit-hooks",
"nixpkgs"
]
},
"locked": {
"lastModified": 1709087332,
"narHash": "sha256-HG2cCnktfHsKV0s4XW83gU3F57gaTljL9KNSuG6bnQs=",
"owner": "hercules-ci",
"repo": "gitignore.nix",
"rev": "637db329424fd7e46cf4185293b9cc8c88c95394",
"type": "github"
},
"original": {
"owner": "hercules-ci",
"repo": "gitignore.nix",
"type": "github"
}
},
"gitignore_3": {
"inputs": {
"nixpkgs": [
"crate2nix",
"pre-commit-hooks",
"nixpkgs"
]
},
"locked": {
"lastModified": 1709087332,
"narHash": "sha256-HG2cCnktfHsKV0s4XW83gU3F57gaTljL9KNSuG6bnQs=",
"owner": "hercules-ci",
"repo": "gitignore.nix",
"rev": "637db329424fd7e46cf4185293b9cc8c88c95394",
"type": "github"
},
"original": {
"owner": "hercules-ci",
"repo": "gitignore.nix",
"type": "github"
}
},
"nix-filter": {
"locked": {
"lastModified": 1710156097,
"narHash": "sha256-1Wvk8UP7PXdf8bCCaEoMnOT1qe5/Duqgj+rL8sRQsSM=",
"owner": "numtide",
"repo": "nix-filter",
"rev": "3342559a24e85fc164b295c3444e8a139924675b",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "nix-filter",
"type": "github"
}
},
"nix-test-runner": {
"flake": false,
"locked": {
"lastModified": 1588761593,
"narHash": "sha256-FKJykltAN/g3eIceJl4SfDnnyuH2jHImhMrXS2KvGIs=",
"owner": "stoeffel",
"repo": "nix-test-runner",
"rev": "c45d45b11ecef3eb9d834c3b6304c05c49b06ca2",
"type": "github"
},
"original": {
"owner": "stoeffel",
"repo": "nix-test-runner",
"type": "github"
}
},
"nix-test-runner_2": {
"flake": false,
"locked": {
"lastModified": 1588761593,
"narHash": "sha256-FKJykltAN/g3eIceJl4SfDnnyuH2jHImhMrXS2KvGIs=",
"owner": "stoeffel",
"repo": "nix-test-runner",
"rev": "c45d45b11ecef3eb9d834c3b6304c05c49b06ca2",
"type": "github"
},
"original": {
"owner": "stoeffel",
"repo": "nix-test-runner",
"type": "github"
}
},
"nix-test-runner_3": {
"flake": false,
"locked": {
"lastModified": 1588761593,
"narHash": "sha256-FKJykltAN/g3eIceJl4SfDnnyuH2jHImhMrXS2KvGIs=",
"owner": "stoeffel",
"repo": "nix-test-runner",
"rev": "c45d45b11ecef3eb9d834c3b6304c05c49b06ca2",
"type": "github"
},
"original": {
"owner": "stoeffel",
"repo": "nix-test-runner",
"type": "github"
}
},
"nixpkgs": {
"locked": {
"lastModified": 1700612854,
"narHash": "sha256-yrQ8osMD+vDLGFX7pcwsY/Qr5PUd6OmDMYJZzZi0+zc=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "19cbff58383a4ae384dea4d1d0c823d72b49d614",
"type": "github"
},
"original": {
"owner": "NixOS",
"ref": "nixos-unstable",
"repo": "nixpkgs",
"type": "github"
}
},
"nixpkgs_2": {
"locked": {
"lastModified": 1715534503,
"narHash": "sha256-5ZSVkFadZbFP1THataCaSf0JH2cAH3S29hU9rrxTEqk=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "2057814051972fa1453ddfb0d98badbea9b83c06",
"type": "github"
},
"original": {
"owner": "NixOS",
"ref": "nixos-unstable",
"repo": "nixpkgs",
"type": "github"
}
},
"nixpkgs_3": {
"locked": {
"lastModified": 1715534503,
"narHash": "sha256-5ZSVkFadZbFP1THataCaSf0JH2cAH3S29hU9rrxTEqk=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "2057814051972fa1453ddfb0d98badbea9b83c06",
"type": "github"
},
"original": {
"owner": "NixOS",
"ref": "nixos-unstable",
"repo": "nixpkgs",
"type": "github"
}
},
"nixpkgs_4": {
"locked": {
"lastModified": 1719506693,
"narHash": "sha256-C8e9S7RzshSdHB7L+v9I51af1gDM5unhJ2xO1ywxNH8=",
"path": "/nix/store/4p0avw1s3vf27hspgqsrqs37gxk4i83i-source",
"rev": "b2852eb9365c6de48ffb0dc2c9562591f652242a",
"type": "path"
},
"original": {
"id": "nixpkgs",
"type": "indirect"
}
},
"nixpkgs_5": {
"locked": {
"lastModified": 1719506693,
"narHash": "sha256-C8e9S7RzshSdHB7L+v9I51af1gDM5unhJ2xO1ywxNH8=",
"path": "/nix/store/4p0avw1s3vf27hspgqsrqs37gxk4i83i-source",
"rev": "b2852eb9365c6de48ffb0dc2c9562591f652242a",
"type": "path"
},
"original": {
"id": "nixpkgs",
"type": "indirect"
}
},
"nixpkgs_6": {
"locked": {
"lastModified": 1727675176,
"narHash": "sha256-xIjBFMYldWvj+g8ahxMPofsj+OqxvKJN6YylNHQ7gn4=",
"owner": "nixos",
"repo": "nixpkgs",
"rev": "a6d0207fea9212d28cd3d487efe6bc699663b93a",
"type": "github"
},
"original": {
"owner": "nixos",
"ref": "nixos-unstable-small",
"repo": "nixpkgs",
"type": "github"
}
},
"pre-commit-hooks": {
"inputs": {
"flake-compat": [
"crate2nix",
"crate2nix_stable",
"crate2nix_stable",
"flake-compat"
],
"gitignore": "gitignore",
"nixpkgs": [
"crate2nix",
"crate2nix_stable",
"crate2nix_stable",
"nixpkgs"
],
"nixpkgs-stable": [
"crate2nix",
"crate2nix_stable",
"crate2nix_stable",
"nixpkgs"
]
},
"locked": {
"lastModified": 1719259945,
"narHash": "sha256-F1h+XIsGKT9TkGO3omxDLEb/9jOOsI6NnzsXFsZhry4=",
"owner": "cachix",
"repo": "pre-commit-hooks.nix",
"rev": "0ff4381bbb8f7a52ca4a851660fc7a437a4c6e07",
"type": "github"
},
"original": {
"owner": "cachix",
"repo": "pre-commit-hooks.nix",
"type": "github"
}
},
"pre-commit-hooks_2": {
"inputs": {
"flake-compat": [
"crate2nix",
"crate2nix_stable",
"flake-compat"
],
"gitignore": "gitignore_2",
"nixpkgs": [
"crate2nix",
"crate2nix_stable",
"nixpkgs"
],
"nixpkgs-stable": [
"crate2nix",
"crate2nix_stable",
"nixpkgs"
]
},
"locked": {
"lastModified": 1719259945,
"narHash": "sha256-F1h+XIsGKT9TkGO3omxDLEb/9jOOsI6NnzsXFsZhry4=",
"owner": "cachix",
"repo": "pre-commit-hooks.nix",
"rev": "0ff4381bbb8f7a52ca4a851660fc7a437a4c6e07",
"type": "github"
},
"original": {
"owner": "cachix",
"repo": "pre-commit-hooks.nix",
"type": "github"
}
},
"pre-commit-hooks_3": {
"inputs": {
"flake-compat": [
"crate2nix",
"flake-compat"
],
"flake-utils": "flake-utils_5",
"gitignore": "gitignore_3",
"nixpkgs": [
"crate2nix",
"nixpkgs"
],
"nixpkgs-stable": [
"crate2nix",
"nixpkgs"
]
},
"locked": {
"lastModified": 1712055707,
"narHash": "sha256-4XLvuSIDZJGS17xEwSrNuJLL7UjDYKGJSbK1WWX2AK8=",
"owner": "cachix",
"repo": "pre-commit-hooks.nix",
"rev": "e35aed5fda3cc79f88ed7f1795021e559582093a",
"type": "github"
},
"original": {
"owner": "cachix",
"repo": "pre-commit-hooks.nix",
"type": "github"
}
},
"root": {
"inputs": {
"crate2nix": "crate2nix",
"flake-utils": "flake-utils_6",
"nix-filter": "nix-filter",
"nixpkgs": [
"tgi-nix",
"nixpkgs"
],
"rust-overlay": "rust-overlay",
"tgi-nix": "tgi-nix"
}
},
"rust-overlay": {
"inputs": {
"nixpkgs": [
"tgi-nix",
"nixpkgs"
]
},
"locked": {
"lastModified": 1727836133,
"narHash": "sha256-JE0zciM5IGWvK8J/pE2VldNBf7oyMH5WrU8tZArefbg=",
"owner": "oxalica",
"repo": "rust-overlay",
"rev": "02321540b0c8000b36889b1b974d1fec585b25a4",
"type": "github"
},
"original": {
"owner": "oxalica",
"repo": "rust-overlay",
"type": "github"
}
},
"systems": {
"locked": {
"lastModified": 1681028828,
"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
"owner": "nix-systems",
"repo": "default",
"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
"type": "github"
},
"original": {
"owner": "nix-systems",
"repo": "default",
"type": "github"
}
},
"systems_2": {
"locked": {
"lastModified": 1681028828,
"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
"owner": "nix-systems",
"repo": "default",
"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
"type": "github"
},
"original": {
"owner": "nix-systems",
"repo": "default",
"type": "github"
}
},
"systems_3": {
"locked": {
"lastModified": 1681028828,
"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
"owner": "nix-systems",
"repo": "default",
"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
"type": "github"
},
"original": {
"owner": "nix-systems",
"repo": "default",
"type": "github"
}
},
"systems_4": {
"locked": {
"lastModified": 1681028828,
"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
"owner": "nix-systems",
"repo": "default",
"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
"type": "github"
},
"original": {
"owner": "nix-systems",
"repo": "default",
"type": "github"
}
},
"systems_5": {
"locked": {
"lastModified": 1681028828,
"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
"owner": "nix-systems",
"repo": "default",
"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
"type": "github"
},
"original": {
"owner": "nix-systems",
"repo": "default",
"type": "github"
}
},
"systems_6": {
"locked": {
"lastModified": 1681028828,
"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
"owner": "nix-systems",
"repo": "default",
"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
"type": "github"
},
"original": {
"owner": "nix-systems",
"repo": "default",
"type": "github"
}
},
"systems_7": {
"locked": {
"lastModified": 1681028828,
"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
"owner": "nix-systems",
"repo": "default",
"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
"type": "github"
},
"original": {
"owner": "nix-systems",
"repo": "default",
"type": "github"
}
},
"tgi-nix": {
"inputs": {
"flake-compat": "flake-compat_4",
"flake-utils": "flake-utils_7",
"nixpkgs": "nixpkgs_6"
},
"locked": {
"lastModified": 1729761651,
"narHash": "sha256-GYykQ9Fxji2EuXCGcPn0dx8Qx8VQBJTkRdcCytp4A/k=",
"owner": "huggingface",
"repo": "text-generation-inference-nix",
"rev": "f7e3c4fa67d70590ed9ee47feeab645bd9ba81b1",
"type": "github"
},
"original": {
"owner": "huggingface",
"ref": "marlin-kernels-0.3.1",
"repo": "text-generation-inference-nix",
"type": "github"
}
}
},
"root": "root",
"version": 7
}
{
inputs = {
crate2nix = {
url = "github:nix-community/crate2nix";
inputs.nixpkgs.follows = "tgi-nix/nixpkgs";
};
nix-filter.url = "github:numtide/nix-filter";
tgi-nix.url = "github:huggingface/text-generation-inference-nix/marlin-kernels-0.3.1";
nixpkgs.follows = "tgi-nix/nixpkgs";
flake-utils.url = "github:numtide/flake-utils";
rust-overlay = {
url = "github:oxalica/rust-overlay";
inputs.nixpkgs.follows = "tgi-nix/nixpkgs";
};
};
outputs =
{
self,
crate2nix,
nix-filter,
nixpkgs,
flake-utils,
rust-overlay,
tgi-nix,
}:
flake-utils.lib.eachDefaultSystem (
system:
let
cargoNix = crate2nix.tools.${system}.appliedCargoNix {
name = "tgi";
src = ./.;
additionalCargoNixArgs = [ "--all-features" ];
};
pkgs = import nixpkgs {
inherit system;
inherit (tgi-nix.lib) config;
overlays = [
rust-overlay.overlays.default
tgi-nix.overlays.default
(import nix/overlay.nix)
];
};
crateOverrides = import ./nix/crate-overrides.nix { inherit pkgs nix-filter; };
benchmark = cargoNix.workspaceMembers.text-generation-benchmark.build.override {
inherit crateOverrides;
};
launcher = cargoNix.workspaceMembers.text-generation-launcher.build.override {
inherit crateOverrides;
};
router =
let
routerUnwrapped = cargoNix.workspaceMembers.text-generation-router-v3.build.override {
inherit crateOverrides;
};
packagePath =
with pkgs.python3.pkgs;
makePythonPath [
protobuf
sentencepiece
torch
transformers
];
in
pkgs.writeShellApplication {
name = "text-generation-router";
text = ''
PYTHONPATH="${packagePath}" ${routerUnwrapped}/bin/text-generation-router "$@"
'';
};
server = pkgs.python3.pkgs.callPackage ./nix/server.nix { inherit nix-filter; };
client = pkgs.python3.pkgs.callPackage ./nix/client.nix { };
in
{
checks = {
rust =
with pkgs;
rustPlatform.buildRustPackage {
name = "rust-checks";
src = ./.;
cargoLock = {
lockFile = ./Cargo.lock;
};
buildInputs = [ openssl.dev ];
nativeBuildInputs = [
clippy
pkg-config
protobuf
python3
rustfmt
];
buildPhase = ''
cargo check
'';
checkPhase = ''
cargo fmt -- --check
cargo test -j $NIX_BUILD_CORES
cargo clippy
'';
installPhase = "touch $out";
};
};
formatter = pkgs.nixfmt-rfc-style;
devShells = with pkgs; rec {
default = pure;
pure = mkShell {
buildInputs = [
benchmark
launcher
router
server
];
};
test = mkShell {
buildInputs =
[
benchmark
launcher
router
server
client
openssl.dev
pkg-config
cargo
rustfmt
clippy
]
++ (with python3.pkgs; [
docker
pytest
pytest-asyncio
syrupy
pre-commit
ruff
]);
};
impure = callPackage ./nix/impure-shell.nix { inherit server; };
impureWithCuda = callPackage ./nix/impure-shell.nix {
inherit server;
withCuda = true;
};
impure-flash-attn-v1 = callPackage ./nix/impure-shell.nix {
server = server.override { flash-attn = python3.pkgs.flash-attn-v1; };
};
};
packages = rec {
default = pkgs.writeShellApplication {
name = "text-generation-inference";
runtimeInputs = [
server
router
];
text = ''
${launcher}/bin/text-generation-launcher "$@"
'';
};
dockerImage = pkgs.callPackage nix/docker.nix {
text-generation-inference = default;
};
dockerImageStreamed = pkgs.callPackage nix/docker.nix {
text-generation-inference = default;
stream = true;
};
};
}
);
}
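A few hedged usage examples for this flake, using the `devShells` and `packages` attributes it defines (Nix with flakes enabled is assumed):
```shell
# Enter the default (pure) dev shell, or the test shell with pytest tooling.
nix develop
nix develop .#test

# Build the launcher wrapper and the Docker image.
nix build .#default
nix build .#dockerImage
```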
......@@ -4,22 +4,25 @@ import json
import math
import os
import random
import re
import shutil
import subprocess
import sys
import tempfile
import time
from typing import Dict, List, Optional
import docker
import pytest
import base64
from pathlib import Path
from typing import Dict, List, Optional
from aiohttp import ClientConnectorError, ClientOSError, ServerDisconnectedError
from docker.errors import NotFound
from syrupy.extensions.json import JSONSnapshotExtension
from text_generation import AsyncClient
from text_generation.types import (
BestOfSequence,
Message,
ChatComplete,
ChatCompletionChunk,
ChatCompletionComplete,
......@@ -65,6 +68,7 @@ class ResponseComparator(JSONSnapshotExtension):
self,
data,
*,
include=None,
exclude=None,
matcher=None,
):
......@@ -80,7 +84,12 @@ class ResponseComparator(JSONSnapshotExtension):
data = [d.model_dump() for d in data]
data = self._filter(
data=data, depth=0, path=(), exclude=exclude, matcher=matcher
data=data,
depth=0,
path=(),
exclude=exclude,
include=include,
matcher=matcher,
)
return json.dumps(data, indent=2, ensure_ascii=False, sort_keys=False) + "\n"
......@@ -92,25 +101,25 @@ class ResponseComparator(JSONSnapshotExtension):
) -> bool:
def convert_data(data):
data = json.loads(data)
if isinstance(data, Dict) and "choices" in data:
choices = data["choices"]
if isinstance(choices, List) and len(choices) >= 1:
if "delta" in choices[0]:
return ChatCompletionChunk(**data)
if "text" in choices[0]:
return Completion(**data)
return ChatComplete(**data)
return _convert_data(data)
def _convert_data(data):
if isinstance(data, Dict):
return Response(**data)
if "choices" in data:
data["choices"] = list(
sorted(data["choices"], key=lambda x: x["index"])
)
choices = data["choices"]
if isinstance(choices, List) and len(choices) >= 1:
if "delta" in choices[0]:
return ChatCompletionChunk(**data)
if "text" in choices[0]:
return Completion(**data)
return ChatComplete(**data)
else:
return Response(**data)
if isinstance(data, List):
if (
len(data) > 0
and "object" in data[0]
and data[0]["object"] == "text_completion"
):
return [Completion(**d) for d in data]
return [Response(**d) for d in data]
return [_convert_data(d) for d in data]
raise NotImplementedError
def eq_token(token: Token, other: Token) -> bool:
......@@ -119,6 +128,7 @@ class ResponseComparator(JSONSnapshotExtension):
and token.text == other.text
and (
self.ignore_logprob
or (token.logprob == other.logprob and token.logprob is None)
or math.isclose(token.logprob, other.logprob, rel_tol=self.rtol)
)
and token.special == other.special
......@@ -257,7 +267,7 @@ class IgnoreLogProbResponseComparator(ResponseComparator):
class LauncherHandle:
def __init__(self, port: int):
self.client = AsyncClient(f"http://localhost:{port}")
self.client = AsyncClient(f"http://localhost:{port}", timeout=30)
def _inner_health(self):
raise NotImplementedError
......@@ -271,7 +281,7 @@ class LauncherHandle:
try:
await self.client.generate("test")
return
except (ClientConnectorError, ClientOSError, ServerDisconnectedError) as e:
except (ClientConnectorError, ClientOSError, ServerDisconnectedError):
time.sleep(1)
raise RuntimeError("Health check failed")
......@@ -329,10 +339,14 @@ def launcher(event_loop):
use_flash_attention: bool = True,
disable_grammar_support: bool = False,
dtype: Optional[str] = None,
kv_cache_dtype: Optional[str] = None,
revision: Optional[str] = None,
max_input_length: Optional[int] = None,
max_batch_prefill_tokens: Optional[int] = None,
max_total_tokens: Optional[int] = None,
lora_adapters: Optional[List[str]] = None,
cuda_graphs: Optional[List[int]] = None,
attention: Optional[str] = None,
):
port = random.randint(8000, 10_000)
master_port = random.randint(10_000, 20_000)
......@@ -365,6 +379,9 @@ def launcher(event_loop):
if dtype is not None:
args.append("--dtype")
args.append(dtype)
if kv_cache_dtype is not None:
args.append("--kv-cache-dtype")
args.append(kv_cache_dtype)
if revision is not None:
args.append("--revision")
args.append(revision)
......@@ -379,11 +396,22 @@ def launcher(event_loop):
if max_total_tokens:
args.append("--max-total-tokens")
args.append(str(max_total_tokens))
if lora_adapters:
args.append("--lora-adapters")
args.append(",".join(lora_adapters))
if cuda_graphs:
args.append("--cuda-graphs")
args.append(",".join(map(str, cuda_graphs)))
print(" ".join(args), file=sys.stderr)
env["LOG_LEVEL"] = "info,text_generation_router=debug"
env["PREFILL_CHUNKING"] = "1"
if not use_flash_attention:
env["USE_FLASH_ATTENTION"] = "false"
if attention is not None:
env["ATTENTION"] = attention
with tempfile.TemporaryFile("w+") as tmp:
# We'll output stdout/stderr to a temporary file. Using a pipe
......@@ -414,10 +442,14 @@ def launcher(event_loop):
use_flash_attention: bool = True,
disable_grammar_support: bool = False,
dtype: Optional[str] = None,
kv_cache_dtype: Optional[str] = None,
revision: Optional[str] = None,
max_input_length: Optional[int] = None,
max_batch_prefill_tokens: Optional[int] = None,
max_total_tokens: Optional[int] = None,
lora_adapters: Optional[List[str]] = None,
cuda_graphs: Optional[List[int]] = None,
attention: Optional[str] = None,
):
port = random.randint(8000, 10_000)
......@@ -433,6 +465,9 @@ def launcher(event_loop):
if dtype is not None:
args.append("--dtype")
args.append(dtype)
if kv_cache_dtype is not None:
args.append("--kv-cache-dtype")
args.append(kv_cache_dtype)
if revision is not None:
args.append("--revision")
args.append(revision)
......@@ -447,6 +482,12 @@ def launcher(event_loop):
if max_total_tokens:
args.append("--max-total-tokens")
args.append(str(max_total_tokens))
if lora_adapters:
args.append("--lora-adapters")
args.append(",".join(lora_adapters))
if cuda_graphs:
args.append("--cuda-graphs")
args.append(",".join(map(str, cuda_graphs)))
client = docker.from_env()
......@@ -455,6 +496,7 @@ def launcher(event_loop):
try:
container = client.containers.get(container_name)
container.stop()
container.remove()
container.wait()
except NotFound:
pass
......@@ -463,9 +505,12 @@ def launcher(event_loop):
env = {
"LOG_LEVEL": "info,text_generation_router=debug",
"PREFILL_CHUNKING": "1",
}
if not use_flash_attention:
env["USE_FLASH_ATTENTION"] = "false"
if attention is not None:
env["ATTENTION"] = attention
if HF_TOKEN is not None:
env["HF_TOKEN"] = HF_TOKEN
......@@ -475,13 +520,28 @@ def launcher(event_loop):
volumes = [f"{DOCKER_VOLUME}:/data"]
if DOCKER_DEVICES:
devices = DOCKER_DEVICES.split(",")
if DOCKER_DEVICES.lower() == "none":
devices = []
else:
devices = DOCKER_DEVICES.strip().split(",")
visible = os.getenv("ROCR_VISIBLE_DEVICES")
if visible:
env["ROCR_VISIBLE_DEVICES"] = visible
device_requests = []
if not devices:
devices = None
elif devices == ["nvidia.com/gpu=all"]:
devices = None
device_requests = [
docker.types.DeviceRequest(
driver="cdi",
# count=gpu_count,
device_ids=[f"nvidia.com/gpu={i}"],
)
for i in range(gpu_count)
]
else:
devices = []
devices = None
device_requests = [
docker.types.DeviceRequest(count=gpu_count, capabilities=[["gpu"]])
]
......@@ -497,24 +557,30 @@ def launcher(event_loop):
devices=devices,
volumes=volumes,
ports={"80/tcp": port},
healthcheck={"timeout": int(10 * 1e9)},
shm_size="1G",
)
yield ContainerLauncherHandle(client, container.name, port)
try:
yield ContainerLauncherHandle(client, container.name, port)
if not use_flash_attention:
del env["USE_FLASH_ATTENTION"]
if not use_flash_attention:
del env["USE_FLASH_ATTENTION"]
try:
container.stop()
container.wait()
except NotFound:
pass
try:
container.stop()
container.wait()
except NotFound:
pass
container_output = container.logs().decode("utf-8")
print(container_output, file=sys.stderr)
container_output = container.logs().decode("utf-8")
print(container_output, file=sys.stderr)
container.remove()
finally:
try:
container.remove()
except Exception:
pass
if DOCKER_IMAGE is not None:
return docker_launcher
......@@ -547,3 +613,56 @@ def generate_load():
return await asyncio.gather(*futures)
return generate_load_inner
@pytest.fixture(scope="module")
def generate_multi():
async def generate_load_inner(
client: AsyncClient,
prompts: List[str],
max_new_tokens: int,
seed: Optional[int] = None,
) -> List[Response]:
import numpy as np
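# Shuffle the prompts before sending, and build the inverse permutation
# so the responses can be restored to the original prompt order.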
arange = np.arange(len(prompts))
perm = np.random.permutation(arange)
rperm = [-1] * len(perm)
for i, p in enumerate(perm):
rperm[p] = i
shuffled_prompts = [prompts[p] for p in perm]
futures = [
client.chat(
messages=[Message(role="user", content=prompt)],
max_tokens=max_new_tokens,
temperature=0,
seed=seed,
)
for prompt in shuffled_prompts
]
shuffled_responses = await asyncio.gather(*futures)
responses = [shuffled_responses[p] for p in rperm]
return responses
return generate_load_inner
# TODO fix the server parser to count inline image tokens correctly
@pytest.fixture
def chicken():
path = Path(__file__).parent / "images" / "chicken_on_money.png"
with open(path, "rb") as image_file:
encoded_string = base64.b64encode(image_file.read())
return f"data:image/png;base64,{encoded_string.decode('utf-8')}"
@pytest.fixture
def cow_beach():
path = Path(__file__).parent / "images" / "cow_beach.png"
with open(path, "rb") as image_file:
encoded_string = base64.b64encode(image_file.read())
return f"data:image/png;base64,{encoded_string.decode('utf-8')}"
......@@ -11,42 +11,42 @@
},
{
"id": 49833,
"logprob": -10.5625,
"logprob": -10.5703125,
"text": " dég"
},
{
"id": 21543,
"logprob": -0.14770508,
"logprob": -0.14746094,
"text": "uster"
},
{
"id": 447,
"logprob": -1.9287109,
"logprob": -1.9277344,
"text": " un"
},
{
"id": 46341,
"logprob": -15.4609375,
"logprob": -15.421875,
"text": " ort"
},
{
"id": 35567,
"logprob": -7.5585938,
"logprob": -7.5820312,
"text": "olan"
},
{
"id": 15,
"logprob": -1.4003906,
"logprob": -1.4013672,
"text": ","
},
{
"id": 1669,
"logprob": -1.5673828,
"logprob": -1.5595703,
"text": " il"
},
{
"id": 11580,
"logprob": -0.94628906,
"logprob": -0.9428711,
"text": " faut"
},
{
......@@ -56,7 +56,7 @@
},
{
"id": 39261,
"logprob": -1.5732422,
"logprob": -1.7763672,
"text": " d'abord"
}
],
......@@ -64,65 +64,66 @@
"tokens": [
{
"id": 578,
"logprob": -1.6591797,
"logprob": -1.7822266,
"special": false,
"text": " le"
},
{
"id": 5608,
"logprob": -2.4492188,
"logprob": -2.4882812,
"special": false,
"text": " faire"
},
{
"id": 159570,
"logprob": -6.6835938,
"id": 7735,
"logprob": -2.4199219,
"special": false,
"text": " réch"
"text": " fond"
},
{
"id": 810,
"id": 289,
"logprob": 0.0,
"special": false,
"text": "au"
"text": "re"
},
{
"id": 12736,
"logprob": 0.0,
"id": 693,
"logprob": -2.4628906,
"special": false,
"text": "ffer"
"text": " à"
},
{
"id": 1742,
"logprob": -2.5175781,
"id": 366,
"logprob": -1.1308594,
"special": false,
"text": " au"
"text": " la"
},
{
"id": 6105,
"logprob": -2.0078125,
"id": 48844,
"logprob": -1.7900391,
"special": false,
"text": " bain"
"text": " cass"
},
{
"id": 88254,
"logprob": -0.12695312,
"id": 1744,
"logprob": 0.0,
"special": false,
"text": "-mar"
"text": "ero"
},
{
"id": 641,
"id": 327,
"logprob": 0.0,
"special": false,
"text": "ie"
"text": "le"
},
{
"id": 2940,
"logprob": -3.5175781,
"logprob": -1.9306641,
"special": false,
"text": " avec"
}
]
],
"top_tokens": null
},
"generated_text": " le faire réchauffer au bain-marie avec"
"generated_text": " le faire fondre à la casserole avec"
}
......@@ -11,7 +11,7 @@
},
{
"id": 1669,
"logprob": -5.4414062,
"logprob": -5.4453125,
"text": " il"
},
{
......@@ -21,12 +21,12 @@
},
{
"id": 3913,
"logprob": -4.3554688,
"logprob": -4.3320312,
"text": " tout"
},
{
"id": 39261,
"logprob": -2.9238281,
"logprob": -2.9160156,
"text": " d'abord"
}
],
......@@ -34,65 +34,66 @@
"tokens": [
{
"id": 408,
"logprob": -0.07891846,
"logprob": -0.16687012,
"special": false,
"text": " que"
},
{
"id": 366,
"logprob": -1.2939453,
"logprob": -1.5517578,
"special": false,
"text": " la"
},
{
"id": 8769,
"logprob": -0.3708496,
"logprob": -0.16687012,
"special": false,
"text": " personne"
},
{
"id": 1479,
"logprob": -2.2871094,
"logprob": -2.1035156,
"special": false,
"text": " qui"
},
{
"id": 2997,
"logprob": -0.8671875,
"id": 143926,
"logprob": -2.8671875,
"special": false,
"text": " vous"
"text": " réalise"
},
{
"id": 35977,
"logprob": -1.5097656,
"id": 578,
"logprob": 0.0,
"special": false,
"text": " suit"
"text": " le"
},
{
"id": 21558,
"logprob": -0.07891846,
"id": 8138,
"logprob": -0.66748047,
"special": false,
"text": " ait"
"text": " projet"
},
{
"id": 447,
"logprob": -0.12695312,
"id": 795,
"logprob": -1.6279297,
"special": false,
"text": " un"
"text": " ne"
},
{
"id": 78606,
"logprob": -2.21875,
"id": 9802,
"logprob": -0.47875977,
"special": false,
"text": " profil"
"text": " soit"
},
{
"id": 3899,
"logprob": -1.3535156,
"id": 1230,
"logprob": 0.0,
"special": false,
"text": " bien"
"text": " pas"
}
]
],
"top_tokens": null
},
"generated_text": "Pour déguster un ortolan, il faut tout d'abord que la personne qui vous suit ait un profil bien"
"generated_text": "Pour déguster un ortolan, il faut tout d'abord que la personne qui réalise le projet ne soit pas"
}
......@@ -11,52 +11,52 @@
},
{
"id": 49833,
"logprob": -10.5390625,
"logprob": -10.546875,
"text": " dég"
},
{
"id": 21543,
"logprob": -0.14758301,
"logprob": -0.14819336,
"text": "uster"
},
{
"id": 447,
"logprob": -1.9296875,
"logprob": -1.9257812,
"text": " un"
},
{
"id": 46341,
"logprob": -15.4453125,
"logprob": -15.4296875,
"text": " ort"
},
{
"id": 35567,
"logprob": -7.59375,
"logprob": -7.5625,
"text": "olan"
},
{
"id": 15,
"logprob": -1.3994141,
"logprob": -1.4199219,
"text": ","
},
{
"id": 1669,
"logprob": -1.578125,
"logprob": -1.5634766,
"text": " il"
},
{
"id": 11580,
"logprob": -0.9453125,
"logprob": -0.9458008,
"text": " faut"
},
{
"id": 3913,
"logprob": -3.7011719,
"logprob": -3.6816406,
"text": " tout"
},
{
"id": 39261,
"logprob": -1.5732422,
"logprob": -1.7753906,
"text": " d'abord"
}
],
......@@ -64,65 +64,66 @@
"tokens": [
{
"id": 578,
"logprob": -1.6474609,
"logprob": -1.828125,
"special": false,
"text": " le"
},
{
"id": 5608,
"logprob": -2.5097656,
"logprob": -2.5546875,
"special": false,
"text": " faire"
},
{
"id": 159570,
"logprob": -6.65625,
"id": 7735,
"logprob": -2.4277344,
"special": false,
"text": " réch"
"text": " fond"
},
{
"id": 810,
"id": 289,
"logprob": 0.0,
"special": false,
"text": "au"
"text": "re"
},
{
"id": 12736,
"logprob": 0.0,
"id": 693,
"logprob": -2.4472656,
"special": false,
"text": "ffer"
"text": " à"
},
{
"id": 1742,
"logprob": -2.5859375,
"id": 366,
"logprob": -1.1494141,
"special": false,
"text": " au"
"text": " la"
},
{
"id": 6105,
"logprob": -2.03125,
"id": 48844,
"logprob": -1.7939453,
"special": false,
"text": " bain"
"text": " cass"
},
{
"id": 88254,
"logprob": -0.12695312,
"id": 1744,
"logprob": 0.0,
"special": false,
"text": "-mar"
"text": "ero"
},
{
"id": 641,
"id": 327,
"logprob": 0.0,
"special": false,
"text": "ie"
"text": "le"
},
{
"id": 2940,
"logprob": -3.5175781,
"logprob": -1.9013672,
"special": false,
"text": " avec"
}
]
],
"top_tokens": null
},
"generated_text": " le faire réchauffer au bain-marie avec"
"generated_text": " le faire fondre à la casserole avec"
}
......@@ -5,7 +5,7 @@
"index": 0,
"logprobs": null,
"message": {
"content": "As of your last question, the weather in Brooklyn, New York, is typically hot and humid throughout the year. The suburbs around New York City are jealously sheltered, and at least in the Lower Bronx, there are very few outdoor environments to explore in the middle of urban confines. In fact, typical times for humidity levels in Brooklyn include:\n\n- Early morning: 80-85% humidity, with occas",
"content": "As of your last question, the weather in Brooklyn, New York, is typically hot and humid throughout the year. The suburbs around New York City are jealously sheltered, and at least in the Lower Bronx, there are very few outdoor environments to appreciate nature.\n\nIn terms of temperature, the warmest times of the year are from June to August, when average high temperatures typically range from around 73°F or 23°C",
"name": null,
"role": "assistant",
"tool_calls": null
......@@ -13,14 +13,14 @@
"usage": null
}
],
"created": 1716553098,
"created": 1724792495,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion",
"system_fingerprint": "2.0.5-dev0-native",
"object": "chat.completion",
"system_fingerprint": "2.2.1-dev0-native",
"usage": {
"completion_tokens": 100,
"prompt_tokens": 62,
"total_tokens": 162
"prompt_tokens": 61,
"total_tokens": 161
}
}
{
"choices": [
{
"finish_reason": "eos_token",
"index": 1,
"finish_reason": "length",
"index": 0,
"logprobs": null,
"text": " PR for more information?"
"text": " A Beginner’s Guide\nDeep learning is a subset"
},
{
"finish_reason": "length",
"index": 0,
"index": 1,
"logprobs": null,
"text": "le Business Incubator is providing a workspace"
"text": " This is a question that has puzzled many people for"
},
{
"finish_reason": "length",
"index": 2,
"index": 3,
"logprobs": null,
"text": " severely flawed and often has a substandard"
"text": "usculas_minusculas(s):\n \"\"\"\n"
},
{
"finish_reason": "length",
"index": 3,
"index": 2,
"logprobs": null,
"text": "hd20220811-"
"text": " Paris\nWhat is the capital of France?\nThe"
}
],
"created": 1713284455,
"created": 1725877154,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native",
"system_fingerprint": "2.2.1-dev0-native",
"usage": {
"completion_tokens": 36,
"prompt_tokens": 8,
"total_tokens": 44
"completion_tokens": 40,
"prompt_tokens": 22,
"total_tokens": 62
}
}
......@@ -5,14 +5,14 @@
"finish_reason": "",
"index": 0,
"logprobs": null,
"text": "\n"
"text": " A"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -20,14 +20,14 @@
"finish_reason": "",
"index": 1,
"logprobs": null,
"text": "\n"
"text": " This"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -35,14 +35,14 @@
"finish_reason": "",
"index": 2,
"logprobs": null,
"text": "\n"
"text": " Paris"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -50,14 +50,14 @@
"finish_reason": "",
"index": 3,
"logprobs": null,
"text": "hd"
"text": "us"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -65,14 +65,14 @@
"finish_reason": "",
"index": 0,
"logprobs": null,
"text": "\n"
"text": " Beginner"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -80,14 +80,14 @@
"finish_reason": "",
"index": 1,
"logprobs": null,
"text": "\n"
"text": " is"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -98,11 +98,11 @@
"text": "\n"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -110,14 +110,14 @@
"finish_reason": "",
"index": 3,
"logprobs": null,
"text": "aho"
"text": "cul"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -125,14 +125,14 @@
"finish_reason": "",
"index": 0,
"logprobs": null,
"text": "2"
"text": "’s"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -140,14 +140,14 @@
"finish_reason": "",
"index": 1,
"logprobs": null,
"text": "2"
"text": " a"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -155,14 +155,14 @@
"finish_reason": "",
"index": 2,
"logprobs": null,
"text": "2"
"text": "What"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -170,14 +170,14 @@
"finish_reason": "",
"index": 3,
"logprobs": null,
"text": "ima"
"text": "as"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -185,14 +185,14 @@
"finish_reason": "",
"index": 0,
"logprobs": null,
"text": "."
"text": " Guide"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -200,14 +200,14 @@
"finish_reason": "",
"index": 1,
"logprobs": null,
"text": "."
"text": " question"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -215,14 +215,14 @@
"finish_reason": "",
"index": 2,
"logprobs": null,
"text": "."
"text": " is"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -230,14 +230,14 @@
"finish_reason": "",
"index": 3,
"logprobs": null,
"text": "\n"
"text": "_minus"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -245,14 +245,14 @@
"finish_reason": "",
"index": 0,
"logprobs": null,
"text": " Sarah"
"text": "\n"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -260,14 +260,14 @@
"finish_reason": "",
"index": 1,
"logprobs": null,
"text": " Yes"
"text": " that"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -275,14 +275,14 @@
"finish_reason": "",
"index": 2,
"logprobs": null,
"text": " And"
"text": " the"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -290,14 +290,14 @@
"finish_reason": "",
"index": 3,
"logprobs": null,
"text": "i"
"text": "cul"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -305,14 +305,14 @@
"finish_reason": "",
"index": 0,
"logprobs": null,
"text": "'"
"text": "Deep"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -320,14 +320,14 @@
"finish_reason": "",
"index": 1,
"logprobs": null,
"text": ","
"text": " has"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -335,14 +335,14 @@
"finish_reason": "",
"index": 2,
"logprobs": null,
"text": " what"
"text": " capital"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -350,14 +350,14 @@
"finish_reason": "",
"index": 3,
"logprobs": null,
"text": "'"
"text": "as"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -365,14 +365,14 @@
"finish_reason": "",
"index": 0,
"logprobs": null,
"text": "s"
"text": " learning"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -380,14 +380,14 @@
"finish_reason": "",
"index": 1,
"logprobs": null,
"text": " Moh"
"text": " puzzled"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -395,14 +395,14 @@
"finish_reason": "",
"index": 2,
"logprobs": null,
"text": " is"
"text": " of"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -410,14 +410,14 @@
"finish_reason": "",
"index": 3,
"logprobs": null,
"text": "m"
"text": "(s"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -425,14 +425,14 @@
"finish_reason": "",
"index": 0,
"logprobs": null,
"text": " Room"
"text": " is"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -440,14 +440,14 @@
"finish_reason": "",
"index": 1,
"logprobs": null,
"text": "s"
"text": " many"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -455,14 +455,14 @@
"finish_reason": "",
"index": 2,
"logprobs": null,
"text": " the"
"text": " France"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -470,14 +470,14 @@
"finish_reason": "",
"index": 3,
"logprobs": null,
"text": " tired"
"text": "):\n"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -485,14 +485,14 @@
"finish_reason": "",
"index": 0,
"logprobs": null,
"text": ":"
"text": " a"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -500,14 +500,14 @@
"finish_reason": "",
"index": 1,
"logprobs": null,
"text": "'"
"text": " people"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -515,14 +515,14 @@
"finish_reason": "",
"index": 2,
"logprobs": null,
"text": " capital"
"text": "?\n"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
......@@ -530,73 +530,73 @@
"finish_reason": "",
"index": 3,
"logprobs": null,
"text": " of"
"text": " "
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
{
"finish_reason": "",
"finish_reason": "length",
"index": 0,
"logprobs": null,
"text": " She"
"text": " subset"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
{
"finish_reason": "",
"finish_reason": "length",
"index": 1,
"logprobs": null,
"text": " scale"
"text": " for"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
{
"finish_reason": "",
"finish_reason": "length",
"index": 2,
"logprobs": null,
"text": " of"
"text": "The"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
},
{
"choices": [
{
"finish_reason": "",
"finish_reason": "length",
"index": 3,
"logprobs": null,
"text": " being"
"text": " \"\"\"\n"
}
],
"created": 1713284431,
"created": 1725883643,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native"
"system_fingerprint": "2.2.1-dev0-native"
}
]
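For reference, a stream of `text_completion` chunks shaped like the snapshot above can be reproduced against a running TGI instance through its OpenAI-compatible completions route. This is a minimal sketch, not part of the recorded fixture: the host, port, and the four prompts are assumptions.

```shell
# Minimal sketch, assuming a TGI server listening on localhost:8080.
# Four prompts yield interleaved chunks with "index" 0..3, as in the
# snapshot above; each chunk carries one generated token.
curl http://localhost:8080/v1/completions \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "prompt": ["Say this", "is a", "test of", "streaming"],
        "max_tokens": 10,
        "stream": true
    }'
```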
......@@ -4,17 +4,17 @@
"finish_reason": "length",
"index": 0,
"logprobs": null,
"text": " PR for flake8"
"text": " A Beginner’s Guide\nDeep learning is a subset"
}
],
"created": 1713284454,
"created": 1725876621,
"id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "text_completion",
"system_fingerprint": "2.0.1-native",
"system_fingerprint": "2.2.1-dev0-native",
"usage": {
"completion_tokens": 5,
"completion_tokens": 10,
"prompt_tokens": 6,
"total_tokens": 11
"total_tokens": 16
}
}
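The non-streaming variant returns a single object with a `usage` block, as in the updated snapshot above. A sketch of the corresponding request follows; the endpoint and the prompt text are assumptions chosen to match the token counts in the fixture.

```shell
# Minimal sketch: same route without "stream"; the response is one
# text_completion object whose "usage" adds prompt_tokens (6) and
# completion_tokens (10) into total_tokens (16).
curl http://localhost:8080/v1/completions \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "prompt": "What is Deep Learning?",
        "max_tokens": 10
    }'
```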
[
{
"choices": [
{
"delta": {
"content": "**",
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1726656043,
"id": "",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "chat.completion.chunk",
"system_fingerprint": "2.2.1-dev0-native",
"usage": null
},
{
"choices": [
{
"delta": {
"content": "Deep",
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1726656043,
"id": "",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "chat.completion.chunk",
"system_fingerprint": "2.2.1-dev0-native",
"usage": null
},
{
"choices": [
{
"delta": {
"content": " Learning",
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1726656043,
"id": "",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "chat.completion.chunk",
"system_fingerprint": "2.2.1-dev0-native",
"usage": null
},
{
"choices": [
{
"delta": {
"content": ":",
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1726656043,
"id": "",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "chat.completion.chunk",
"system_fingerprint": "2.2.1-dev0-native",
"usage": null
},
{
"choices": [
{
"delta": {
"content": " An",
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1726656043,
"id": "",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "chat.completion.chunk",
"system_fingerprint": "2.2.1-dev0-native",
"usage": null
},
{
"choices": [
{
"delta": {
"content": " Overview",
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1726656043,
"id": "",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "chat.completion.chunk",
"system_fingerprint": "2.2.1-dev0-native",
"usage": null
},
{
"choices": [
{
"delta": {
"content": "**\n",
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1726656044,
"id": "",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "chat.completion.chunk",
"system_fingerprint": "2.2.1-dev0-native",
"usage": null
},
{
"choices": [
{
"delta": {
"content": "================================",
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1726656044,
"id": "",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "chat.completion.chunk",
"system_fingerprint": "2.2.1-dev0-native",
"usage": null
},
{
"choices": [
{
"delta": {
"content": "=====",
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1726656044,
"id": "",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "chat.completion.chunk",
"system_fingerprint": "2.2.1-dev0-native",
"usage": null
},
{
"choices": [
{
"delta": {
"content": "\n\n",
"role": "assistant",
"tool_calls": null
},
"finish_reason": "length",
"index": 0,
"logprobs": null
}
],
"created": 1726656044,
"id": "",
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"object": "chat.completion.chunk",
"system_fingerprint": "2.2.1-dev0-native",
"usage": {
"completion_tokens": 10,
"prompt_tokens": 40,
"total_tokens": 50
}
}
]
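The `chat.completion.chunk` stream above ends with a chunk carrying `finish_reason: "length"` and a populated `usage` object. A hedged sketch of a request that could produce such a stream is below; the host, the message content, and in particular the `stream_options` field are assumptions about how usage reporting is enabled, not something recorded in the fixture itself.

```shell
# Minimal sketch, assuming a TGI server on localhost:8080. The final
# chunk carries finish_reason "length" and, when usage reporting is
# enabled, the token counts seen in the snapshot above.
curl http://localhost:8080/v1/chat/completions \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "What is Deep Learning?"}],
        "max_tokens": 10,
        "stream": true,
        "stream_options": {"include_usage": true}
    }'
```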
{
"details": {
"best_of_sequences": null,
"finish_reason": "length",
"generated_tokens": 10,
"prefill": [
{
"id": 100000,
"logprob": null,
"text": "<|begin▁of▁sentence|>"
},
{
"id": 3533,
"logprob": -9.625,
"text": "Test"
},
{
"id": 3102,
"logprob": -11.25,
"text": " request"
}
],
"seed": null,
"tokens": [
{
"id": 185,
"logprob": -1.546875,
"special": false,
"text": "\n"
},
{
"id": 549,
"logprob": -2.859375,
"special": false,
"text": "The"
},
{
"id": 1727,
"logprob": -2.484375,
"special": false,
"text": " test"
},
{
"id": 3102,
"logprob": -0.83203125,
"special": false,
"text": " request"
},
{
"id": 317,
"logprob": -1.1484375,
"special": false,
"text": " is"
},
{
"id": 245,
"logprob": -1.578125,
"special": false,
"text": " a"
},
{
"id": 3412,
"logprob": -2.578125,
"special": false,
"text": " document"
},
{
"id": 344,
"logprob": -1.125,
"special": false,
"text": " that"
},
{
"id": 317,
"logprob": -1.6953125,
"special": false,
"text": " is"
},
{
"id": 1222,
"logprob": -1.71875,
"special": false,
"text": " used"
}
],
"top_tokens": null
},
"generated_text": "\nThe test request is a document that is used"
}
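Snapshots of this shape come from TGI's native `/generate` route with `details` enabled; `decoder_input_details` additionally returns the prefill tokens with their logprobs, as seen above. A sketch of such a request (the host is an assumption; the input matches the prefill in the fixture):

```shell
# Minimal sketch: greedy decoding (no seed), ten new tokens, with
# per-token details and prefill logprobs as in the snapshot above.
curl http://localhost:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
        "inputs": "Test request",
        "parameters": {
            "max_new_tokens": 10,
            "details": true,
            "decoder_input_details": true
        }
    }'
```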
{
"details": {
"best_of_sequences": null,
"finish_reason": "eos_token",
"generated_tokens": 4,
"prefill": [
{
"id": 100000,
"logprob": null,
"text": "<|begin▁of▁sentence|>"
},
{
"id": 3533,
"logprob": -9.625,
"text": "Test"
},
{
"id": 3102,
"logprob": -11.25,
"text": " request"
}
],
"seed": 0,
"tokens": [
{
"id": 2143,
"logprob": -1.828125,
"special": false,
"text": " sent"
},
{
"id": 10081,
"logprob": -0.41210938,
"special": false,
"text": " successfully"
},
{
"id": 13,
"logprob": 0.0,
"special": false,
"text": "."
},
{
"id": 100001,
"logprob": -0.16015625,
"special": true,
"text": "<|end▁of▁sentence|>"
}
],
"top_tokens": null
},
"generated_text": "Test request sent successfully."
}
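This second snapshot is the sampled counterpart: a fixed `seed` makes the sampled output reproducible, and generation stops on the end-of-sequence token (`finish_reason: "eos_token"`) after 4 tokens rather than at the token limit. A hedged sketch of such a request, with parameter values assumed to match the fixture:

```shell
# Minimal sketch: sampling with a pinned seed for reproducibility;
# generation may end early on the EOS token, as in the snapshot.
curl http://localhost:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
        "inputs": "Test request",
        "parameters": {
            "max_new_tokens": 10,
            "do_sample": true,
            "seed": 0,
            "details": true,
            "decoder_input_details": true
        }
    }'
```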
[
{
"details": {
"best_of_sequences": null,
"finish_reason": "length",
"generated_tokens": 10,
"prefill": [
{
"id": 100000,
"logprob": null,
"text": "<|begin▁of▁sentence|>"
},
{
"id": 3533,
"logprob": -9.625,
"text": "Test"
},
{
"id": 3102,
"logprob": -11.25,
"text": " request"
}
],
"seed": null,
"tokens": [
{
"id": 185,
"logprob": -1.546875,
"special": false,
"text": "\n"
},
{
"id": 549,
"logprob": -2.859375,
"special": false,
"text": "The"
},
{
"id": 1727,
"logprob": -2.4375,
"special": false,
"text": " test"
},
{
"id": 3102,
"logprob": -0.83984375,
"special": false,
"text": " request"
},
{
"id": 317,
"logprob": -1.1328125,
"special": false,
"text": " is"
},
{
"id": 254,
"logprob": -1.515625,
"special": false,
"text": " the"
},
{
"id": 1022,
"logprob": -1.15625,
"special": false,
"text": " first"
},
{
"id": 3458,
"logprob": -0.3671875,
"special": false,
"text": " step"
},
{
"id": 279,
"logprob": -0.88671875,
"special": false,
"text": " in"
},
{
"id": 254,
"logprob": -0.69140625,
"special": false,
"text": " the"
}
],
"top_tokens": null
},
"generated_text": "\nThe test request is the first step in the"
},
{
"details": {
"best_of_sequences": null,
"finish_reason": "length",
"generated_tokens": 10,
"prefill": [
{
"id": 100000,
"logprob": null,
"text": "<|begin▁of▁sentence|>"
},
{
"id": 3533,
"logprob": -9.625,
"text": "Test"
},
{
"id": 3102,
"logprob": -11.25,
"text": " request"
}
],
"seed": null,
"tokens": [
{
"id": 185,
"logprob": -1.546875,
"special": false,
"text": "\n"
},
{
"id": 549,
"logprob": -2.859375,
"special": false,
"text": "The"
},
{
"id": 1727,
"logprob": -2.4375,
"special": false,
"text": " test"
},
{
"id": 3102,
"logprob": -0.83984375,
"special": false,
"text": " request"
},
{
"id": 317,
"logprob": -1.1328125,
"special": false,
"text": " is"
},
{
"id": 254,
"logprob": -1.515625,
"special": false,
"text": " the"
},
{
"id": 1022,
"logprob": -1.15625,
"special": false,
"text": " first"
},
{
"id": 3458,
"logprob": -0.3671875,
"special": false,
"text": " step"
},
{
"id": 279,
"logprob": -0.88671875,
"special": false,
"text": " in"
},
{
"id": 254,
"logprob": -0.69140625,
"special": false,
"text": " the"
}
],
"top_tokens": null
},
"generated_text": "\nThe test request is the first step in the"
},
{
"details": {
"best_of_sequences": null,
"finish_reason": "length",
"generated_tokens": 10,
"prefill": [
{
"id": 100000,
"logprob": null,
"text": "<|begin▁of▁sentence|>"
},
{
"id": 3533,
"logprob": -9.625,
"text": "Test"
},
{
"id": 3102,
"logprob": -11.25,
"text": " request"
}
],
"seed": null,
"tokens": [
{
"id": 185,
"logprob": -1.546875,
"special": false,
"text": "\n"
},
{
"id": 549,
"logprob": -2.859375,
"special": false,
"text": "The"
},
{
"id": 1727,
"logprob": -2.4375,
"special": false,
"text": " test"
},
{
"id": 3102,
"logprob": -0.83984375,
"special": false,
"text": " request"
},
{
"id": 317,
"logprob": -1.1328125,
"special": false,
"text": " is"
},
{
"id": 254,
"logprob": -1.515625,
"special": false,
"text": " the"
},
{
"id": 1022,
"logprob": -1.15625,
"special": false,
"text": " first"
},
{
"id": 3458,
"logprob": -0.3671875,
"special": false,
"text": " step"
},
{
"id": 279,
"logprob": -0.88671875,
"special": false,
"text": " in"
},
{
"id": 254,
"logprob": -0.69140625,
"special": false,
"text": " the"
}
],
"top_tokens": null
},
"generated_text": "\nThe test request is the first step in the"
},
{
"details": {
"best_of_sequences": null,
"finish_reason": "length",
"generated_tokens": 10,
"prefill": [
{
"id": 100000,
"logprob": null,
"text": "<|begin▁of▁sentence|>"
},
{
"id": 3533,
"logprob": -9.625,
"text": "Test"
},
{
"id": 3102,
"logprob": -11.25,
"text": " request"
}
],
"seed": null,
"tokens": [
{
"id": 185,
"logprob": -1.546875,
"special": false,
"text": "\n"
},
{
"id": 549,
"logprob": -2.859375,
"special": false,
"text": "The"
},
{
"id": 1727,
"logprob": -2.4375,
"special": false,
"text": " test"
},
{
"id": 3102,
"logprob": -0.83984375,
"special": false,
"text": " request"
},
{
"id": 317,
"logprob": -1.1328125,
"special": false,
"text": " is"
},
{
"id": 254,
"logprob": -1.515625,
"special": false,
"text": " the"
},
{
"id": 1022,
"logprob": -1.15625,
"special": false,
"text": " first"
},
{
"id": 3458,
"logprob": -0.3671875,
"special": false,
"text": " step"
},
{
"id": 279,
"logprob": -0.88671875,
"special": false,
"text": " in"
},
{
"id": 254,
"logprob": -0.69140625,
"special": false,
"text": " the"
}
],
"top_tokens": null
},
"generated_text": "\nThe test request is the first step in the"
}
]
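The four identical responses above are typical of a load snapshot: the same greedy request issued concurrently should produce identical generations. A sketch of reproducing that pattern with plain shell job control, under the same assumed endpoint as the earlier examples:

```shell
# Minimal sketch: fire four identical greedy requests in parallel;
# deterministic decoding should return four identical generations.
for i in 1 2 3 4; do
    curl -s http://localhost:8080/generate \
        -X POST \
        -H 'Content-Type: application/json' \
        -d '{"inputs": "Test request", "parameters": {"max_new_tokens": 10, "details": true, "decoder_input_details": true}}' &
done
wait
```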
......@@ -11,12 +11,12 @@
},
{
"id": 2015,
"logprob": -10.0,
"logprob": -10.0625,
"text": "Test"
},
{
"id": 3853,
"logprob": -10.875,
"logprob": -11.0,
"text": " request"
}
],
......@@ -24,7 +24,7 @@
"tokens": [
{
"id": 7539,
"logprob": -0.73046875,
"logprob": -0.609375,
"special": false,
"text": " forms"
},
......@@ -36,7 +36,7 @@
},
{
"id": 671,
"logprob": -1.703125,
"logprob": -1.5546875,
"special": false,
"text": " an"
},
......@@ -66,24 +66,24 @@
},
{
"id": 11859,
"logprob": -1.6953125,
"logprob": -1.953125,
"special": false,
"text": " lab"
},
{
"id": 2185,
"logprob": -1.3125,
"logprob": -1.7734375,
"special": false,
"text": " process"
},
{
"id": 578,
"logprob": -1.5,
"id": 235265,
"logprob": 0.0,
"special": false,
"text": " and"
"text": "."
}
],
"top_tokens": null
},
"generated_text": "Test request forms are an essential part of the lab process and"
"generated_text": "Test request forms are an essential part of the lab process."
}
......@@ -11,12 +11,12 @@
},
{
"id": 2015,
"logprob": -10.0,
"logprob": -10.0625,
"text": "Test"
},
{
"id": 3853,
"logprob": -10.875,
"logprob": -11.0,
"text": " request"
}
],
......@@ -24,13 +24,13 @@
"tokens": [
{
"id": 1736,
"logprob": -2.09375,
"logprob": -2.109375,
"special": false,
"text": " form"
},
{
"id": 109,
"logprob": -1.8671875,
"logprob": -1.90625,
"special": false,
"text": "\n\n"
},
......@@ -42,43 +42,43 @@
},
{
"id": 2121,
"logprob": -1.8203125,
"logprob": -1.796875,
"special": false,
"text": " test"
},
{
"id": 3853,
"logprob": -0.23242188,
"logprob": -0.24511719,
"special": false,
"text": " request"
},
{
"id": 1736,
"logprob": -0.08544922,
"logprob": -0.09326172,
"special": false,
"text": " form"
},
{
"id": 603,
"logprob": -0.9375,
"logprob": -0.95703125,
"special": false,
"text": " is"
},
{
"id": 1671,
"logprob": -1.671875,
"logprob": -1.5859375,
"special": false,
"text": " used"
},
{
"id": 577,
"logprob": -0.40429688,
"logprob": -0.39257812,
"special": false,
"text": " to"
},
{
"id": 3853,
"logprob": -1.1875,
"logprob": -1.25,
"special": false,
"text": " request"
}
......