Commits · 780531ec771e429b173b5fa0f976f3bbf06e7649 · OpenDAS / text-generation-inference

22 Nov, 2024 1 commit
- chore: prepare 2.4.1 release (#2773) · 780531ec
  OlivierDehaene authored Nov 22, 2024
```
* chore: prepare 2.4.1 release

* fix tests

* fmt
```
  780531ec
21 Nov, 2024 2 commits
- feat: add payload limit (#2726) · ab7ccf5b
  OlivierDehaene authored Nov 21, 2024
```
* feat: add payload limit

* update launcher
```
  ab7ccf5b
- Remove guideline from API (#2762) · d012f229
  Lucain authored Nov 21, 2024
  
  d012f229
19 Nov, 2024 1 commit

PR 2634 CI - Fix the tool_choice format for named choice by adapting OpenAIs scheme (#2645) · 5489406c

drbh authored Nov 19, 2024



* add OpenAI like tool_choice for named choice

* add tests

* fix: run linter and bump api docs

* fix: consolidate changes and remove old tool type

* feat: improve, simplify and rename tool choice struct add required support and refactor

* fix: simplify tool choice logic, improve tests, openapi and rust docs

* fix: refactor away prepare_chat_input and improve tool grammar apply control flow

* feat: update docs and add tool choice configuration section

* fix: simplify naming, tool choice default and improve test

* fix: adjust tool choice none logic, add test and small refactors

* fix: add missing snapshot file

* fix: adjust tool choice type in test

* fix: adjust default when json tool choice is

* fix: remove trailing space lint after rebase

* fix: remove mostly mocked unit test

---------
Co-authored-by: Linus Bierhoff <linus.bierhoff@icloud.com>

5489406c

15 Nov, 2024 1 commit
- fix response type of document for Text Generation Inference (#2743) · 003eaec0
  jito authored Nov 15, 2024
```
Signed-off-by: jitokim <pigberger70@gmail.com>
```
  003eaec0
10 Nov, 2024 1 commit

Add initial support for compressed-tensors checkpoints (#2732) · a7850008

Daniël de Kok authored Nov 10, 2024

compressed-tensors is a safetensors extension for sparse, quantized
tensors. The format is more powerful than earlier AWQ/GPTQ/FP8
quantization, because

- Different quantizer configurations can be used for different targets.
- The format can specify input/output quantizers in addition to weight
  quantizers.
- Configurable exclusions for quantization.

This change adds a dependency on the `compressed-tensors` package for
its configuration parsing and layer matching functionality.

The following types of quantization are supported in this PR:

- W8A16 and W4A16 INT using GPTQ-Marlin kernels.
- W8A8 and W8A16 FP using FP8-Marlin and cutlass kernels.

Support for other quantization types will be added in subsequent PRs.

a7850008

04 Nov, 2024 1 commit
- fix: add chat_tokenize endpoint to api docs (#2710) · 08c4184e
  drbh authored Nov 04, 2024
  
  08c4184e
30 Oct, 2024 1 commit

Support qwen2 vl (#2689) · befd9f67

drbh authored Oct 30, 2024

* feat: add support for qwen2 vl model

* feat: fix token padding, enable warmup and process basic request

* fix: improve get_position_ids, add lift embed_tokens

* fix: remove get_cos_sin_hack dev function

* feat: add simple test chat with meesage and text

* fix: lint test

* fix: adjust positional embeddings for multi dimensional position ids

* fix: update docs and lint unused vars

* fix: include linted file

* fix: add norm after text output

* fix: format model file

* fix: adjust for ruff lints

* fix: remove unused rotate_half

* feat: refactors and calc num features

* fix: prefer position_ids passed from vlm causal lm and reset ids on batch

* fix: adjust get_position_ids if not available and add required args to signatures

* fix: adjust resize case for qwen2_vl warmup

* fix: avoid qwen2 vl specific paths with qwen2

befd9f67

28 Oct, 2024 1 commit

Choosing input/total tokens automatically based on available VRAM? (#2673) · 0c9b6cdd

Nicolas Patry authored Oct 28, 2024

* Choosing input/total tokens automatically based on available VRAM?

* Update doc.

* Remove generated files.

* Trying to fix non chunking targets.

* Attempt #2

* fix.

* QuantLinear is rocm compatible.

* Much simpler logic after the overhead.

* Updating logic + non flash.

* Revert doc text.

* Simple updates.

* Fix integration mt0 (transformers update).

0c9b6cdd

25 Oct, 2024 1 commit
- chore: prepare 2.4.0 release (#2695) · a6b02da9
  OlivierDehaene authored Oct 25, 2024
  
  a6b02da9
23 Oct, 2024 2 commits
- feat: allow any supported payload on /invocations (#2683) · 41c26237
  OlivierDehaene authored Oct 23, 2024
```
* feat: allow any supported payload on /invocations

* update openAPI

* update doc
```
  41c26237
- feat: natively support Granite models (#2682) · 03c9388b
  OlivierDehaene authored Oct 23, 2024
```
* feat: natively support Granite models

* Update doc
```
  03c9388b
17 Oct, 2024 1 commit
- Support `e4m3fn` KV cache (#2655) · 5bbe1ce0
  Daniël de Kok authored Oct 17, 2024
```
* Support `e4m3fn` KV cache

* Make check more obvious
```
  5bbe1ce0
15 Oct, 2024 1 commit
- Fixing linters. (#2650) · cf04a43f
  Nicolas Patry authored Oct 15, 2024
  
  cf04a43f
14 Oct, 2024 2 commits

Clarify gated description and quicktour (#2631) · 51f54018
Omar Sanseviero authored Oct 14, 2024
```
Update quicktour.md
```
51f54018

Small fixes for supported models (#2471) · ce28ee88

Omar Sanseviero authored Oct 14, 2024



* Small improvements for docs

* Update _toctree.yml

* Updating the doc (we keep the list actually).

---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

ce28ee88

10 Oct, 2024 1 commit
- Update documentation to most recent stable version of TGI. (#2625) · d912f0bf
  vb authored Oct 10, 2024
```
Update to most recent stable version of TGI.
```
  d912f0bf
08 Oct, 2024 1 commit

CI (2599): Update ToolType input schema (#2601) · 8ad20daf

drbh authored Oct 08, 2024



* Update ToolType input schema

* lint

* fix: run formatter

* fix: allow tool choide to be null

---------
Co-authored-by: Wauplin <lucainp@gmail.com>

8ad20daf

04 Oct, 2024 1 commit

Add basic FP8 KV cache support (#2603) · 2358c2bb

Daniël de Kok authored Oct 04, 2024

* Add basic FP8 KV cache support

This change adds rudimentary FP8 KV cache support. The support is
enabled by passing `--kv-cache-dtype fp8_e5m2` to the launcher. Doing so
uses this type for the KV cache. However support is still limited:

* Only the `fp8_e5m2` type is supported.
* The KV cache layout is the same as `float16`/`bfloat16` (HND).
* The FP8 KV cache is only supported for FlashInfer.
* Loading of scales is not yet supported.

* Fix Cargo.toml

2358c2bb

03 Oct, 2024 2 commits
- Revert "Unroll notify error into generate response" (#2605) · 3011639f
  drbh authored Oct 03, 2024
```
Revert "Unroll notify error into generate response (#2597)"

This reverts commit d22b0c1f.
```
  3011639f
- New release 2.3.1 (#2604) · f6e2f05b
  Nicolas Patry authored Oct 03, 2024
```
* New release 2.3.1

* Update doc number
```
  f6e2f05b
02 Oct, 2024 3 commits

Unroll notify error into generate response (#2597) · d22b0c1f

drbh authored Oct 02, 2024

* feat: unroll notify_error if no tool is choosen

* fix: expect simple message when no tool is selected

* fix: improve test to avoid notify_error

* fix: improve docs and indicate change in expected response

* fix: adjust linting in test file

d22b0c1f

CI (2592): Allow LoRA adapter revision in server launcher (#2602) · 23354595

drbh authored Oct 02, 2024



allow revision for lora adapters from launcher
Co-authored-by: Sida <sida@kulamind.com>
Co-authored-by: teamclouday <teamclouday@gmail.com>

23354595

Mllama flash version (#2585) · d18ed5cf

Nicolas Patry authored Oct 02, 2024

* Working loading state.

* Preprocessing.

* Working state ? (Broke idefics1 temporarily).

* Cleaner condition.

* Fix idefics.

* Updating config, removing TODO

* Mllama

* Ugrade transformers 4.45

* Flashing mllama.

* Starting to get there.

* Working state.

* Integrations tests for mllama (cutting to 10 tokens because there seems'
to be instability after (meaning size of the batch matters.

* Updating model link.

* Earlier assert.

* Fix vlm ?

* remove log.

* Force ignore all images but last.

* Default dtype bfloat16.

* Update integration test after switch to bf16.

* Remove dead code.

* Removed dead code.

* Upgrade the flake to latest transformers/tokenizers

* Move to hf tgi-nix

* Upgrade to 0.5.0

d18ed5cf

30 Sep, 2024 3 commits

feat: support phi3.5 moe (#2479) · 93a7042d

drbh authored Sep 30, 2024



* feat: support phi3.5 moe model loading

* fix: prefer llama base model and improve rotary logic

* feat: return reasonable generation and add integration test

* fix: run lint and update docs

* fix: rerun lint for openapi docs

* fix: prefer do_sample false unless temp is set by user, and update chat tests

* fix: small typo adjustments

* fix: consolidate long rope paths

* fix: revert greedy by default and test changes

* Vendor configuration so that we don't have to `trust_remote_code`

* Use SparseMoELayer

* Add support for dense MoE

* Some type annotations

* Add the usual model tests

* Ruff.

---------
Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

93a7042d

Update ROCM libs and improvements (#2579) · f9e561ec

Mohit Sharma authored Sep 30, 2024

* style

* update torch

* ix issues

* fix clone

* revert mkl

* added custom PA

* style

* fix style

* style

* hide env vart

* fix mixtral model

* add skinny kernel and merge fixes

* fixed style

* fix issue for sliding window models

* addressed review comments

* fix import

* improved error messag

* updated default value

* remove import

* fix imports after rebase

* float16 dep

* improve dockerfile

* cleaned dockerfile

f9e561ec

Update architecture.md (#2577) · e790cfc0
Ikram Ul Haq authored Sep 30, 2024

e790cfc0

24 Sep, 2024 2 commits

remove LORA_ADAPTERS_PATH (#2563) · 7efcb5e0
Nicholas Broad authored Sep 24, 2024
```
specify how to call local adapters
```
7efcb5e0

Adding note for private models in quick-tour document (#2548) · e6d29656

Aritra Roy Gosthipaty authored Sep 24, 2024



* chore: adding note for private models in quicktour doc

* Update docs/source/quicktour.md
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>

* Update docs/source/quicktour.md
Co-authored-by: vb <vaibhavs10@gmail.com>

* Update docs/source/quicktour.md
Co-authored-by: vb <vaibhavs10@gmail.com>

---------
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: vb <vaibhavs10@gmail.com>

e6d29656

20 Sep, 2024 1 commit
- Preparing for release. (#2540) · 169178b9
  Nicolas Patry authored Sep 20, 2024
```
* Preparing for release.

* Upgrade version in docs.
```
  169178b9
19 Sep, 2024 2 commits

doc: clarify that `--quantize` is not needed for pre-quantized models (#2536) · abd24dd3
Daniël de Kok authored Sep 19, 2024

abd24dd3

Stream options. (#2533) · f512021e

Nicolas Patry authored Sep 19, 2024

* Stream options.

* Fetch stuff from nix integration test for easier testing.

* Adding the assert.

* Only send the usage when asked for.

* Update the docs.

* Impure test because we need network.

* develop.

* Optional usage.

* Fixes.

* Workflow

f512021e

06 Sep, 2024 1 commit

Add links to Adyen blogpost (#2500) · aaea212d

Martin Iglesias Goyanes authored Sep 06, 2024



* Add links to Adyen blogpost

* Adding to toctree.

* Update external.md

* Update _toctree.yml

---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

aaea212d

05 Sep, 2024 1 commit
- Adding links to Adyen blogpost. (#2492) · 8b96a182
  Nicolas Patry authored Sep 05, 2024
  
  8b96a182
29 Aug, 2024 2 commits

update doc with intel cpu part (#2420) · 9883f3b4

Wang, Yi authored Aug 29, 2024



* update doc with intel cpu part
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Apply suggestions from code review

we do not use latest ever in documentation, it causes too many issues for users. Release number get update on every release.

---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

9883f3b4

feat: add /v1/models endpoint (#2433) · d5202c46

drbh authored Aug 29, 2024

* feat: add /v1/models endpoint

* feat: add /v1/models endpoint

* fix: remove unused type import

* fix: revert route typo

* fix: update docs with new endpoint

* fix: add to redocly ignore and lint

d5202c46

28 Aug, 2024 1 commit
- fix: improve regex expression (#2468) · 8f99f165
  drbh authored Aug 28, 2024
  
  8f99f165
27 Aug, 2024 1 commit

Pr 2451 ci branch (#2454) · cfa73b5c

drbh authored Aug 26, 2024



* fix[router]: Fix tools not passed in chat template
Signed-off-by: GitHub <noreply@github.com>

* feat: improve default tool serialization and lints

* feat: refactor tool logic to include notify_error in prompt and adjust typing

* fix: adjust non tool template apply

* fix: simplify tool grammar logic and improve schema

* feat: avoid skip tool test and avoid empty tool prompts

* fix: increase test client timeout for grammar compilation tests

---------
Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: Simone Rossi <simone.rossi.93@gmail.com>

cfa73b5c

16 Aug, 2024 2 commits

doc: Add metrics documentation and add a 'Reference' section (#2230) · 53729b74

Hugo Larcher authored Aug 16, 2024



* doc: Add metrics documentation and add a 'Reference' section

* doc: Add API reference

* doc: Refactor API reference

* fix: Message API link

* Bad rebase

* Moving the docs.

---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

53729b74

FIxing the CI. · cb0a2948
Nicolas Patry authored Aug 16, 2024

cb0a2948