Commits · a6b02da97166a3c76f6ff5075b10ff25bd41bde1 · OpenDAS / text-generation-inference

25 Oct, 2024 1 commit
- chore: prepare 2.4.0 release (#2695) · a6b02da9
  OlivierDehaene authored Oct 25, 2024
  
  a6b02da9
23 Oct, 2024 1 commit
- feat: allow any supported payload on /invocations (#2683) · 41c26237
  OlivierDehaene authored Oct 23, 2024
```
* feat: allow any supported payload on /invocations

* update openAPI

* update doc
```
  41c26237
15 Oct, 2024 1 commit
- Fixing linters. (#2650) · cf04a43f
  Nicolas Patry authored Oct 15, 2024
  
  cf04a43f
14 Oct, 2024 1 commit

Small fixes for supported models (#2471) · ce28ee88

Omar Sanseviero authored Oct 14, 2024



* Small improvements for docs

* Update _toctree.yml

* Updating the doc (we keep the list actually).

---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

ce28ee88

08 Oct, 2024 1 commit

CI (2599): Update ToolType input schema (#2601) · 8ad20daf

drbh authored Oct 08, 2024



* Update ToolType input schema

* lint

* fix: run formatter

* fix: allow tool choide to be null

---------
Co-authored-by: Wauplin <lucainp@gmail.com>

8ad20daf

03 Oct, 2024 1 commit
- New release 2.3.1 (#2604) · f6e2f05b
  Nicolas Patry authored Oct 03, 2024
```
* New release 2.3.1

* Update doc number
```
  f6e2f05b
20 Sep, 2024 1 commit
- Preparing for release. (#2540) · 169178b9
  Nicolas Patry authored Sep 20, 2024
```
* Preparing for release.

* Upgrade version in docs.
```
  169178b9
19 Sep, 2024 1 commit

Stream options. (#2533) · f512021e

Nicolas Patry authored Sep 19, 2024

* Stream options.

* Fetch stuff from nix integration test for easier testing.

* Adding the assert.

* Only send the usage when asked for.

* Update the docs.

* Impure test because we need network.

* develop.

* Optional usage.

* Fixes.

* Workflow

f512021e

29 Aug, 2024 1 commit

feat: add /v1/models endpoint (#2433) · d5202c46

drbh authored Aug 29, 2024

* feat: add /v1/models endpoint

* feat: add /v1/models endpoint

* fix: remove unused type import

* fix: revert route typo

* fix: update docs with new endpoint

* fix: add to redocly ignore and lint

d5202c46

27 Aug, 2024 1 commit

Pr 2451 ci branch (#2454) · cfa73b5c

drbh authored Aug 26, 2024



* fix[router]: Fix tools not passed in chat template
Signed-off-by: GitHub <noreply@github.com>

* feat: improve default tool serialization and lints

* feat: refactor tool logic to include notify_error in prompt and adjust typing

* fix: adjust non tool template apply

* fix: simplify tool grammar logic and improve schema

* feat: avoid skip tool test and avoid empty tool prompts

* fix: increase test client timeout for grammar compilation tests

---------
Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: Simone Rossi <simone.rossi.93@gmail.com>

cfa73b5c

16 Aug, 2024 2 commits

FIxing the CI. · cb0a2948
Nicolas Patry authored Aug 16, 2024

cb0a2948

Improve the Consuming TGI + Streaming docs. (#2412) · 99b662f8

Vaibhav Srivastav authored Aug 16, 2024



* Improve the Consuming TGI docs.

* Fix erronous update to .

* add info about Open AI client.

* More updates.

* Apply suggestions from code review
Co-authored-by: Erik Kaunismäki <erik.kaum@gmail.com>

* Suggestions from Lucain.

* Update Gradio snippet.

* Up.

* Apply suggestions from code review
Co-authored-by: Lucain <lucainp@gmail.com>

* Update docs/source/basic_tutorials/consuming_tgi.md
Co-authored-by: Lucain <lucainp@gmail.com>

* Up.

* Apply suggestions from code review
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>

* Up.

* Up.

* Doc review from Nico.

* Doc review from Nico. x2

* Last nit

---------
Co-authored-by: Erik Kaunismäki <erik.kaum@gmail.com>
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>

99b662f8

12 Aug, 2024 1 commit

fix: improve completions to send a final chunk with usage details (#2336) · 30395b09

drbh authored Aug 12, 2024

* fix: improve completions to send a final chunk with usage details

* fix: include finish reason string

* fix: remove dev debug trait and unneeded mut

* fix: update openapi schema

30395b09

09 Aug, 2024 2 commits
- feat: add guideline to chat request and template (#2391) · 0d06aed0
  drbh authored Aug 09, 2024
```
* feat: add guideline to chat request and template

* fix: add template test and update docs
```
  0d06aed0
- Using an enum for flash backens (paged/flashdecoding/flashinfer) (#2385) · 7a48a847
  Nicolas Patry authored Aug 09, 2024
```
* Using an enum for flash backens (paged/flashdecoding/flashinfer)

* Early exit on server too.

* Clippy.

* Fix clippy and fmt.
```
  7a48a847
08 Aug, 2024 1 commit

Update Quantization docs and minor doc fix. (#2368) · cb3ae302

Vaibhav Srivastav authored Aug 08, 2024



* Update Quantization docs and minor doc fix.

* update readme with latest quants info

* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* up

---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

cb3ae302

31 Jul, 2024 1 commit

Rebase TRT-llm (#2331) · 2b19d671

Nicolas Patry authored Jul 31, 2024

* wip

wip

refacto

refacto

Initial setup for CXX binding to TRTLLM

Working FFI call for TGI and TRTLLM backend

Remove unused parameters annd force tokenizer name to be set

Overall build TRTLLM and deps through CMake build system

Enable end to end CMake build

First version loading engines and making it ready for inference

Remembering to check how we can detect support for chunked context

Move to latest TensorRT-LLM version

Specify which default log level to use depending on CMake build type

make leader executor mode working

unconditionally call InitializeBackend on the FFI layer

bind to CUDA::nvml to retrieve compute capabilities at runtime

updated logic and comment to detect cuda compute capabilities

implement the Stream method to send new tokens through a callback

use spdlog release 1.14.1 moving forward

update trtllm to latest version a96cccafcf6365c128f004f779160951f8c0801c

correctly tell cmake to build dependent tensorrt-llm required libraries

create cmake install target to put everything relevant in installation folder

add auth_token CLI argument to provide hf hub authentification token

allow converting huggingface::tokenizers error to TensorRtLlmBackendError

use correct include for spdlog

include guard to build example in cmakelists

working setup of the ffi layer

remove fmt import

use external fmt lib

end to end ffi flow working

make sure to track include/ffi.h to trigger rebuild from cargo

impl the rust backend which currently cannot move the actual computation in background thread

expose shutdown function at ffi layer

impl RwLock scenario for TensorRtLllmBackend

oops missing c++ backend definitions

compute the number of maximum new tokens for each request independently

make sure the context is not dropped in the middle of the async decoding.

remove unnecessary log

add all the necessary plumbery to return the generated content

update invalid doc in cpp file

correctly forward back the log probabilities

remove unneeded scope variable for now

refactor Stream impl for Generation to factorise code

expose the internal missing start/queue timestamp

forward tgi parameters rep/freq penalty

add some more validation about grammar not supported

define a shared struct to hold the result of a decoding step

expose information about potential error happening while decoding

remove logging

add logging in case of decoding error

make sure executor_worker is provided

add initial Dockerfile for TRTLLM backend

add some more information in CMakeLists.txt to correctly install executorWorker

add some more information in CMakeLists.txt to correctly find and install nvrtc wrapper

simplify prebuilt trtllm libraries name definition

do the same name definition stuff for tensorrt_llm_executor_static

leverage pkg-config to probe libraries paths and reuse new install structure from cmake

fix bad copy/past missing nvinfer linkage direction

align all the linker search dependency

add missing pkgconfig folder for MPI in Dockerfile

correctly setup linking search path for runtime layer

fix missing / before tgi lib path

adding missing ld_library_path for cuda stubs in Dockerfile

update tgi entrypoint

commenting out Python part for TensorRT installation

refactored docker image

move to TensorRT-LLM v0.11.0

make docker linter happy with same capitalization rule

fix typo

refactor the compute capabilities detection along with num gpus

update TensorRT-LLM to latest version

update TensorRT install script to latest

update build.rs to link to cuda 12.5

add missing dependant libraries for linking

clean up a bit

install to decoder_attention target

add some custom stuff for nccl linkage

fix envvar CARGO_CFG_TARGET_ARCH set at runtime vs compile time

use std::env::const::ARCH

make sure variable live long enough...

look for cuda 12.5

add some more basic info in README.md

* Rebase.

* Fix autodocs.

* Let's try to enable trtllm backend.

* Ignore backends/v3 by default.

* Fixing client.

* Fix makefile + autodocs.

* Updating the schema thing + redocly.

* Fix trtllm lint.

* Adding pb files ?

* Remove cargo fmt temporarily.

* ?

* Tmp.

* Remove both check + clippy  ?

* Backporting telemetry.

* Backporting 457fb0a1



* Remove PB from git.

* Fixing PB with default member backends/client

* update TensorRT-LLM to latest version

* provided None for api_key

* link against libtensorrt_llm and not libtensorrt-llm

---------
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: Morgan Funtowicz <morgan@huggingface.co>

2b19d671

23 Jul, 2024 1 commit

Preparing for release. (#2285) · 5d121a97

Nicolas Patry authored Jul 23, 2024

* Preparing for release.

* Updating docs.

* Fixing token within the docker image for the launcher.

5d121a97

19 Jul, 2024 1 commit

fix: adjust default tool choice (#2244) · 68a9685f

drbh authored Jul 19, 2024

* fix: adjust default tool choice

* feat: improve tool choice syntax and response parsing/errors

* fix: remove dev tests

* feat: add ToolChoice to docs

68a9685f

09 Jul, 2024 2 commits
- Updating the self check (#2209) · 4c976fb4
  Nicolas Patry authored Jul 09, 2024
```
* Updating the self check

* Fix.

* Revert the CLI .

* cli.

* Space.

* Revert cargo update.
```
  4c976fb4
- Adding sanity check to openapi docs. · fe710af2
  Nicolas Patry authored Jul 09, 2024
  
  fe710af2
05 Jul, 2024 1 commit

Refactor dead code - Removing all `flash_xxx.py` files. (#2166) · fb2f74e2

Nicolas Patry authored Jul 05, 2024

* Refactor dead code.

* First working step.

* Remove a lot of duplicated code.

* More dead code.

* More cleanup.

* Fix Santacoder test.

* Fixing the simple tests.

* Fixing sharding.

* Fixes for VLM.

* Fixing santacoder (num_kv_heads hardcoded).

* Removing more dead code.

* Fixing `config.n_head`.

* Stopping earlier because of `<end_of_utterance>` in idefics2.

* Addresses comments.

* Removing the dead code.

* Fuse back mistral into FlashCausalLM.

* Finish removal.

* Fixing docs + causal_lm `batch_class`.

* Fixing docs + causal.lm.

* Add default to Gemma Causality.

* Default value for gemma/gemma2.

* Wrong default.

fb2f74e2

03 Jul, 2024 4 commits

Fixing missing `object` field for regular completions. (#2175) · 5ad41aa2
Nicolas Patry authored Jul 03, 2024
```
* Fixing missing `object` field for regular completions.

* Fixing docs by re-adding missing `Prompt`.
```
5ad41aa2
Revert "Fixing missing `object` field for regular completions." · be4a4c47
Nicolas Patry authored Jul 03, 2024
```
This reverts commit 2bbb7fa4.
```
be4a4c47
Fixing missing `object` field for regular completions. · 2bbb7fa4
Nicolas Patry authored Jul 03, 2024

2bbb7fa4

feat: improve update_docs for openapi schema (#2169) · 571530dd

drbh authored Jul 03, 2024

* feat: add pre commit step to force schema update when router changes

* fix: prefer improved update_doc and start server and compare

* fix: adjust typo

* fix: adjust revert typo

* fix: update workflow to use update_doc md command

* feat: improve workflow to check openapi schema too

* fix: adjust timeout for CI

* fix: adjust raise condition and install server in ci

* fix: install protoc before server

* feat: improve update doc and add command to print router schema

* fix: adjust autodoc workflow

* fix: explicitly install protoc and python

* fix: alllow trailing space in openapi schema diff

571530dd

23 May, 2024 1 commit

Add completion route to client and add stop parameter where it's missing (#1869) · 629047cb

Thomas Schillaci authored May 23, 2024

# What does this PR do?

- Add the stop parameter to the completion route
- Add the completion method to the python client
- Add the stop parameter to the python client's chat method


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation

).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

@Narsil

---------
Co-authored-by: Thomas SCHILLACI <tschilla@px101.prod.exalead.com>
Co-authored-by: Thomas Schillaci <thomas.schillaci@3ds.com>

629047cb

18 Apr, 2024 2 commits
- v2.0.1 · 2d0a7173
  OlivierDehaene authored Apr 18, 2024
  
  2d0a7173
- Upgrading all versions. (#1759) · f9ee2c41
  Nicolas Patry authored Apr 18, 2024
  
  f9ee2c41
12 Apr, 2024 1 commit
- v2.0.0 (#1736) · c38a7d7d
  OlivierDehaene authored Apr 12, 2024
  
  c38a7d7d
29 Mar, 2024 1 commit
- v1.4.5 (#1686) · 4ee0a0c4
  OlivierDehaene authored Mar 29, 2024
  
  4ee0a0c4
22 Mar, 2024 1 commit
- v1.4.4 (#1668) · 6c4496a1
  OlivierDehaene authored Mar 22, 2024
  
  6c4496a1
28 Feb, 2024 1 commit
- v1.4.3 (#1609) · e6bb3ff8
  OlivierDehaene authored Feb 28, 2024
  
  e6bb3ff8
21 Feb, 2024 3 commits
- fix: fix openapi schema (#1586) · 010508ce
  OlivierDehaene authored Feb 21, 2024
  
  010508ce
- v1.4.2 (#1585) · 9c1cb81c
  OlivierDehaene authored Feb 21, 2024
  
  9c1cb81c
- fix(router): fix openapi and add jsonschema validation (#1578) · fa8a8e05
  OlivierDehaene authored Feb 21, 2024
  
  fa8a8e05
16 Feb, 2024 2 commits
- v1.4.1 (#1568) · 4139054b
  OlivierDehaene authored Feb 16, 2024
  
  4139054b
- chore: add pre-commit (#1569) · 9946165e
  OlivierDehaene authored Feb 16, 2024
  
  9946165e
26 Jan, 2024 2 commits
- v1.4.0 (#1494) · c2d4a3b5
  OlivierDehaene authored Jan 26, 2024
  
  c2d4a3b5
- Update the docs to include newer models. (#1492) · ebecc061
  Nicolas Patry authored Jan 26, 2024
  
  ebecc061