- 25 Oct, 2024 1 commit
OlivierDehaene authored
- 23 Oct, 2024 1 commit
OlivierDehaene authored
* feat: allow any supported payload on /invocations
* update openAPI
* update doc
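The commit above lets the `/invocations` route accept any payload the server already supports, rather than a single fixed schema. A minimal sketch of what that enables, assuming a locally running TGI instance (the host, port, and model name are placeholders, not from the commit):

```python
import json

# Hypothetical server address for illustration.
BASE_URL = "http://localhost:8080"
url = f"{BASE_URL}/invocations"

# A chat-completion-style payload.
chat_payload = {
    "model": "tgi",
    "messages": [{"role": "user", "content": "What is deep learning?"}],
    "max_tokens": 64,
}

# A plain text-generation payload.
generate_payload = {
    "inputs": "What is deep learning?",
    "parameters": {"max_new_tokens": 64},
}

# Per the commit, either body may be POSTed to the same route (url above);
# here we only verify both serialize to valid JSON.
for payload in (chat_payload, generate_payload):
    body = json.dumps(payload)
    assert json.loads(body) == payload
```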
- 15 Oct, 2024 1 commit
Nicolas Patry authored
- 14 Oct, 2024 1 commit
Omar Sanseviero authored
* Small improvements for docs
* Update _toctree.yml
* Updating the doc (we keep the list actually).
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
- 08 Oct, 2024 1 commit
drbh authored
* Update ToolType input schema
* lint
* fix: run formatter
* fix: allow tool choice to be null
---------
Co-authored-by: Wauplin <lucainp@gmail.com>
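The fix above allows `tool_choice` to be explicitly null in a chat request. A sketch of such a request body, following the OpenAI-style chat schema TGI exposes (the tool name and parameters are hypothetical, for illustration only):

```python
import json

request = {
    "model": "tgi",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    }],
    "tool_choice": None,  # Python None serializes to JSON null
}

body = json.dumps(request)
# The serialized body carries an explicit null rather than omitting the field.
assert '"tool_choice": null' in body
```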
- 03 Oct, 2024 1 commit
Nicolas Patry authored
* New release 2.3.1
* Update doc number
- 20 Sep, 2024 1 commit
Nicolas Patry authored
* Preparing for release.
* Upgrade version in docs.
- 19 Sep, 2024 1 commit
Nicolas Patry authored
* Stream options.
* Fetch stuff from nix integration test for easier testing.
* Adding the assert.
* Only send the usage when asked for.
* Update the docs.
* Impure test because we need network.
* develop.
* Optional usage.
* Fixes.
* Workflow
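Per "Only send the usage when asked for" above, a streaming request opts in to usage reporting via `stream_options`, following the OpenAI-style convention. A sketch of the request body (model name is a placeholder):

```python
import json

payload = {
    "model": "tgi",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
    # Without this, the streamed chunks carry no usage block;
    # with it, usage is reported once the stream finishes.
    "stream_options": {"include_usage": True},
}

body = json.dumps(payload)
assert json.loads(body)["stream_options"]["include_usage"] is True
```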
- 29 Aug, 2024 1 commit
drbh authored
* feat: add /v1/models endpoint
* feat: add /v1/models endpoint
* fix: remove unused type import
* fix: revert route typo
* fix: update docs with new endpoint
* fix: add to redocly ignore and lint
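The new `/v1/models` endpoint follows the OpenAI models-list convention. A sketch of parsing a representative response (the body below is illustrative sample data, not real server output; exact fields may differ):

```python
import json

# Representative response body for GET /v1/models.
raw = json.dumps({
    "object": "list",
    "data": [
        {"id": "meta-llama/Llama-3.1-8B-Instruct", "object": "model", "owned_by": "tgi"},
    ],
})

response = json.loads(raw)
model_ids = [m["id"] for m in response["data"]]
```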
- 27 Aug, 2024 1 commit
drbh authored
* fix[router]: Fix tools not passed in chat template
Signed-off-by: GitHub <noreply@github.com>
* feat: improve default tool serialization and lints
* feat: refactor tool logic to include notify_error in prompt and adjust typing
* fix: adjust non tool template apply
* fix: simplify tool grammar logic and improve schema
* feat: avoid skip tool test and avoid empty tool prompts
* fix: increase test client timeout for grammar compilation tests
---------
Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: Simone Rossi <simone.rossi.93@gmail.com>
- 16 Aug, 2024 2 commits
Nicolas Patry authored
Vaibhav Srivastav authored
* Improve the Consuming TGI docs.
* Fix erroneous update to .
* add info about Open AI client.
* More updates.
* Apply suggestions from code review
Co-authored-by: Erik Kaunismäki <erik.kaum@gmail.com>
* Suggestions from Lucain.
* Update Gradio snippet.
* Up.
* Apply suggestions from code review
Co-authored-by: Lucain <lucainp@gmail.com>
* Update docs/source/basic_tutorials/consuming_tgi.md
Co-authored-by: Lucain <lucainp@gmail.com>
* Up.
* Apply suggestions from code review
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
* Up.
* Up.
* Doc review from Nico.
* Doc review from Nico. x2
* Last nit
---------
Co-authored-by: Erik Kaunismäki <erik.kaum@gmail.com>
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
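The "info about Open AI client" mentioned above works because TGI exposes an OpenAI-compatible chat endpoint, so a stock OpenAI client can simply point its `base_url` at the server. The equivalent raw request is sketched here with the standard library so it needs no extra packages (URL and model name are placeholders):

```python
import json
import urllib.request

# Build (but do not send) the OpenAI-compatible chat request a client
# configured with base_url="http://localhost:8080/v1" would issue.
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps({
        "model": "tgi",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    }).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send it once a server is running.
```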
- 12 Aug, 2024 1 commit
drbh authored
* fix: improve completions to send a final chunk with usage details
* fix: include finish reason string
* fix: remove dev debug trait and unneeded mut
* fix: update openapi schema
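After the fix above, the last SSE chunk of a streamed completion carries the usage details and a finish reason. A sketch of parsing such a final chunk (the chunk body below is illustrative sample data, not captured server output):

```python
import json

# Example final server-sent-events line of a streamed completion.
line = ('data: {"choices":[{"text":"","finish_reason":"length"}],'
        '"usage":{"prompt_tokens":5,"completion_tokens":16,"total_tokens":21}}')

# Strip the SSE "data: " prefix, then decode the JSON chunk.
chunk = json.loads(line.removeprefix("data: "))
finish_reason = chunk["choices"][0]["finish_reason"]
total_tokens = chunk["usage"]["total_tokens"]
```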
- 09 Aug, 2024 2 commits
drbh authored
* feat: add guideline to chat request and template
* fix: add template test and update docs
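The commit above adds a `guideline` field to the chat request, which is forwarded to chat templates that make use of it. A sketch of a request carrying the field (the guideline text and model name are illustrative placeholders):

```python
import json

request = {
    "model": "tgi",
    "messages": [{"role": "user", "content": "Summarize this document."}],
    # Forwarded into the chat template when the template supports it;
    # the value here is purely illustrative.
    "guideline": "Respond safely and concisely.",
}

body = json.dumps(request)
assert "guideline" in json.loads(body)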
Nicolas Patry authored
* Using an enum for flash backends (paged/flashdecoding/flashinfer)
* Early exit on server too.
* Clippy.
* Fix clippy and fmt.
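The actual change is in Rust, but the idea of replacing string flags with an enum over the three attention backends can be mirrored in a short Python sketch (the type name `AttentionBackend` is hypothetical; only the three variant names come from the commit):

```python
from enum import Enum

class AttentionBackend(Enum):
    # Variant names taken from the commit message; class name is invented.
    PAGED = "paged"
    FLASHDECODING = "flashdecoding"
    FLASHINFER = "flashinfer"

# Lookup by value raises ValueError on unknown backends instead of
# silently accepting an arbitrary string.
backend = AttentionBackend("flashinfer")
```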
- 08 Aug, 2024 1 commit
Vaibhav Srivastav authored
* Update Quantization docs and minor doc fix.
* update readme with latest quants info
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* up
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
- 31 Jul, 2024 1 commit
Nicolas Patry authored
* wip
  wip
  refacto
  refacto
  Initial setup for CXX binding to TRTLLM
  Working FFI call for TGI and TRTLLM backend
  Remove unused parameters and force tokenizer name to be set
  Overall build TRTLLM and deps through CMake build system
  Enable end to end CMake build
  First version loading engines and making it ready for inference
  Remembering to check how we can detect support for chunked context
  Move to latest TensorRT-LLM version
  Specify which default log level to use depending on CMake build type
  make leader executor mode working
  unconditionally call InitializeBackend on the FFI layer
  bind to CUDA::nvml to retrieve compute capabilities at runtime
  updated logic and comment to detect cuda compute capabilities
  implement the Stream method to send new tokens through a callback
  use spdlog release 1.14.1 moving forward
  update trtllm to latest version a96cccafcf6365c128f004f779160951f8c0801c
  correctly tell cmake to build dependent tensorrt-llm required libraries
  create cmake install target to put everything relevant in installation folder
  add auth_token CLI argument to provide hf hub authentication token
  allow converting huggingface::tokenizers error to TensorRtLlmBackendError
  use correct include for spdlog
  include guard to build example in cmakelists
  working setup of the ffi layer
  remove fmt import
  use external fmt lib
  end to end ffi flow working
  make sure to track include/ffi.h to trigger rebuild from cargo
  impl the rust backend which currently cannot move the actual computation in background thread
  expose shutdown function at ffi layer
  impl RwLock scenario for TensorRtLllmBackend
  oops
  missing c++ backend definitions
  compute the number of maximum new tokens for each request independently
  make sure the context is not dropped in the middle of the async decoding.
  remove unnecessary log
  add all the necessary plumbing to return the generated content
  update invalid doc in cpp file
  correctly forward back the log probabilities
  remove unneeded scope variable for now
  refactor Stream impl for Generation to factorise code
  expose the internal missing start/queue timestamp
  forward tgi parameters rep/freq penalty
  add some more validation about grammar not supported
  define a shared struct to hold the result of a decoding step
  expose information about potential error happening while decoding
  remove logging
  add logging in case of decoding error
  make sure executor_worker is provided
  add initial Dockerfile for TRTLLM backend
  add some more information in CMakeLists.txt to correctly install executorWorker
  add some more information in CMakeLists.txt to correctly find and install nvrtc wrapper
  simplify prebuilt trtllm libraries name definition
  do the same name definition stuff for tensorrt_llm_executor_static
  leverage pkg-config to probe libraries paths and reuse new install structure from cmake
  fix bad copy/paste
  missing nvinfer linkage direction
  align all the linker search dependency
  add missing pkgconfig folder for MPI in Dockerfile
  correctly setup linking search path for runtime layer
  fix missing / before tgi lib path
  adding missing ld_library_path for cuda stubs in Dockerfile
  update tgi entrypoint
  commenting out Python part for TensorRT installation
  refactored docker image
  move to TensorRT-LLM v0.11.0
  make docker linter happy with same capitalization rule
  fix typo
  refactor the compute capabilities detection along with num gpus
  update TensorRT-LLM to latest version
  update TensorRT install script to latest
  update build.rs to link to cuda 12.5
  add missing dependent libraries for linking
  clean up a bit
  install to decoder_attention target
  add some custom stuff for nccl linkage
  fix envvar CARGO_CFG_TARGET_ARCH set at runtime vs compile time
  use std::env::const::ARCH
  make sure variable live long enough...
  look for cuda 12.5
  add some more basic info in README.md
* Rebase.
* Fix autodocs.
* Let's try to enable trtllm backend.
* Ignore backends/v3 by default.
* Fixing client.
* Fix makefile + autodocs.
* Updating the schema thing + redocly.
* Fix trtllm lint.
* Adding pb files ?
* Remove cargo fmt temporarily.
* ?
* Tmp.
* Remove both check + clippy ?
* Backporting telemetry.
* Backporting 457fb0a1
* Remove PB from git.
* Fixing PB with default member backends/client
* update TensorRT-LLM to latest version
* provided None for api_key
* link against libtensorrt_llm and not libtensorrt-llm
---------
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: Morgan Funtowicz <morgan@huggingface.co>
- 23 Jul, 2024 1 commit
Nicolas Patry authored
* Preparing for release.
* Updating docs.
* Fixing token within the docker image for the launcher.
- 19 Jul, 2024 1 commit
drbh authored
* fix: adjust default tool choice
* feat: improve tool choice syntax and response parsing/errors
* fix: remove dev tests
* feat: add ToolChoice to docs
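The improved `tool_choice` syntax above follows the OpenAI-style forms: a string mode or an object naming a specific tool. A sketch of the accepted shapes (the tool name is hypothetical, for illustration):

```python
import json

# Accepted tool_choice forms, OpenAI-style.
choices = [
    "auto",                                 # let the model decide
    "none",                                 # never call a tool
    {"function": {"name": "get_weather"}},  # force a specific (hypothetical) tool
]

# Each form embeds cleanly in a request body.
bodies = [json.dumps({"tool_choice": c}) for c in choices]
```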
- 09 Jul, 2024 2 commits
Nicolas Patry authored
* Updating the self check
* Fix.
* Revert the CLI .
* cli.
* Space.
* Revert cargo update.
Nicolas Patry authored
- 05 Jul, 2024 1 commit
Nicolas Patry authored
* Refactor dead code.
* First working step.
* Remove a lot of duplicated code.
* More dead code.
* More cleanup.
* Fix Santacoder test.
* Fixing the simple tests.
* Fixing sharding.
* Fixes for VLM.
* Fixing santacoder (num_kv_heads hardcoded).
* Removing more dead code.
* Fixing `config.n_head`.
* Stopping earlier because of `<end_of_utterance>` in idefics2.
* Addresses comments.
* Removing the dead code.
* Fuse back mistral into FlashCausalLM.
* Finish removal.
* Fixing docs + causal_lm `batch_class`.
* Fixing docs + causal.lm.
* Add default to Gemma Causality.
* Default value for gemma/gemma2.
* Wrong default.
- 03 Jul, 2024 4 commits
Nicolas Patry authored
* Fixing missing `object` field for regular completions.
* Fixing docs by re-adding missing `Prompt`.
Nicolas Patry authored
This reverts commit 2bbb7fa4.
Nicolas Patry authored
drbh authored
* feat: add pre commit step to force schema update when router changes
* fix: prefer improved update_doc and start server and compare
* fix: adjust typo
* fix: adjust revert typo
* fix: update workflow to use update_doc md command
* feat: improve workflow to check openapi schema too
* fix: adjust timeout for CI
* fix: adjust raise condition and install server in ci
* fix: install protoc before server
* feat: improve update doc and add command to print router schema
* fix: adjust autodoc workflow
* fix: explicitly install protoc and python
* fix: allow trailing space in openapi schema diff
- 23 May, 2024 1 commit
Thomas Schillaci authored
# What does this PR do?

- Add the stop parameter to the completion route
- Add the completion method to the python client
- Add the stop parameter to the python client's chat method

## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests), Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the [forum](https://discuss.huggingface.co/)? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the [documentation guidelines](https://github.com/huggingface/transformers/tree/main/docs), and [here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?

## Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR. @Narsil

---------

Co-authored-by: Thomas SCHILLACI <tschilla@px101.prod.exalead.com>
Co-authored-by: Thomas Schillaci <thomas.schillaci@3ds.com>
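The `stop` parameter added to the completion route above halts generation at the first matching sequence. A sketch of a completion request carrying it (prompt, model name, and stop sequences are illustrative placeholders):

```python
import json

request = {
    "model": "tgi",
    "prompt": "Count: 1, 2, 3,",
    "max_tokens": 32,
    # Generation stops as soon as one of these sequences is produced.
    "stop": [",", "\n"],
}

body = json.dumps(request)
assert "stop" in json.loads(body)
```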
- 18 Apr, 2024 2 commits
OlivierDehaene authored
Nicolas Patry authored
- 12 Apr, 2024 1 commit
OlivierDehaene authored
- 29 Mar, 2024 1 commit
OlivierDehaene authored
- 22 Mar, 2024 1 commit
OlivierDehaene authored
- 28 Feb, 2024 1 commit
OlivierDehaene authored
- 21 Feb, 2024 3 commits
OlivierDehaene authored
OlivierDehaene authored
OlivierDehaene authored
- 16 Feb, 2024 2 commits
OlivierDehaene authored
OlivierDehaene authored
- 26 Jan, 2024 2 commits
OlivierDehaene authored
Nicolas Patry authored