- 22 Nov, 2024 1 commit
-
-
OlivierDehaene authored
* chore: prepare 2.4.1 release * fix tests * fmt
-
- 21 Nov, 2024 2 commits
-
-
OlivierDehaene authored
* feat: add payload limit * update launcher
-
Lucain authored
-
- 19 Nov, 2024 1 commit
-
-
drbh authored
* add OpenAI like tool_choice for named choice * add tests * fix: run linter and bump api docs * fix: consolidate changes and remove old tool type * feat: improve, simplify and rename tool choice struct add required support and refactor * fix: simplify tool choice logic, improve tests, openapi and rust docs * fix: refactor away prepare_chat_input and improve tool grammar apply control flow * feat: update docs and add tool choice configuration section * fix: simplify naming, tool choice default and improve test * fix: adjust tool choice none logic, add test and small refactors * fix: add missing snapshot file * fix: adjust tool choice type in test * fix: adjust default when json tool choice is * fix: remove trailing space lint after rebase * fix: remove mostly mocked unit test --------- Co-authored-by:Linus Bierhoff <linus.bierhoff@icloud.com>
-
- 15 Nov, 2024 1 commit
-
-
jito authored
Signed-off-by:jitokim <pigberger70@gmail.com>
-
- 10 Nov, 2024 1 commit
-
-
Daniël de Kok authored
compressed-tensors is a safetensors extension for sparse, quantized tensors. The format is more powerful than earlier AWQ/GPTQ/FP8 quantization, because - Different quantizer configurations can be used for different targets. - The format can specify input/output quantizers in addition to weight quantizers. - Configurable exclusions for quantization. This change adds a dependency on the `compressed-tensors` package for its configuration parsing and layer matching functionality. The following types of quantization are supported in this PR: - W8A16 and W4A16 INT using GPTQ-Marlin kernels. - W8A8 and W8A16 FP using FP8-Marlin and cutlass kernels. Support for other quantization types will be added in subsequent PRs.
-
- 04 Nov, 2024 1 commit
-
-
drbh authored
-
- 30 Oct, 2024 1 commit
-
-
drbh authored
* feat: add support for qwen2 vl model * feat: fix token padding, enable warmup and process basic request * fix: improve get_position_ids, add lift embed_tokens * fix: remove get_cos_sin_hack dev function * feat: add simple test chat with meesage and text * fix: lint test * fix: adjust positional embeddings for multi dimensional position ids * fix: update docs and lint unused vars * fix: include linted file * fix: add norm after text output * fix: format model file * fix: adjust for ruff lints * fix: remove unused rotate_half * feat: refactors and calc num features * fix: prefer position_ids passed from vlm causal lm and reset ids on batch * fix: adjust get_position_ids if not available and add required args to signatures * fix: adjust resize case for qwen2_vl warmup * fix: avoid qwen2 vl specific paths with qwen2
-
- 28 Oct, 2024 1 commit
-
-
Nicolas Patry authored
* Choosing input/total tokens automatically based on available VRAM? * Update doc. * Remove generated files. * Trying to fix non chunking targets. * Attempt #2 * fix. * QuantLinear is rocm compatible. * Much simpler logic after the overhead. * Updating logic + non flash. * Revert doc text. * Simple updates. * Fix integration mt0 (transformers update).
-
- 25 Oct, 2024 1 commit
-
-
OlivierDehaene authored
-
- 23 Oct, 2024 2 commits
-
-
OlivierDehaene authored
* feat: allow any supported payload on /invocations * update openAPI * update doc
-
OlivierDehaene authored
* feat: natively support Granite models * Update doc
-
- 17 Oct, 2024 1 commit
-
-
Daniël de Kok authored
* Support `e4m3fn` KV cache * Make check more obvious
-
- 15 Oct, 2024 1 commit
-
-
Nicolas Patry authored
-
- 14 Oct, 2024 2 commits
-
-
Omar Sanseviero authored
Update quicktour.md
-
Omar Sanseviero authored
* Small improvements for docs * Update _toctree.yml * Updating the doc (we keep the list actually). --------- Co-authored-by:Nicolas Patry <patry.nicolas@protonmail.com>
-
- 10 Oct, 2024 1 commit
-
-
vb authored
Update to most recent stable version of TGI.
-
- 08 Oct, 2024 1 commit
-
-
drbh authored
* Update ToolType input schema * lint * fix: run formatter * fix: allow tool choide to be null --------- Co-authored-by:Wauplin <lucainp@gmail.com>
-
- 04 Oct, 2024 1 commit
-
-
Daniël de Kok authored
* Add basic FP8 KV cache support This change adds rudimentary FP8 KV cache support. The support is enabled by passing `--kv-cache-dtype fp8_e5m2` to the launcher. Doing so uses this type for the KV cache. However support is still limited: * Only the `fp8_e5m2` type is supported. * The KV cache layout is the same as `float16`/`bfloat16` (HND). * The FP8 KV cache is only supported for FlashInfer. * Loading of scales is not yet supported. * Fix Cargo.toml
-
- 03 Oct, 2024 2 commits
-
-
Nicolas Patry authored
* New release 2.3.1 * Update doc number
- 02 Oct, 2024 3 commits
-
-
drbh authored
* feat: unroll notify_error if no tool is choosen * fix: expect simple message when no tool is selected * fix: improve test to avoid notify_error * fix: improve docs and indicate change in expected response * fix: adjust linting in test file
-
drbh authored
allow revision for lora adapters from launcher Co-authored-by:
Sida <sida@kulamind.com> Co-authored-by:
teamclouday <teamclouday@gmail.com>
-
Nicolas Patry authored
* Working loading state. * Preprocessing. * Working state ? (Broke idefics1 temporarily). * Cleaner condition. * Fix idefics. * Updating config, removing TODO * Mllama * Ugrade transformers 4.45 * Flashing mllama. * Starting to get there. * Working state. * Integrations tests for mllama (cutting to 10 tokens because there seems' to be instability after (meaning size of the batch matters. * Updating model link. * Earlier assert. * Fix vlm ? * remove log. * Force ignore all images but last. * Default dtype bfloat16. * Update integration test after switch to bf16. * Remove dead code. * Removed dead code. * Upgrade the flake to latest transformers/tokenizers * Move to hf tgi-nix * Upgrade to 0.5.0
-
- 30 Sep, 2024 3 commits
-
-
drbh authored
* feat: support phi3.5 moe model loading * fix: prefer llama base model and improve rotary logic * feat: return reasonable generation and add integration test * fix: run lint and update docs * fix: rerun lint for openapi docs * fix: prefer do_sample false unless temp is set by user, and update chat tests * fix: small typo adjustments * fix: consolidate long rope paths * fix: revert greedy by default and test changes * Vendor configuration so that we don't have to `trust_remote_code` * Use SparseMoELayer * Add support for dense MoE * Some type annotations * Add the usual model tests * Ruff. --------- Co-authored-by:
Daniël de Kok <me@danieldk.eu> Co-authored-by:
Nicolas Patry <patry.nicolas@protonmail.com>
-
Mohit Sharma authored
* style * update torch * ix issues * fix clone * revert mkl * added custom PA * style * fix style * style * hide env vart * fix mixtral model * add skinny kernel and merge fixes * fixed style * fix issue for sliding window models * addressed review comments * fix import * improved error messag * updated default value * remove import * fix imports after rebase * float16 dep * improve dockerfile * cleaned dockerfile
-
Ikram Ul Haq authored
-
- 24 Sep, 2024 2 commits
-
-
Nicholas Broad authored
specify how to call local adapters
-
Aritra Roy Gosthipaty authored
* chore: adding note for private models in quicktour doc * Update docs/source/quicktour.md Co-authored-by:
Omar Sanseviero <osanseviero@gmail.com> * Update docs/source/quicktour.md Co-authored-by:
vb <vaibhavs10@gmail.com> * Update docs/source/quicktour.md Co-authored-by:
vb <vaibhavs10@gmail.com> --------- Co-authored-by:
Omar Sanseviero <osanseviero@gmail.com> Co-authored-by:
vb <vaibhavs10@gmail.com>
-
- 20 Sep, 2024 1 commit
-
-
Nicolas Patry authored
* Preparing for release. * Upgrade version in docs.
-
- 19 Sep, 2024 2 commits
-
-
Daniël de Kok authored
-
Nicolas Patry authored
* Stream options. * Fetch stuff from nix integration test for easier testing. * Adding the assert. * Only send the usage when asked for. * Update the docs. * Impure test because we need network. * develop. * Optional usage. * Fixes. * Workflow
-
- 06 Sep, 2024 1 commit
-
-
Martin Iglesias Goyanes authored
* Add links to Adyen blogpost * Adding to toctree. * Update external.md * Update _toctree.yml --------- Co-authored-by:Nicolas Patry <patry.nicolas@protonmail.com>
-
- 05 Sep, 2024 1 commit
-
-
Nicolas Patry authored
-
- 29 Aug, 2024 2 commits
-
-
Wang, Yi authored
* update doc with intel cpu part Signed-off-by:
Wang, Yi A <yi.a.wang@intel.com> * Apply suggestions from code review we do not use latest ever in documentation, it causes too many issues for users. Release number get update on every release. --------- Signed-off-by:
Wang, Yi A <yi.a.wang@intel.com> Co-authored-by:
Nicolas Patry <patry.nicolas@protonmail.com>
-
drbh authored
* feat: add /v1/models endpoint * feat: add /v1/models endpoint * fix: remove unused type import * fix: revert route typo * fix: update docs with new endpoint * fix: add to redocly ignore and lint
-
- 28 Aug, 2024 1 commit
-
-
drbh authored
-
- 27 Aug, 2024 1 commit
-
-
drbh authored
* fix[router]: Fix tools not passed in chat template Signed-off-by:
GitHub <noreply@github.com> * feat: improve default tool serialization and lints * feat: refactor tool logic to include notify_error in prompt and adjust typing * fix: adjust non tool template apply * fix: simplify tool grammar logic and improve schema * feat: avoid skip tool test and avoid empty tool prompts * fix: increase test client timeout for grammar compilation tests --------- Signed-off-by:
GitHub <noreply@github.com> Co-authored-by:
Simone Rossi <simone.rossi.93@gmail.com>
-
- 16 Aug, 2024 2 commits
-
-
Hugo Larcher authored
* doc: Add metrics documentation and add a 'Reference' section * doc: Add API reference * doc: Refactor API reference * fix: Message API link * Bad rebase * Moving the docs. --------- Co-authored-by:Nicolas Patry <patry.nicolas@protonmail.com>
-
Nicolas Patry authored
-