Commits · ce28ee88d59093dba3daf3db710feb39f930ec2c · OpenDAS / text-generation-inference

14 Oct, 2024 1 commit

Small fixes for supported models (#2471) · ce28ee88

Omar Sanseviero authored Oct 14, 2024



* Small improvements for docs

* Update _toctree.yml

* Updating the doc (we keep the list actually).

---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

ce28ee88

06 Sep, 2024 1 commit

Add links to Adyen blogpost (#2500) · aaea212d

Martin Iglesias Goyanes authored Sep 06, 2024



* Add links to Adyen blogpost

* Adding to toctree.

* Update external.md

* Update _toctree.yml

---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

aaea212d

16 Aug, 2024 1 commit

doc: Add metrics documentation and add a 'Reference' section (#2230) · 53729b74

Hugo Larcher authored Aug 16, 2024



* doc: Add metrics documentation and add a 'Reference' section

* doc: Add API reference

* doc: Refactor API reference

* fix: Message API link

* Bad rebase

* Moving the docs.

---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

53729b74

19 Jul, 2024 1 commit
- add usage stats to toctree (#2260) · 40f5dc3e
  Erik Kaunismäki authored Jul 19, 2024
```
quick fix
```
  40f5dc3e
08 Jul, 2024 1 commit
- add doc for intel gpus (#2181) · 07e240ca
  Wang, Yi authored Jul 08, 2024
```
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
```
  07e240ca
25 Jun, 2024 1 commit

Enable multiple LoRa adapters (#2010) · 04e1af94

drbh authored Jun 25, 2024



* feat: first draft load multiple lora

* feat: load weights within layer and refactor lora pass

* fix: refactor and reduce lora math

* feat: baseline impl single request multi lora support

* feat: prefer lorax implementation and port loading logic

* fix: prefer adapter_data and refactors

* feat: perfer loraxs custom punica kernels and add mlp loras

* fix: adjust batch for bgmv

* fix: adjust adapter_segments logic when in batch

* fix: refactor and move changes to v3 proto

* fix: pass model_id for all flash causal lms

* fix: pass model_id for all causal and seq2seq lms

* fix: add model_id to model test

* feat: add lora support to mistral and refactors

* feat: prefer model id in request

* fix: include rust code for adapter id

* feat: bump launcher and add new lora docs

* feat: support base model generation and refactors

* fix: rename doc to retry ci build

* feat: support if vlm models

* fix: add adapter_data param and avoid missing layers

* fix: add adapter_data param to phi and neox

* fix: update all models forwards to include adapter_data

* fix: add model_id to IdeficsCausalLM

* Update lora.md

Fixed a typo

* Update lora.md

Fixing spam image

* fix: add lora kernel to dockerfile, support running without kernels and refactors

* fix: avoid dockerfile conflict

* fix: refactors and adjust flash llama lora logic

* fix: skip llama test due to CI issue (temp)

* fix: skip llama test CI (temp) 2

* fix: revert skips and prefer updated ci token for tests

* fix: refactors and helpful comments

* fix: add noop in TensorParallelAdapterRowLinear too

* fix: refactor and move shard_lora_weights logic

* fix: exit early if no adapter_data

---------
Co-authored-by: Derek <datavistics@gmail.com>

04e1af94

14 Jun, 2024 1 commit

Adding architecture document (#2044) · 445f3135

Alvaro Moran authored Jun 14, 2024



* doc: adding architecture document

* doc: add architecture to toctree

* fix: avoid cargo lock changes

* fix: avoid cargo lock tweak

---------
Co-authored-by: drbh <david.richard.holtz@gmail.com>

445f3135

23 May, 2024 1 commit

feat: add train medusa head tutorial (#1934) · a103e3e9

drbh authored May 23, 2024



This PR adds a tutorial to self distill and train medusa heads for a
specific model

---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

a103e3e9

17 May, 2024 2 commits

Add TGI monitoring guide through Grafana and Prometheus (#1908) · c4cf8b49
fxmarty authored May 17, 2024
```
As per title. It is very useful.
```
c4cf8b49

MI300 compatibility (#1764) · 232e8d52

fxmarty authored May 17, 2024

Adds support for AMD Instinct MI300 in TGI.

Most changes are:
* Support PyTorch TunableOp to pick the GEMM/GEMV kernels for decoding
https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/cuda/tunable.
TunableOp is disabled by default, and can be enabled with
`PYTORCH_TUNABLEOP_ENABLED=1`.
* Update ROCm dockerfile to PyTorch 2.3 (actually patched with changes
from https://github.com/pytorch/pytorch/pull/124362)
* Support SILU & Linear custom kernels contributed by AMD
* Update vLLM paged attention to https://github.com/fxmarty/rocm-vllm/,
branching out of a much more recent commit
https://github.com/ROCm/vllm/commit/3489ce7936c5de588916ae3047c44c23c0b0c308


* Support FA2 Triton kernel as recommended by AMD. Can be used by
specifying `ROCM_USE_FLASH_ATTN_V2_TRITON=1`.
* Update dockerfile to ROCm 6.1

By default, TunableOp tuning results are saved in `/data` (e.g.
`/data/tunableop_meta-llama-Llama-2-70b-chat-hf_tp1_rank0.csv`) in order
to avoid to have to rerun the tuning at each `docker run`.

Example:
```
Validator,PT_VERSION,2.3.0
Validator,ROCM_VERSION,6.1.0.0-82-5fabb4c
Validator,HIPBLASLT_VERSION,0.7.0-1549b021
Validator,GCN_ARCH_NAME,gfx942:sramecc+:xnack-
Validator,ROCBLAS_VERSION,4.1.0-cefa4a9b-dirty
GemmTunableOp_Half_TN,tn_8192_7_28672,Gemm_Rocblas_45475,0.132098
GemmTunableOp_Half_TN,tn_10240_4_8192,Gemm_Rocblas_45546,0.0484431
GemmTunableOp_Half_TN,tn_32000_6_8192,Default,0.149546
GemmTunableOp_Half_TN,tn_32000_3_8192,Gemm_Rocblas_45520,0.147119
GemmTunableOp_Half_TN,tn_8192_3_28672,Gemm_Rocblas_45475,0.132645
GemmTunableOp_Half_TN,tn_10240_3_8192,Gemm_Rocblas_45546,0.0482971
GemmTunableOp_Half_TN,tn_57344_5_8192,Gemm_Rocblas_45520,0.255694
GemmTunableOp_Half_TN,tn_10240_7_8192,Gemm_Rocblas_45517,0.0482522
GemmTunableOp_Half_TN,tn_8192_3_8192,Gemm_Rocblas_45546,0.0444671
GemmTunableOp_Half_TN,tn_8192_5_8192,Gemm_Rocblas_45546,0.0445834
GemmTunableOp_Half_TN,tn_57344_7_8192,Gemm_Rocblas_45520,0.25622
GemmTunableOp_Half_TN,tn_8192_2_28672,Gemm_Rocblas_45475,0.132122
GemmTunableOp_Half_TN,tn_8192_4_8192,Gemm_Rocblas_45517,0.0453191
GemmTunableOp_Half_TN,tn_10240_5_8192,Gemm_Rocblas_45517,0.0482514
GemmTunableOp_Half_TN,tn_8192_5_28672,Gemm_Rocblas_45542,0.133914
GemmTunableOp_Half_TN,tn_8192_2_8192,Gemm_Rocblas_45517,0.0446516
GemmTunableOp_Half_TN,tn_8192_1_28672,Gemm_Hipblaslt_TN_10814,0.131953
GemmTunableOp_Half_TN,tn_10240_2_8192,Gemm_Rocblas_45546,0.0481043
GemmTunableOp_Half_TN,tn_32000_4_8192,Gemm_Rocblas_45520,0.147497
GemmTunableOp_Half_TN,tn_8192_6_28672,Gemm_Rocblas_45529,0.134895
GemmTunableOp_Half_TN,tn_57344_2_8192,Gemm_Rocblas_45520,0.254716
GemmTunableOp_Half_TN,tn_57344_4_8192,Gemm_Rocblas_45520,0.255731
GemmTunableOp_Half_TN,tn_10240_6_8192,Gemm_Rocblas_45517,0.0484816
GemmTunableOp_Half_TN,tn_57344_3_8192,Gemm_Rocblas_45520,0.254701
GemmTunableOp_Half_TN,tn_8192_4_28672,Gemm_Rocblas_45475,0.132159
GemmTunableOp_Half_TN,tn_32000_2_8192,Default,0.147524
GemmTunableOp_Half_TN,tn_32000_5_8192,Default,0.147074
GemmTunableOp_Half_TN,tn_8192_6_8192,Gemm_Rocblas_45546,0.0454045
GemmTunableOp_Half_TN,tn_57344_6_8192,Gemm_Rocblas_45520,0.255582
GemmTunableOp_Half_TN,tn_32000_7_8192,Default,0.146705
GemmTunableOp_Half_TN,tn_8192_7_8192,Gemm_Rocblas_45546,0.0445489
```

---------
Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>

232e8d52

01 May, 2024 1 commit

fix: split docs and start conceptual page (#1836) · 6073ece4

drbh authored May 01, 2024

This PR improves the guidance docs and adds a section that explains how
grammars are applied on a technical level

6073ece4

30 Apr, 2024 1 commit
- feat: add vlm docs and simple examples (#1812) · b2c98275
  drbh authored Apr 30, 2024
```
This PR start to add documentation for visual language models
```
  b2c98275
05 Apr, 2024 1 commit

Pickle conversion now requires `--trust-remote-code`. (#1704) · c7e570e5

Nicolas Patry authored Apr 05, 2024

# What does this PR do?

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->

c7e570e5

28 Feb, 2024 3 commits

v1.4.3 (#1609) · e6bb3ff8
OlivierDehaene authored Feb 28, 2024

e6bb3ff8

Fixing x-compute-time. (#1606) · 910d0a90

Nicolas Patry authored Feb 28, 2024

# What does this PR do?

It was meant to be in seconds float
<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->

910d0a90

Support tools (#1587) · 9b6db5f7

drbh authored Feb 28, 2024

This work in progress PR begins to add support for tools. Tools relies
on grammar support and still has some unsolved challenges. Opening the
PR for visibility and feedback

9b6db5f7

24 Jan, 2024 1 commit

Add messages api compatibility docs (#1478) · 7872b8c5

drbh authored Jan 24, 2024

This PR adds a new page to the docs that describes the Messages API and
how to use it.

Additionally this page will contain cloud provider specific information
for enabling and using this feature. This PR includes a SageMaker
example/information.

7872b8c5

27 Sep, 2023 1 commit

Automatic docs for TGI (#1045) · 259a2300

Merve Noyan authored Sep 27, 2023



I had to open this PR since I initially worked from my fork, and it
requires a handful of work to trigger a new github action on my fork's
specific branch (couldn't find a way, at least, despite trying all of
them).

---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

259a2300

12 Sep, 2023 3 commits

Unsupported model serving docs (#906) · c8a01d75

Merve Noyan authored Sep 12, 2023


Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: Mishig <mishig.davaadorj@coloradocollege.edu>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>

c8a01d75

Quantization docs (#911) · e9ae6786

Merve Noyan authored Sep 12, 2023


Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

e9ae6786

Tensor Parallelism conceptual guide (#886) · 1f69fb9e

Merve Noyan authored Sep 12, 2023


Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

1f69fb9e

08 Sep, 2023 1 commit
- Paged Attention Conceptual Guide (#901) · 30a93a0d
  Merve Noyan authored Sep 08, 2023
  
  30a93a0d
07 Sep, 2023 1 commit

Safetensors conceptual guide (#905) · af1ed38f

Merve Noyan authored Sep 07, 2023

IDK what else to add in this guide, I looked for relevant code in TGI
codebase and saw that it's used in quantization as well (maybe I could
add that?)

af1ed38f

06 Sep, 2023 1 commit

docs: Flash Attention Conceptual Guide (#892) · f260eb72

Merve Noyan authored Sep 06, 2023



PR for conceptual guide on flash attention. I will add more info unless
I'm told otherwise.

---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>

f260eb72

18 Aug, 2023 1 commit

Add streaming guide (#858) · bfa07061

Omar Sanseviero authored Aug 18, 2023


Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Merve Noyan <merveenoyan@gmail.com>

bfa07061

10 Aug, 2023 2 commits

Added CLI docs (#799) · e58ad6dd
Merve Noyan authored Aug 10, 2023
```
Added docs for CLI 
```
e58ad6dd

Setup for doc-builder and docs for TGI (#740) · 647ae7a7

Merve Noyan authored Aug 10, 2023



I added ToC for docs v1 & started setting up for doc-builder. cc @Narsil
@osanseviero

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: osanseviero <osanseviero@gmail.com>
Co-authored-by: Mishig <mishig.davaadorj@coloradocollege.edu>

647ae7a7