Commits · bd3a9d8e856cb7e2122f1a09d2fb0f44b7649dad · OpenDAS / text-generation-inference

23 Jun, 2023 3 commits
- fix(router): add timeout on flume sends (#488) · bd3a9d8e
  OlivierDehaene authored Jun 23, 2023
- feat(server): Adding new ignore_rule for conversion. (#485) · 776d150c
  Nicolas Patry authored Jun 23, 2023
- feat(server): Update convert logic. (#483) · 49b4b33e
  Nicolas Patry authored Jun 23, 2023
```
Should be more robust to shared tensors (OK when using
`from_pretrained`), but this forces us to add new checks in our loading
code (since the key chosen to keep might differ from the one kept by
`transformers`).

---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-41-161.ec2.internal>
```
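The #483 message notes that, for shared tensors, the key kept on conversion may differ from the one `transformers` keeps, so the loading code needs extra checks. A minimal pure-Python sketch of that bookkeeping follows, assuming each tensor name maps to a storage identifier (a hypothetical stand-in for a real data pointer). `dedupe_shared` and `resolve` are illustrative names, not TGI functions, and the "keep the lexicographically smallest name" rule is an assumption.

```python
from collections import defaultdict


def dedupe_shared(storage_of: dict[str, int]) -> dict[str, str]:
    """Return {dropped_name: kept_name} for tensor names sharing storage.

    `storage_of` maps each tensor name to a hypothetical identifier of its
    underlying storage. Within each group, the lexicographically smallest
    name is kept (an assumed tie-break; `transformers` may keep another).
    """
    groups: dict[int, list[str]] = defaultdict(list)
    for name, storage in storage_of.items():
        groups[storage].append(name)

    aliases: dict[str, str] = {}
    for names in groups.values():
        kept, *dropped = sorted(names)
        for name in dropped:
            aliases[name] = kept
    return aliases


def resolve(name: str, aliases: dict[str, str]) -> str:
    """Map a requested tensor name to the key actually stored in the file.

    This is the extra check a loader needs: code asking for a dropped name
    must be redirected to the kept one.
    """
    return aliases.get(name, name)
```

For example, if `lm_head.weight` and `model.embed_tokens.weight` share storage, only `lm_head.weight` is kept, and a loader asking for the embedding weight must be redirected to it.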
20 Jun, 2023 2 commits
- fix(server): Fixing T5 in case the names are mixed up. (#475) · c9c65ab3
  Nicolas Patry authored Jun 20, 2023
- fix(server): fix warpers on CPU (#472) · 53aa9194
  OlivierDehaene authored Jun 20, 2023
```
Closes #471
```
19 Jun, 2023 1 commit
- feat(server): improve flash attention import errors (#465) · ece7ffa4
  OlivierDehaene authored Jun 19, 2023
```
@lewtun, is this enough?

Closes #458
Closes #456
```
16 Jun, 2023 1 commit
- feat(router): add ngrok integration (#453) · f59fb8b6
  OlivierDehaene authored Jun 16, 2023
12 Jun, 2023 3 commits
- feat(server): pre-allocate past key values for flash causal LM (#412) · 5ce89059
  OlivierDehaene authored Jun 12, 2023
- fix(makefile): Fix typo and use POSIX comparison in the makefile (#443) · ca650e5b
  sayf eddine hammemi authored Jun 12, 2023
```
# What does this PR do?

This PR fixes:
- The use of a non-POSIX comparison, which may fail depending on the
shell used (`=` always works, `==` only with bash)
- A typo in the environment variable name displayed in the error message:
`BUILD_EXTENSION` instead of `BUILD_EXTENSIONS`

Fixes #422
```
- docs(launcher): fix CUDA_VISIBLE_DEVICES helper comment (#441) · d4eb60f4
  A.J authored Jun 12, 2023
```
# What does this PR do?
It fixes a typo in the comments that reference the environment
variable `CUDA_VISIBLE_DEVICES`. No misspelled references to this
variable were found in the code logic, so the typo caused no undefined
behaviour or bugs. This PR does not modify any code logic.
```
09 Jun, 2023 1 commit
- feat(server): optimize dist ops (#434) · e496c9ba
  OlivierDehaene authored Jun 09, 2023
08 Jun, 2023 1 commit
- feat(server): Rework model loading (#344) · abd58ff8
  Nicolas Patry authored Jun 08, 2023
```
# What does this PR do?

Reworked the loading logic. The idea is to use cleaner loading code:

- Remove the need for `no_init_weights`
- Remove all the ad hoc `bnb_linear`, `load_weights` and
`post_load_weights` code.

New code layout:

- A new class `Weights` handles loading the weights from
multiple files into the appropriate tensors (potentially sharded)
- TP layers are now "shells": they contain the code that knows what kind of
sharding is needed, plus any required `all_reduce`. They do not inherit
from Linear; instead, they contain some kind of Linear
- The contained linear can be either FastLinear or BnbLinear, with GPTQ
Linear to come next.
- All modeling code is explicitly written for sharding; the process group
is just a no-op for non-sharded code (removes a lot of test cases)

![Screenshot from 2023-05-19 23-19-59](https://github.com/huggingface/text-generation-inference/assets/204321/9a802654-74a3-488c-87a8-073743a6143f)

---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-41-161.taildb5d.ts.net>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-41-161.ec2.internal>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
```
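To make the "shell" layout described in #344 concrete, here is a minimal pure-Python sketch in which nested lists stand in for tensors. `get_sharded`, `FastLinear`, `SingleProcessGroup` and `TensorParallelRowLinear` are hypothetical stand-ins written for illustration (TGI's real layers operate on torch tensors and a distributed process group); the sketch only shows how a row-parallel shell wraps a plain linear and how a no-op process group lets the same code run unsharded.

```python
Matrix = list[list[float]]


def get_sharded(weight: Matrix, dim: int, rank: int, world_size: int) -> Matrix:
    """Return this rank's contiguous block of `weight` along `dim` (0 = rows, 1 = cols)."""
    size = len(weight) if dim == 0 else len(weight[0])
    if size % world_size != 0:
        raise ValueError(f"dim {dim} of size {size} is not divisible by {world_size}")
    block = size // world_size
    lo, hi = rank * block, (rank + 1) * block
    return weight[lo:hi] if dim == 0 else [row[lo:hi] for row in weight]


class FastLinear:
    """Plain y = W @ x, with W stored as [out_features][in_features]."""

    def __init__(self, weight: Matrix) -> None:
        self.weight = weight

    def __call__(self, x: list[float]) -> list[float]:
        return [sum(w * v for w, v in zip(row, x)) for row in self.weight]


class SingleProcessGroup:
    """Stand-in process group for non-sharded runs: all_reduce is a no-op."""

    def all_reduce(self, y: list[float]) -> list[float]:
        return y


class TensorParallelRowLinear:
    """'Shell' layer: it does not inherit from a linear; it contains one that
    covers this rank's slice of input features, then all_reduces the result."""

    def __init__(self, linear: FastLinear, process_group) -> None:
        self.linear = linear
        self.process_group = process_group

    def __call__(self, x_shard: list[float]) -> list[float]:
        return self.process_group.all_reduce(self.linear(x_shard))
```

With two ranks, each rank multiplies its half of the input features by its column block of the weight; summing the partial outputs (which is what `all_reduce` does across processes) reproduces the unsharded result. A column-parallel split along dim 0 instead yields disjoint output slices that are concatenated.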

05 Jun, 2023 2 commits
- chore: update openapi schema · 19c41824
  OlivierDehaene authored Jun 05, 2023
- feat(server): batch tokenization for flash causal lm (#411) · 6abec14a
  OlivierDehaene authored Jun 05, 2023
02 Jun, 2023 3 commits
- feat(server): only compute prefill logprobs when asked (#406) · 895c5f15
  OlivierDehaene authored Jun 02, 2023
```
Close #288
```
- feat(launcher): parse oom signal (#404) · 83b84486
  OlivierDehaene authored Jun 02, 2023
- feat(sagemaker): add trust remote code to entrypoint (#394) · 62fc4010
  OlivierDehaene authored Jun 02, 2023
01 Jun, 2023 4 commits
- v0.8.2 · e7248fe9
  OlivierDehaene authored Jun 01, 2023
- feat(server): load santacoder/starcoder models with safetensors (#393) · 95d35469
  OlivierDehaene authored Jun 01, 2023
```
Fix #366
```
- feat(server): remove trust_remote_code requirement for falcon models (#396) · c0928e6f
  OlivierDehaene authored Jun 01, 2023
- fix(server): fix has_position_ids (#395) · d69a0633
  OlivierDehaene authored Jun 01, 2023
```
Fix #389
```
31 May, 2023 4 commits
- v0.8.1 · db2ebe39
  OlivierDehaene authored May 31, 2023
- fix(server): fix bnb quantization for CausalLM models (#385) · 337afb28
  OlivierDehaene authored May 31, 2023
- feat(server): add retry on download (#384) · 87dc034b
  OlivierDehaene authored May 31, 2023
- increase health checks · 444400b4
  OlivierDehaene authored May 31, 2023
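The "add retry on download" commit (#384) does not describe its retry policy. The following is a minimal sketch of a generic retry wrapper, assuming exponential backoff, three attempts, and retries only on `OSError` (which covers `ConnectionError` and `TimeoutError`); `with_retries` and its parameters are hypothetical, not TGI's implementation.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on `retry_on` errors with exponential backoff.

    `sleep` is injectable so the backoff can be observed or skipped in tests.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x, ... base_delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A download call would be wrapped as `with_retries(lambda: download(url))`, where `download` stands for whatever fetch function is in use; errors outside `retry_on` (for example a missing-file error raised as `ValueError`) propagate immediately instead of being retried.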
30 May, 2023 5 commits
- v0.8.0 · 081b9265
  OlivierDehaene authored May 30, 2023
- feat(server): support RefinedWeb models (#379) · b8b950b3
  OlivierDehaene authored May 30, 2023
- fix(server): fix quantization · bf7f1d54
  OlivierDehaene authored May 30, 2023
- fix(launcher): parse num cuda devices from CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES · 49a6c8c1
  OlivierDehaene authored May 30, 2023
- fix(launcher): parse num cuda devices from CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES · 146e72c3
  OlivierDehaene authored May 30, 2023
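The two launcher commits that parse the number of CUDA devices from `CUDA_VISIBLE_DEVICES` and `NVIDIA_VISIBLE_DEVICES` modify Rust code; the following Python sketch only illustrates the parsing idea. It assumes `CUDA_VISIBLE_DEVICES` takes precedence, and it handles the `all`, `none` and `void` values that the NVIDIA container runtime accepts for `NVIDIA_VISIBLE_DEVICES`. The function name `num_cuda_devices` is hypothetical.

```python
import os
from typing import Mapping, Optional


def num_cuda_devices(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Count visible GPUs from a comma-separated device list in the environment.

    Returns None when the count cannot be derived from the variables (unset,
    or "all"), so the caller can fall back to another detection method.
    """
    # Assumed precedence: CUDA_VISIBLE_DEVICES first, then NVIDIA_VISIBLE_DEVICES.
    for var in ("CUDA_VISIBLE_DEVICES", "NVIDIA_VISIBLE_DEVICES"):
        value = env.get(var)
        if value is None:
            continue
        value = value.strip()
        if value.lower() == "all":
            return None  # every GPU is visible; the count is unknown here
        if value.lower() in ("none", "void"):
            return 0
        # "0,1,3" -> 3 devices; empty entries (e.g. "") are ignored.
        return len([d for d in value.split(",") if d.strip()])
    return None
```

Returning `None` rather than guessing keeps "no information" distinct from "zero GPUs", which matters when the count is used to choose the number of shards.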
26 May, 2023 2 commits
- Fix issue when load AutoModelForSeq2SeqLM model (#370) · 5fde8d99
  CL-Shang authored May 26, 2023
- feat(server): support vectorized warpers in flash causal lm (#317) · 62f91f78
  OlivierDehaene authored May 26, 2023
```
Co-authored-by: Joel Lamy-Poirier <joel.lamy-poirier@servicenow.com>
```
25 May, 2023 1 commit
- feat(benchmarker): add summary tables (#368) · 951930fb
  OlivierDehaene authored May 25, 2023
24 May, 2023 1 commit
- feat: decrease IPC proto size (#367) · 218c9ada
  OlivierDehaene authored May 24, 2023
```
Closes #307 #308
```
23 May, 2023 6 commits
- v0.7.0 (#353) · d31562f3
  OlivierDehaene authored May 23, 2023
- feat(router): log input/ouput at debug level (#364) · 94200538
  OlivierDehaene authored May 23, 2023
```
@njhill FYI
```
- feat(server): support trust_remote_code (#363) · e3e487dc
  OlivierDehaene authored May 23, 2023
- feat(server): do not use device_map auto on single GPU (#362) · e9669a40
  OlivierDehaene authored May 23, 2023
- feat(server): support fp16 for t5 (#360) · cfaa8580
  OlivierDehaene authored May 23, 2023
```
Fixes #349
```
- chore(sever): update requirements (#357) · 94377efa
  OlivierDehaene authored May 23, 2023
```
Fixes #338
```