Commits · ca650e5bff1af8c8580c2be0d4dad37dd3285247 · OpenDAS / text-generation-inference

12 Jun, 2023 2 commits

fix(makefile): Fix typo and use POSIX comparison in the makefile (#443) · ca650e5b

sayf eddine hammemi authored Jun 12, 2023

# What does this PR do?

This PR fixes:
- The use of a non-POSIX comparison, which may fail depending on the
shell used (`=` always works, `==` only works in bash)
- A typo in the environment variable name displayed in the error message:
`BUILD_EXTENSION` instead of `BUILD_EXTENSIONS`
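To illustrate why `=` is the portable choice, here is a small sketch that runs a `test`/`[` string comparison under `/bin/sh` from Python. The helper name `posix_equal` is hypothetical and this is not the Makefile's actual content; it only shows the POSIX form that works in any conforming shell.

```python
import subprocess


def posix_equal(a: str, b: str) -> bool:
    """Hypothetical helper: True if `[ "$A" = "$B" ]` succeeds under /bin/sh.

    `=` is the string-equality operator POSIX specifies for `test`/`[`, so it
    works in dash, bash, busybox sh, etc. `==` is a bash extension that some
    POSIX shells, such as dash (the default /bin/sh on Debian and Ubuntu),
    reject with an error.
    """
    result = subprocess.run(
        ["/bin/sh", "-c", '[ "$A" = "$B" ]'],  # `[` is a shell builtin, so no PATH is needed
        env={"A": a, "B": b},  # pass values via the environment to avoid quoting issues
        check=False,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print(posix_equal("cuda", "cuda"))  # True
    print(posix_equal("cuda", "cpu"))   # False
```

The same rule applies inside Makefile recipes, since GNU make runs recipe lines with `/bin/sh` unless `SHELL` is overridden.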

Fixes #422

ca650e5b

docs(launcher): fix CUDA_VISIBLE_DEVICES helper comment (#441) · d4eb60f4

A.J authored Jun 12, 2023

# What does this PR do?
It fixes a typo in the comments referencing the environment
variable `CUDA_VISIBLE_DEVICES`. No misspelled references to this
variable were found in the code logic, so the typo did not cause undefined
behaviour or bugs. This PR does not modify any code logic.

d4eb60f4

09 Jun, 2023 1 commit
- feat(server): optimize dist ops (#434) · e496c9ba
  OlivierDehaene authored Jun 09, 2023
  
  e496c9ba
08 Jun, 2023 1 commit

feat(server): Rework model loading (#344) · abd58ff8

Nicolas Patry authored Jun 08, 2023

# What does this PR do?

Reworks the loading logic. The idea is to have cleaner loading code:

- Remove the need for `no_init_weights`
- Remove all the ad-hoc `bnb_linear`, `load_weights`, and
`post_load_weights` code

New code layout:

- New class `Weights` in charge of loading the weights from
multiple files into the appropriate tensors (potentially sharded)
- TP (tensor-parallel) layers are now "shells": they contain the code that
determines what kind of sharding is needed, plus an eventual `all_reduce`.
They do not inherit from Linear; instead, they contain a Linear layer
- The contained linear layer can be FastLinear, BnbLinear, or (next) a
GPTQ Linear
- All modeling code is explicitly written for sharding; the process group is
simply a no-op for non-sharded code (which removes a lot of test cases)
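The bullet points above describe a layout where a `Weights` object hands out (possibly sharded) tensors and tensor-parallel layers wrap a linear layer instead of inheriting from one. The following is a minimal pure-Python sketch of that idea, not the PR's actual code: `Weights.get_sharded`, `FastLinear`, `ColumnParallelLinear`, and the in-memory "files" are hypothetical stand-ins, and the sketch assumes sharding along the output dimension only.

```python
from typing import Dict, List

Matrix = List[List[float]]  # weight stored as (out_features, in_features)


class Weights:
    """Hypothetical sketch: merges tensors from several checkpoint "files"
    and returns the slice that belongs to one tensor-parallel rank."""

    def __init__(self, files: List[Dict[str, Matrix]], rank: int = 0, world_size: int = 1):
        self.tensors: Dict[str, Matrix] = {}
        for tensors_in_file in files:
            self.tensors.update(tensors_in_file)
        self.rank = rank
        self.world_size = world_size

    def get_sharded(self, name: str) -> Matrix:
        """Split a tensor along dim 0 into world_size equal blocks."""
        full = self.tensors[name]
        if len(full) % self.world_size != 0:
            raise ValueError(f"{name}: {len(full)} rows not divisible by {self.world_size}")
        block = len(full) // self.world_size
        start = self.rank * block
        return full[start:start + block]


class FastLinear:
    """Plain y = W @ x without bias."""

    def __init__(self, weight: Matrix):
        self.weight = weight

    def __call__(self, x: List[float]) -> List[float]:
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]


class ColumnParallelLinear:
    """TP "shell": decides how to shard and *contains* a linear layer rather
    than inheriting from one. Each rank computes a slice of the output; a real
    multi-rank setup would gather the slices, which is a no-op for world_size 1."""

    def __init__(self, weights: Weights, name: str):
        self.linear = FastLinear(weights.get_sharded(name))

    def __call__(self, x: List[float]) -> List[float]:
        return self.linear(x)


if __name__ == "__main__":
    files = [{"q_proj": [[1, 2], [3, 4], [5, 6], [7, 8]]}, {"k_proj": [[1, 0]]}]
    x = [1.0, 1.0]
    shards = [ColumnParallelLinear(Weights(files, rank=r, world_size=2), "q_proj")(x) for r in range(2)]
    print(shards[0] + shards[1])  # [3.0, 7.0, 11.0, 15.0]
```

Because the unsharded case is just `world_size=1`, the same layer code serves both paths, which mirrors the PR's point that non-sharded code needs no separate implementation.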

![Screenshot from 2023-05-19 23-19-59](https://github.com/huggingface/text-generation-inference/assets/204321/9a802654-74a3-488c-87a8-073743a6143f)

---------

Co-authored-by: Ubuntu <ubuntu@ip-1...

abd58ff8

05 Jun, 2023 2 commits
- chore: update openapi schema · 19c41824
  OlivierDehaene authored Jun 05, 2023
  
  19c41824
- feat(server): batch tokenization for flash causal lm (#411) · 6abec14a
  OlivierDehaene authored Jun 05, 2023
  
  6abec14a
02 Jun, 2023 3 commits
- feat(server): only compute prefill logprobs when asked (#406) · 895c5f15
  OlivierDehaene authored Jun 02, 2023
```
Close #288
```
  895c5f15
- feat(launcher): parse oom signal (#404) · 83b84486
  OlivierDehaene authored Jun 02, 2023
  
  83b84486
- feat(sagemaker): add trust remote code to entrypoint (#394) · 62fc4010
  OlivierDehaene authored Jun 02, 2023
  
  62fc4010
01 Jun, 2023 4 commits
- v0.8.2 · e7248fe9
  OlivierDehaene authored Jun 01, 2023
  
  e7248fe9
- feat(server): load santacoder/starcoder models with safetensors (#393) · 95d35469
  OlivierDehaene authored Jun 01, 2023
```
Fix #366
```
  95d35469
- feat(server): remove trust_remote_code requirement for falcon models (#396) · c0928e6f
  OlivierDehaene authored Jun 01, 2023
  
  c0928e6f
- fix(server): fix has_position_ids (#395) · d69a0633
  OlivierDehaene authored Jun 01, 2023
```
Fix #389
```
  d69a0633
31 May, 2023 4 commits
- v0.8.1 · db2ebe39
  OlivierDehaene authored May 31, 2023
  
  db2ebe39
- fix(server): fix bnb quantization for CausalLM models (#385) · 337afb28
  OlivierDehaene authored May 31, 2023
  
  337afb28
- feat(server): add retry on download (#384) · 87dc034b
  OlivierDehaene authored May 31, 2023
  
  87dc034b
- increase health checks · 444400b4
  OlivierDehaene authored May 31, 2023
  
  444400b4
30 May, 2023 5 commits
- v0.8.0 · 081b9265
  OlivierDehaene authored May 30, 2023
  
  081b9265
- feat(server): support RefinedWeb models (#379) · b8b950b3
  OlivierDehaene authored May 30, 2023
  
  b8b950b3
- fix(server): fix quantization · bf7f1d54
  OlivierDehaene authored May 30, 2023
  
  bf7f1d54
- fix(launcher): parse num cuda devices from CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES · 49a6c8c1
  OlivierDehaene authored May 30, 2023
  
  49a6c8c1
- fix(launcher): parse num cuda devices from CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES · 146e72c3
  OlivierDehaene authored May 30, 2023
  
  146e72c3
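The two 30 May launcher commits count CUDA devices from `CUDA_VISIBLE_DEVICES` and `NVIDIA_VISIBLE_DEVICES`. A minimal Python sketch of that kind of parsing is below; the function name `num_cuda_devices`, the precedence order, and the handling of empty entries are assumptions for illustration, not the launcher's actual implementation.

```python
import os
from typing import Mapping, Optional


def num_cuda_devices(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Hypothetical sketch: count visible GPUs from environment variables.

    Assumes CUDA_VISIBLE_DEVICES takes precedence and NVIDIA_VISIBLE_DEVICES
    (set by the NVIDIA container runtime) is the fallback. Returns None when
    neither variable is set.
    """
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        value = env.get("NVIDIA_VISIBLE_DEVICES")
    if value is None:
        return None
    # Count non-empty comma-separated entries, e.g. "0,1" -> 2.
    # Special values such as "all" are NOT expanded by this sketch.
    return len([d for d in value.split(",") if d.strip()])


if __name__ == "__main__":
    print(num_cuda_devices({"NVIDIA_VISIBLE_DEVICES": "0,1,2"}))  # 3
```

Passing the environment as a mapping keeps the function easy to test without mutating `os.environ`.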
26 May, 2023 2 commits
- Fix issue when loading AutoModelForSeq2SeqLM model (#370) · 5fde8d99
  CL-Shang authored May 26, 2023
  
  5fde8d99
- feat(server): support vectorized warpers in flash causal lm (#317) · 62f91f78
  OlivierDehaene authored May 26, 2023
```
Co-authored-by: Joel Lamy-Poirier <joel.lamy-poirier@servicenow.com>
```
  62f91f78
25 May, 2023 1 commit
- feat(benchmarker): add summary tables (#368) · 951930fb
  OlivierDehaene authored May 25, 2023
  
  951930fb
24 May, 2023 1 commit
- feat: decrease IPC proto size (#367) · 218c9ada
  OlivierDehaene authored May 24, 2023
```
Closes #307 #308
```
  218c9ada
23 May, 2023 9 commits
- v0.7.0 (#353) · d31562f3
  OlivierDehaene authored May 23, 2023
  
  d31562f3
- feat(router): log input/output at debug level (#364) · 94200538
  OlivierDehaene authored May 23, 2023
```
@njhill FYI
```
  94200538
- feat(server): support trust_remote_code (#363) · e3e487dc
  OlivierDehaene authored May 23, 2023
  
  e3e487dc
- feat(server): do not use device_map auto on single GPU (#362) · e9669a40
  OlivierDehaene authored May 23, 2023
  
  e9669a40
- feat(server): support fp16 for t5 (#360) · cfaa8580
  OlivierDehaene authored May 23, 2023
```
Fixes #349
```
  cfaa8580
- chore(server): update requirements (#357) · 94377efa
  OlivierDehaene authored May 23, 2023
```
Fixes #338
```
  94377efa
- feat: add nightly load testing (#358) · 5f67923c
  OlivierDehaene authored May 23, 2023
  
  5f67923c
- fix(ci): fix security group (#359) · 0a649478
  oOraph authored May 23, 2023
```
# What does this PR do?
Switch the security group used for CI
(open outbound rules)
Signed-off-by: Raphael <oOraph@users.noreply.github.com>
Co-authored-by: Raphael <oOraph@users.noreply.github.com>
```
  0a649478
- fix(server): t5 cannot run in f16 (#356) · 4f4c9c16
  OlivierDehaene authored May 23, 2023
```
Fix #349
```
  4f4c9c16
22 May, 2023 3 commits
- fix(server): fix init for flash causal lm (#352) · 91d9beec
  OlivierDehaene authored May 22, 2023
```
Fixes #347
```
  91d9beec
- feat(server): Support BLOOMChat-176B (#348) (#351) · e649bf9a
  OlivierDehaene authored May 22, 2023
```
@njhill, 
temporary workaround to be able to run our CI as secrets are not
available to runners run by external contributors. I will ask around to
see if there is a better way.
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
```
  e649bf9a
- fix: set MODEL_ID in sagemaker-entrypoint script (#343) · 1ba78207
  Xin Yang authored May 22, 2023
  
  1ba78207
16 May, 2023 2 commits
- fix(server): fix decode token (#334) · 5a582261
  OlivierDehaene authored May 16, 2023
```
Fixes #333

---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
```
  5a582261
- feat(integration-tests): improve comparison and health checks (#336) · dbdc587d
  OlivierDehaene authored May 16, 2023
  
  dbdc587d