Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too.
4-bit quantization is available using the [NF4 and FP4 data types from bitsandbytes](https://arxiv.org/pdf/2305.14314.pdf). It can be enabled by providing `--quantize bitsandbytes-nf4` or `--quantize bitsandbytes-fp4` as a command-line argument to `text-generation-launcher`.
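For example, a launcher invocation along the following lines would enable NF4 quantization; the model id here is only a placeholder:

```bash
# Sketch: launch TGI with 4-bit NF4 quantization from bitsandbytes
# (the model id is a placeholder; any supported model id works)
text-generation-launcher --model-id <MODEL_ID> --quantize bitsandbytes-nf4
```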
## Inference Client

You can simply install the `huggingface-hub` package with pip.
```bash
pip install huggingface-hub
```
Once you start the TGI server, instantiate `InferenceClient()` with the URL to the endpoint serving the model. You can then call `text_generation()` to hit the endpoint through Python.
```python
from huggingface_hub import InferenceClient
```
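A minimal sketch of the full round trip is shown below; it assumes the server is reachable on `http://127.0.0.1:8080`, and the prompt and `max_new_tokens` value are only placeholders:

```python
from huggingface_hub import InferenceClient

# Point the client at the running TGI endpoint (adjust host/port to your setup)
client = InferenceClient(model="http://127.0.0.1:8080")

# Send a prompt and print the generated continuation
output = client.text_generation(prompt="What is Deep Learning?", max_new_tokens=64)
print(output)
```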
## Gradio

Gradio is a Python library that helps you build web applications for your machine learning models.
```bash
pip install huggingface-hub gradio
```
Assuming you are serving your model on port 8080, we will query it through [InferenceClient](consuming_tgi#inference-client).
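A minimal sketch of such an app is below; it assumes the endpoint above and streams tokens into a `gr.ChatInterface`, and the `max_new_tokens` value is only an example:

```python
import gradio as gr
from huggingface_hub import InferenceClient

# Client pointing at the TGI endpoint assumed above
client = InferenceClient(model="http://127.0.0.1:8080")

def inference(message, history):
    partial_message = ""
    # Stream tokens from TGI and yield the growing reply so the UI updates live
    for token in client.text_generation(message, max_new_tokens=200, stream=True):
        partial_message += token
        yield partial_message

gr.ChatInterface(inference).queue().launch()
```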
You can read more about how to customize a `ChatInterface` in the [Gradio documentation](https://www.gradio.app).
## API documentation
You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route. The Swagger UI is also available [here](https://huggingface.github.io/text-generation-inference).
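For example, with a server running locally as in the earlier examples, the documentation is served at a URL along these lines (the host and port here are assumptions):

```bash
# Fetch the Swagger UI page served on the /docs route of a local TGI instance
curl http://127.0.0.1:8080/docs
```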
TGI supports various LLM architectures (see the full list [here](../supported_models)). If you wish to serve a model that is not one of the supported models, TGI will fall back to the `transformers` implementation of that model. This means you will be unable to use some of the features introduced by TGI, such as tensor-parallel sharding or flash attention. However, you can still get many benefits of TGI, such as continuous batching or streaming outputs.
You can serve these models using the same Docker command-line invocation as with fully supported models 👇
```bash
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id gpt2
```
If the model you wish to serve is a custom transformers model, and its weights and implementation are available on the Hub, you can still serve the model by passing the `--trust-remote-code` flag to the `docker run` command like below 👇
```bash
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id <CUSTOM_MODEL_ID> --trust-remote-code
```
Finally, if the model is not on the Hugging Face Hub but available locally, you can pass the path to the folder that contains your model like below 👇
```bash
# Make sure your model is in the $volume directory
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id /data/<PATH-TO-MODEL-FOLDER>
```
RoPE scaling can be used to increase the sequence length of the model at inference time without necessarily fine-tuning it. To enable RoPE scaling, simply pass the `--rope-scaling`, `--max-input-length` and `--rope-factor` flags when running through the CLI. `--rope-scaling` can take the values `linear` or `dynamic`. If your model is not fine-tuned to a longer sequence length, use `dynamic`. `--rope-factor` is the ratio between the intended max sequence length and the model's original max sequence length. Make sure to pass `--max-input-length` to provide the maximum input length for extension.
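As an illustration, the command below sketches how these flags fit together for a model whose original maximum sequence length is 2048 tokens and that should serve up to 4096 tokens; the model id and numbers are placeholders:

```bash
# Sketch: dynamic RoPE scaling to roughly double a 2048-token context window
# (model id and lengths are placeholders)
text-generation-launcher --model-id <MODEL_ID> \
  --rope-scaling dynamic \
  --rope-factor 2.0 \
  --max-input-length 4096
```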
<Tip>

We recommend using `dynamic` RoPE scaling.

</Tip>
## Safetensors
[Safetensors](https://github.com/huggingface/safetensors) is a fast and safe persistence format for deep learning models, and is required for tensor parallelism. TGI supports `safetensors` model loading under the hood. By default, given a repository with `safetensors` and `pytorch` weights, TGI will always load `safetensors`. If there are no `safetensors` weights, TGI will convert the `pytorch` weights to `safetensors` format.