1. 01 Feb, 2025 3 commits
    • Fix target matching for fused layers with compressed-tensors (#12617) · 1867c258
      Eldar Kurtic authored
      Without this PR
      ---------------
      Quantizing a model with llm-compressor using a recipe that explicitly lists
      layer names produces a model that vLLM cannot load: `vllm serve <model>`
      fails with `raise ValueError(f"Unable to find matching target for {module} in the ...`).
      
      Example recipe:
      ```python
      recipe = """
      quantization_stage:
        run_type: oneshot
        quantization_modifiers:
          GPTQModifier:
            ignore: ["lm_head"]
            config_groups:
              group_0:
                weights:
                  num_bits: 4
                  type: "int"
                  symmetric: true
                  strategy: "group"
                  group_size: 128
                targets: [
                  "model.layers.0.mlp.down_proj",
                  "model.layers.2.mlp.down_proj",
                  "model.layers.3.mlp.down_proj",
                  "model.layers.4.mlp.down_proj",
                  "model.layers.5.mlp.down_proj",
                  "model.layers.6.mlp.down_proj",
                  "model.layers.7.mlp.down_proj",
                  "model.layers.8.mlp.down_proj",
                  "model.layers.9.mlp.down_proj",
                  "model.layers.10.mlp.down_proj",
                  "model.layers.11.mlp.down_proj",
                  "model.layers.12.mlp.down_proj",
                  "model.layers.13.mlp.down_proj",
                  "model.layers.14.mlp.down_proj",
                  "model.layers.15.mlp.down_proj",
                  "model.layers.16.mlp.down_proj",
                  "model.layers.17.mlp.down_proj",
                  "model.layers.19.mlp.down_proj",
                  "model.layers.21.mlp.down_proj",
                  "model.layers.22.mlp.down_proj",
                  # ...
                ]
      """
      ```
      
      To reproduce the vLLM error: 
      ```bash
      vllm serve nm-testing/eldar-test
      ```
      
      With this PR
      ------------
      Models are loaded correctly without any errors.
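      The commit title says the fix concerns target matching for *fused* layers. For Llama-style
      models, vLLM loads some projections fused: `q_proj`/`k_proj`/`v_proj` become `qkv_proj`,
      and `gate_proj`/`up_proj` become `gate_up_proj`. A recipe, however, names the unfused
      checkpoint layers. Below is a minimal sketch of fused-aware matching. It is not vLLM's
      actual code. It assumes a hypothetical `FUSED_MAPPING` table and hypothetical
      `shard_names`/`match_fused_target` helpers. It also assumes the compressed-tensors
      convention that a `re:` prefix marks a regular expression. The sketch expands a fused name
      into its shard names and requires every shard to resolve to the same target.

```python
import re
from typing import Iterable, List, Optional

# Hypothetical table: fused vLLM module suffix -> unfused shard suffixes
# that appear in the checkpoint (and therefore in a recipe's targets).
FUSED_MAPPING = {
    "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    "gate_up_proj": ["gate_proj", "up_proj"],
}


def matches(name: str, target: str) -> bool:
    """Exact-name match, or re.match semantics when the target starts with 're:'."""
    if target.startswith("re:"):
        return re.match(target[3:], name) is not None
    return name == target


def shard_names(layer_name: str) -> List[str]:
    """Expand a fused layer name into its shard names; other names pass through."""
    prefix, _, suffix = layer_name.rpartition(".")
    if suffix in FUSED_MAPPING:
        return [f"{prefix}.{shard}" for shard in FUSED_MAPPING[suffix]]
    return [layer_name]


def match_fused_target(layer_name: str, targets: Iterable[str]) -> Optional[str]:
    """Return the single target that every shard of `layer_name` matches.

    Returns None if no shard matches; raises ValueError if the shards
    disagree (e.g. only some of them are listed in the recipe).
    """
    targets = list(targets)
    # First matching target for each shard, or None if nothing matches it.
    hits = [
        next((t for t in targets if matches(shard, t)), None)
        for shard in shard_names(layer_name)
    ]
    if all(hit is None for hit in hits):
        return None
    if len(set(hits)) != 1:
        raise ValueError(f"Shards of {layer_name} match inconsistent targets: {hits}")
    return hits[0]


if __name__ == "__main__":
    targets = ["model.layers.0.mlp.down_proj", r"re:.*self_attn\.(q|k|v)_proj$"]
    print(match_fused_target("model.layers.0.mlp.down_proj", targets))       # -> model.layers.0.mlp.down_proj
    print(match_fused_target("model.layers.0.self_attn.qkv_proj", targets))  # -> re:.*self_attn\.(q|k|v)_proj$
    print(match_fused_target("model.layers.0.mlp.gate_up_proj", targets))    # -> None
```

      In this sketch, a layer whose shards match nothing returns `None`. A fused layer whose
      shards are only partially listed raises an error, because one fused weight can carry only
      one quantization scheme.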
    • fade_away
    • [V1] Bugfix: Validate Model Input Length (#12600) · b1340f9d
      Robert Shaw authored
      SUMMARY:
      * Avoid crashing the engine when it receives an input longer than
      `max_model_len`.
      
      FIX #12567
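      The summary says only that over-long inputs should no longer crash the engine. Below is a
      minimal sketch of that kind of up-front check. It assumes a hypothetical `Request` type
      and hypothetical `validate_model_input`/`add_request` helpers, none of which are vLLM's V1
      API. The check rejects the offending request with a `ValueError` before it is queued, so
      only that request fails.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    # Hypothetical request shape: an ID plus already-tokenized prompt IDs.
    request_id: str
    prompt_token_ids: List[int]


def validate_model_input(request: Request, max_model_len: int) -> None:
    """Raise ValueError for a prompt longer than max_model_len."""
    prompt_len = len(request.prompt_token_ids)
    if prompt_len > max_model_len:
        raise ValueError(
            f"Prompt length of {prompt_len} is longer than the maximum "
            f"model length of {max_model_len}."
        )


def add_request(queue: List[Request], request: Request, max_model_len: int) -> str:
    """Validate before queueing, so a bad request fails alone instead of reaching the engine."""
    try:
        validate_model_input(request, max_model_len)
    except ValueError as err:
        # Only this request is rejected; the error message goes back to the caller.
        return f"rejected: {err}"
    queue.append(request)
    return "accepted"
```

      Validating at submission time keeps the failure on the request path, where it can be
      reported to the client, rather than inside the engine loop.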
  2. 31 Jan, 2025 15 commits
  3. 30 Jan, 2025 7 commits
  4. 29 Jan, 2025 14 commits
  5. 28 Jan, 2025 1 commit