# run with: ./tests.sh --no-skipped --tags wrong_usage
@wrong_usage
Feature: Wrong usage of llama.cpp server

  # Issue #3969: the user must always set the --n-predict option
  # to cap the number of tokens any completion request can generate,
  # or pass n_predict/max_tokens in the request.
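  # For example (illustrative invocation; the model path is a placeholder,
  # --n-predict is the llama.cpp server flag referenced above):
  #   ./server -m stories260K.gguf --n-predict 64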
  Scenario: Infinite loop
    Given a server listening on localhost:8080
    And   a model file tinyllamas/stories260K.gguf from HF repo ggml-org/models
    And   42 as server seed
    And   2048 KV cache size
    # Uncomment below to fix the issue
    #And   64 server max tokens to predict
    Then  the server is starting
    Then  the server is healthy
    Given a prompt:
      """
      Go to: infinite loop
      """
    # Uncomment below to fix the issue
    #And   128 max tokens to predict
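    # Equivalently, a client can cap generation per request; an illustrative
    # POST /completion body:
    #   {"prompt": "Go to: infinite loop", "n_predict": 128}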
    Given concurrent completion requests
    Then the server is idle
    Then all prompts are predicted