Commit e415b690 authored by Nicolas Patry, committed by GitHub

Lots of improvements (Still 2 allocators) (#2449)



* Making prefix/flashinfer the default and testing the full release tests.

* Include flashinfer in the docker.

* Using prebuilt.

* Allowing window_left_size (dummy version).

* Disabling flashinfer/prefix caching on odd head_dim

* Disable prefix caching for lora.

* More specific codes.

* Update lock

* Updating integration tests with new values with FI/FD.

Remove paged as a default too, and using FD everywhere.

* Update cargo lock ?

* Upgrade to 1.80 because of bitstream...

* Everywhere 1.80

* Forgot last default place.

* Apply suggestions from code review
Co-authored-by: drbh <david.richard.holtz@gmail.com>

* Updated flake lock

* Tmp

* Upgrade resolution system for less errors in resolution.

* Remove lambda for cleaner function.

* Handling debugger.

* Override the env in server tests.

* Is this enough to make it work ?

* This seems to be working.

* Downgrade some logs.

* Fixing the default for vlm.

* Don't enable prefix caching on VLM just yet.

* Change `add_special_tokens` in order to have the correct tokens for chat
input and not (since it's super important with the prefixing now)

* Fixing prefix caching for flashdecoding.

* Update all models.

* Fixed flashinfer version.

* add_special_tokens is internal only

* Fixing seqlen with the new vlms.

* Fixing the issue with `add_special_tokens` not being passed around.

* Fixing the test.

* Removing encoder_decoder (seq2seq).

* Update the chat test.

* Fixing the batching tokenization in flash causal lm.

* Truncating left for radix purposes.

* Oops this doesn't belong here.

* Put back default pure shell.

* Update server tests

- Default to throughput test in k6
- Use TGI_WIGGLE_ROOM to adjust wiggle room

* Only n_heads / process_group.size() are necessary.

* Revert the integration tests change (seems linked to head_size
modification).

* Adding error message when assert is violated.

* Fixing the free algorithm to handle times where the common prefix is
smaller.

* Apply suggestions from code review
Co-authored-by: OlivierDehaene <olivier@huggingface.co>

* Update server/text_generation_server/layers/attention/common.py
Co-authored-by: OlivierDehaene <olivier@huggingface.co>

* Fix disabling prefix caching - Fix windowing checks.

* Revert the Cohere tokenizer change (for now using a revision instead).

* Fmt.

---------
Co-authored-by: drbh <david.richard.holtz@gmail.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
parent 4e821c00
...@@ -35,7 +35,7 @@ jobs: ...@@ -35,7 +35,7 @@ jobs:
with: with:
# Released on: 02 May, 2024 # Released on: 02 May, 2024
# https://releases.rs/docs/1.78.0/ # https://releases.rs/docs/1.78.0/
toolchain: 1.79.0 toolchain: 1.80.0
override: true override: true
components: rustfmt, clippy components: rustfmt, clippy
- name: Install Protoc - name: Install Protoc
......
# Rust builder # Rust builder
FROM lukemathwalker/cargo-chef:latest-rust-1.79 AS chef FROM lukemathwalker/cargo-chef:latest-rust-1.80 AS chef
WORKDIR /usr/src WORKDIR /usr/src
ARG CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse ARG CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse
...@@ -184,6 +184,12 @@ WORKDIR /usr/src ...@@ -184,6 +184,12 @@ WORKDIR /usr/src
COPY server/Makefile-selective-scan Makefile COPY server/Makefile-selective-scan Makefile
RUN make build-all RUN make build-all
# Build flashinfer
FROM kernel-builder AS flashinfer-builder
WORKDIR /usr/src
COPY server/Makefile-flashinfer Makefile
RUN make install-flashinfer
# Text Generation Inference base image # Text Generation Inference base image
FROM nvidia/cuda:12.1.0-base-ubuntu22.04 AS base FROM nvidia/cuda:12.1.0-base-ubuntu22.04 AS base
...@@ -236,6 +242,7 @@ COPY --from=vllm-builder /usr/src/vllm/build/lib.linux-x86_64-cpython-310 /opt/c ...@@ -236,6 +242,7 @@ COPY --from=vllm-builder /usr/src/vllm/build/lib.linux-x86_64-cpython-310 /opt/c
# Copy build artifacts from mamba builder # Copy build artifacts from mamba builder
COPY --from=mamba-builder /usr/src/mamba/build/lib.linux-x86_64-cpython-310/ /opt/conda/lib/python3.10/site-packages COPY --from=mamba-builder /usr/src/mamba/build/lib.linux-x86_64-cpython-310/ /opt/conda/lib/python3.10/site-packages
COPY --from=mamba-builder /usr/src/causal-conv1d/build/lib.linux-x86_64-cpython-310/ /opt/conda/lib/python3.10/site-packages COPY --from=mamba-builder /usr/src/causal-conv1d/build/lib.linux-x86_64-cpython-310/ /opt/conda/lib/python3.10/site-packages
COPY --from=flashinfer-builder /opt/conda/lib/python3.10/site-packages/flashinfer/ /opt/conda/lib/python3.10/site-packages/flashinfer/
# Install flash-attention dependencies # Install flash-attention dependencies
RUN pip install einops --no-cache-dir RUN pip install einops --no-cache-dir
......
# Rust builder # Rust builder
FROM lukemathwalker/cargo-chef:latest-rust-1.79 AS chef FROM lukemathwalker/cargo-chef:latest-rust-1.80 AS chef
WORKDIR /usr/src WORKDIR /usr/src
ARG CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse ARG CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse
......
ARG PLATFORM=xpu ARG PLATFORM=xpu
FROM lukemathwalker/cargo-chef:latest-rust-1.79 AS chef FROM lukemathwalker/cargo-chef:latest-rust-1.80 AS chef
WORKDIR /usr/src WORKDIR /usr/src
ARG CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse ARG CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse
......
...@@ -153,6 +153,8 @@ impl Client { ...@@ -153,6 +153,8 @@ impl Client {
}), }),
// We truncate the input on the server side to be sure that it has the correct size // We truncate the input on the server side to be sure that it has the correct size
truncate, truncate,
// Most request will have that
add_special_tokens: true,
// Blocks and slots will be set on the server side if we use paged attention // Blocks and slots will be set on the server side if we use paged attention
blocks: vec![], blocks: vec![],
slots: vec![], slots: vec![],
......
...@@ -221,6 +221,7 @@ impl Health for ShardedClient { ...@@ -221,6 +221,7 @@ impl Health for ShardedClient {
chunks: vec![Chunk::Text("liveness".into()).into()], chunks: vec![Chunk::Text("liveness".into()).into()],
}), }),
truncate: 10, truncate: 10,
add_special_tokens: true,
prefill_logprobs: false, prefill_logprobs: false,
parameters: Some(NextTokenChooserParameters { parameters: Some(NextTokenChooserParameters {
temperature: 1.0, temperature: 1.0,
......
...@@ -35,27 +35,15 @@ impl BackendV3 { ...@@ -35,27 +35,15 @@ impl BackendV3 {
window_size: Option<u32>, window_size: Option<u32>,
speculate: u32, speculate: u32,
) -> Self { ) -> Self {
let prefix_caching = if let Ok(prefix_caching) = std::env::var("USE_PREFIX_CACHING") { let prefix_caching =
matches!(prefix_caching.as_str(), "true" | "1") std::env::var("USE_PREFIX_CACHING").expect("Expect prefix caching env var");
} else { let prefix_caching = matches!(prefix_caching.as_str(), "true" | "1");
false let attention: String = std::env::var("ATTENTION").expect("attention env var");
};
let attention = if let Ok(attention) = std::env::var("ATTENTION") { let attention: Attention = attention
attention .parse()
.parse() .unwrap_or_else(|_| panic!("Invalid attention was specified :`{attention}`"));
.unwrap_or_else(|_| panic!("Invalid attention was specified :`{attention}`")) let block_size = attention.block_size();
} else if prefix_caching {
Attention::FlashInfer
} else {
Attention::Paged
};
let block_size = if attention == Attention::FlashDecoding {
256
} else if attention == Attention::FlashInfer {
1
} else {
16
};
let queue = Queue::new( let queue = Queue::new(
requires_padding, requires_padding,
......
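The hunk above replaces the inline block-size selection with a call to `attention.block_size()`, whose definition is not part of this excerpt. A minimal sketch of what such a method could look like, assuming it keeps the values from the removed if/else chain (256 for FlashDecoding, 1 for FlashInfer, 16 otherwise). The enum layout and the accepted string spellings are hypothetical, chosen only for illustration:

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the `Attention` type referenced in the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Attention {
    Paged,
    FlashDecoding,
    FlashInfer,
}

impl Attention {
    /// Block sizes copied from the if/else chain removed by this commit.
    /// Whether the real method returns the same values is an assumption.
    pub fn block_size(&self) -> u32 {
        match self {
            Attention::FlashDecoding => 256,
            Attention::FlashInfer => 1,
            Attention::Paged => 16,
        }
    }
}

impl FromStr for Attention {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // The accepted spellings are assumptions made for this sketch.
        match s {
            "paged" => Ok(Attention::Paged),
            "flashdecoding" => Ok(Attention::FlashDecoding),
            "flashinfer" => Ok(Attention::FlashInfer),
            other => Err(format!("Invalid attention was specified: `{other}`")),
        }
    }
}
```

Moving the mapping onto the type keeps the backend constructor free of per-backend constants, which appears to be the intent of the change.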
use std::{cmp::min, sync::Arc}; use std::sync::Arc;
use tokio::sync::{mpsc, oneshot}; use tokio::sync::{mpsc, oneshot};
use crate::radix::RadixAllocator; use crate::radix::RadixAllocator;
...@@ -137,7 +137,6 @@ pub trait Allocator { ...@@ -137,7 +137,6 @@ pub trait Allocator {
fn free(&mut self, blocks: Vec<u32>, allocation_id: u64); fn free(&mut self, blocks: Vec<u32>, allocation_id: u64);
} }
pub struct SimpleAllocator { pub struct SimpleAllocator {
free_blocks: Vec<u32>, free_blocks: Vec<u32>,
block_size: u32, block_size: u32,
...@@ -167,7 +166,7 @@ impl Allocator for SimpleAllocator { ...@@ -167,7 +166,7 @@ impl Allocator for SimpleAllocator {
None => (tokens, 1), None => (tokens, 1),
Some(window_size) => { Some(window_size) => {
let repeats = (tokens + window_size - 1) / window_size; let repeats = (tokens + window_size - 1) / window_size;
let tokens = min(tokens, window_size); let tokens = core::cmp::min(tokens, window_size);
(tokens, repeats as usize) (tokens, repeats as usize)
} }
}; };
......
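The window handling in `SimpleAllocator` shown above can be isolated as a small function. This is a sketch only: the name `windowed_tokens` is hypothetical, and it assumes a non-zero `window_size` (a zero window would divide by zero). How the surrounding allocator uses `repeats` is not visible in this excerpt.

```rust
/// Returns the number of tokens that need backing storage and how many
/// times that storage is reused, mirroring the match arm in the diff.
pub fn windowed_tokens(tokens: u32, window_size: Option<u32>) -> (u32, usize) {
    match window_size {
        // No sliding window: every token needs its own slot.
        None => (tokens, 1),
        Some(window_size) => {
            // Ceiling division: number of windows needed to cover `tokens`.
            let repeats = (tokens + window_size - 1) / window_size;
            // Storage is capped at one window's worth of tokens.
            let tokens = core::cmp::min(tokens, window_size);
            (tokens, repeats as usize)
        }
    }
}
```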
...@@ -149,6 +149,7 @@ impl Client { ...@@ -149,6 +149,7 @@ impl Client {
requests.push(Request { requests.push(Request {
id: 0, id: 0,
inputs, inputs,
add_special_tokens: true,
input_chunks: Some(Input { input_chunks: Some(Input {
chunks: input_chunks, chunks: input_chunks,
}), }),
......
...@@ -222,6 +222,7 @@ impl Health for ShardedClient { ...@@ -222,6 +222,7 @@ impl Health for ShardedClient {
chunks: vec![Chunk::Text("liveness".into()).into()], chunks: vec![Chunk::Text("liveness".into()).into()],
}), }),
truncate: 10, truncate: 10,
add_special_tokens: true,
prefill_logprobs: false, prefill_logprobs: false,
parameters: Some(NextTokenChooserParameters { parameters: Some(NextTokenChooserParameters {
temperature: 1.0, temperature: 1.0,
......
...@@ -383,6 +383,7 @@ impl State { ...@@ -383,6 +383,7 @@ impl State {
}), }),
inputs: entry.request.inputs.chunks_to_string(), inputs: entry.request.inputs.chunks_to_string(),
truncate: entry.request.truncate, truncate: entry.request.truncate,
add_special_tokens: entry.request.add_special_tokens,
parameters: Some(NextTokenChooserParameters::from( parameters: Some(NextTokenChooserParameters::from(
entry.request.parameters.clone(), entry.request.parameters.clone(),
)), )),
...@@ -517,6 +518,7 @@ mod tests { ...@@ -517,6 +518,7 @@ mod tests {
inputs: vec![], inputs: vec![],
input_ids: Some(Arc::new(vec![])), input_ids: Some(Arc::new(vec![])),
input_length: 0, input_length: 0,
add_special_tokens: true,
truncate: 0, truncate: 0,
decoder_input_details: false, decoder_input_details: false,
parameters: ValidParameters { parameters: ValidParameters {
......
use crate::block_allocator::{Allocator, BlockAllocation};
use slotmap::{DefaultKey, SlotMap};
use std::{ use std::{
collections::{BTreeSet, HashMap}, collections::{BTreeSet, HashMap},
sync::Arc, sync::Arc,
}; };
use slotmap::{DefaultKey, SlotMap};
use crate::block_allocator::{Allocator, BlockAllocation};
pub struct RadixAllocator { pub struct RadixAllocator {
allocation_id: u64, allocation_id: u64,
...@@ -16,26 +14,26 @@ pub struct RadixAllocator { ...@@ -16,26 +14,26 @@ pub struct RadixAllocator {
/// Blocks that are immediately available for allocation. /// Blocks that are immediately available for allocation.
free_blocks: Vec<u32>, free_blocks: Vec<u32>,
#[allow(dead_code)]
// This isn't used because the prefix need to match without the windowing
// mecanism. This at worst is overallocating, not necessarily being wrong.
window_size: Option<u32>,
block_size: u32,
} }
impl RadixAllocator { impl RadixAllocator {
pub fn new(block_size: u32, n_blocks: u32, window_size: Option<u32>) -> Self { pub fn new(block_size: u32, n_blocks: u32, window_size: Option<u32>) -> Self {
assert_eq!(
block_size, 1,
"Radix tree allocator only works with block_size=1, was: {}",
block_size
);
if window_size.is_some() {
unimplemented!("Window size not supported in the prefix-caching block allocator yet");
}
RadixAllocator { RadixAllocator {
allocation_id: 0, allocation_id: 0,
allocations: HashMap::new(), allocations: HashMap::new(),
cache_blocks: RadixTrie::new(), cache_blocks: RadixTrie::new(block_size as usize),
// Block 0 is reserved for health checks. // Block 0 is reserved for health checks.
free_blocks: (1..n_blocks).collect(), free_blocks: (1..n_blocks).collect(),
window_size,
block_size,
} }
} }
...@@ -63,6 +61,7 @@ impl RadixAllocator { ...@@ -63,6 +61,7 @@ impl RadixAllocator {
} }
} }
// Allocator trait
impl Allocator for RadixAllocator { impl Allocator for RadixAllocator {
fn allocate( fn allocate(
&mut self, &mut self,
...@@ -86,10 +85,12 @@ impl Allocator for RadixAllocator { ...@@ -86,10 +85,12 @@ impl Allocator for RadixAllocator {
.incref(prefix_node) .incref(prefix_node)
.expect("Failed to increment refcount"); .expect("Failed to increment refcount");
let prefix_len = blocks.len(); let prefix_len = blocks.len() * self.block_size as usize;
let suffix_len = tokens - prefix_len as u32; let suffix_len = tokens - prefix_len as u32;
match self.alloc_or_reclaim(suffix_len as usize) { let suffix_blocks = (suffix_len + self.block_size - 1) / self.block_size;
match self.alloc_or_reclaim(suffix_blocks as usize) {
Some(suffix_blocks) => blocks.extend(suffix_blocks), Some(suffix_blocks) => blocks.extend(suffix_blocks),
None => { None => {
self.cache_blocks self.cache_blocks
...@@ -100,7 +101,20 @@ impl Allocator for RadixAllocator { ...@@ -100,7 +101,20 @@ impl Allocator for RadixAllocator {
} }
// 1:1 mapping of blocks and slots. // 1:1 mapping of blocks and slots.
let slots = blocks.clone(); let slots = if self.block_size == 1 {
blocks.clone()
} else {
let mut slots = Vec::with_capacity(blocks.len() * self.block_size as usize);
'slots: for block_id in &blocks {
for s in (block_id * self.block_size)..((block_id + 1) * self.block_size) {
slots.push(s);
if slots.len() as u32 == tokens {
break 'slots;
}
}
}
slots
};
let allocation = RadixAllocation { let allocation = RadixAllocation {
prefix_node, prefix_node,
...@@ -108,6 +122,8 @@ impl Allocator for RadixAllocator { ...@@ -108,6 +122,8 @@ impl Allocator for RadixAllocator {
prefill_tokens: prefill_tokens.clone(), prefill_tokens: prefill_tokens.clone(),
}; };
tracing::debug!("Blocks {blocks:?}");
self.allocation_id += 1; self.allocation_id += 1;
self.allocations.insert(self.allocation_id, allocation); self.allocations.insert(self.allocation_id, allocation);
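With `block_size > 1`, blocks and slots are no longer in a 1:1 relationship: block `b` covers slots `b * block_size .. (b + 1) * block_size`, and the slot list is cut off once it holds `tokens` entries. The loop from the hunk above, extracted into a free function for illustration (the name `slots_for_blocks` is hypothetical):

```rust
/// Expands block ids into slot ids, stopping after `tokens` slots.
pub fn slots_for_blocks(blocks: &[u32], block_size: u32, tokens: u32) -> Vec<u32> {
    if block_size == 1 {
        // 1:1 mapping of blocks and slots.
        return blocks.to_vec();
    }
    let mut slots = Vec::with_capacity(blocks.len() * block_size as usize);
    'slots: for &block_id in blocks {
        for s in (block_id * block_size)..((block_id + 1) * block_size) {
            slots.push(s);
            // The last block may be only partially used.
            if slots.len() as u32 == tokens {
                break 'slots;
            }
        }
    }
    slots
}
```

For the non-aligned case exercised by the new tests (blocks `[8, 9, 10, 11]`, `block_size = 2`, 7 tokens), this yields slots 16 through 22, leaving the last slot of block 11 unused.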
...@@ -136,27 +152,38 @@ impl Allocator for RadixAllocator { ...@@ -136,27 +152,38 @@ impl Allocator for RadixAllocator {
// If there are prefill tokens that did not come from the cache, // If there are prefill tokens that did not come from the cache,
// add them to the cache. // add them to the cache.
if prefill_tokens.len() > allocation.cached_prefix_len { if prefill_tokens.len() > allocation.cached_prefix_len {
let prefix_len = self let aligned =
.cache_blocks (prefill_tokens.len() / self.block_size as usize) * self.block_size as usize;
.insert(prefill_tokens, &blocks[..prefill_tokens.len()]) if aligned > 0 {
// Unwrap, failing is a programming error. let prefix_len = self
.expect("Failed to store prefill tokens"); .cache_blocks
.insert(
// We can have a prefill with the following structure: &prefill_tokens[..aligned],
// &blocks[..aligned / self.block_size as usize],
// |---| From the prefix cache. )
// A B C D E F G // Unwrap, failing is a programming error.
//|--------| Found in the trie during insertion. .expect("Failed to store prefill tokens");
// // We can have a prefill with the following structure:
// This means that while processing this request there was a //
// partially overlapping request that had A..=E in its // |---| From the prefix cache.
// prefill. In this case we need to free the blocks D E. // A B C D E F G
self.free_blocks //|--------| Found in the trie during insertion.
.extend(&blocks[allocation.cached_prefix_len..prefix_len]); //
// This means that while processing this request there was a
// partially overlapping request that had A..=E in its
// prefill. In this case we need to free the blocks D E.
if prefix_len > allocation.cached_prefix_len {
self.free_blocks.extend(
&blocks[allocation.cached_prefix_len / self.block_size as usize
..prefix_len / self.block_size as usize],
);
}
}
} }
// Free non-prefill blocks. // Free non-prefill blocks.
self.free_blocks.extend(&blocks[prefill_tokens.len()..]); self.free_blocks
.extend(&blocks[prefill_tokens.len() / self.block_size as usize..]);
} else { } else {
self.free_blocks.extend(blocks); self.free_blocks.extend(blocks);
} }
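The `free` path above only stores whole blocks in the prefix cache: the prefill length is rounded down to a multiple of `block_size` before insertion, and nothing is inserted when fewer than `block_size` tokens are available. A sketch of that arithmetic, with the hypothetical helper name `cacheable_prefix`:

```rust
/// Returns `(aligned_tokens, aligned_blocks)`: how many prefill tokens and
/// how many blocks can be stored in the prefix cache for a given block size.
pub fn cacheable_prefix(prefill_len: usize, block_size: usize) -> (usize, usize) {
    // Round down to a whole number of blocks.
    let aligned = (prefill_len / block_size) * block_size;
    (aligned, aligned / block_size)
}
```

This matches the `allocator_block_size_non_aligned` test in the diff: three prefill tokens with `block_size = 2` give a cached prefix of 2 tokens on the next allocation.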
...@@ -204,17 +231,14 @@ pub struct RadixTrie { ...@@ -204,17 +231,14 @@ pub struct RadixTrie {
/// Time as a monotonically increating counter to avoid the system /// Time as a monotonically increating counter to avoid the system
/// call that a real time lookup would require. /// call that a real time lookup would require.
time: u64, time: u64,
}
impl Default for RadixTrie { /// All blocks need to be aligned with this
fn default() -> Self { block_size: usize,
Self::new()
}
} }
impl RadixTrie { impl RadixTrie {
/// Construct a new radix trie. /// Construct a new radix trie.
pub fn new() -> Self { pub fn new(block_size: usize) -> Self {
let root = TrieNode::new(vec![], vec![], 0, None); let root = TrieNode::new(vec![], vec![], 0, None);
let mut nodes = SlotMap::new(); let mut nodes = SlotMap::new();
let root = nodes.insert(root); let root = nodes.insert(root);
...@@ -223,13 +247,14 @@ impl RadixTrie { ...@@ -223,13 +247,14 @@ impl RadixTrie {
nodes, nodes,
root, root,
time: 0, time: 0,
block_size,
} }
} }
/// Find the prefix of the given tokens. /// Find the prefix of the given tokens.
/// ///
/// The blocks corresponding to the part of the prefix that could be found /// The blocks corresponding to the part of the prefix that could be found
/// are writteng to `blocks`. The number of blocks is in `0..=tokens.len()`. /// are written to `blocks`. The number of blocks is in `0..=tokens.len()`.
/// Returns the identifier of the trie node that contains the longest /// Returns the identifier of the trie node that contains the longest
/// prefix. The node identifier can be used by callers to e.g. increase its /// prefix. The node identifier can be used by callers to e.g. increase its
/// reference count. /// reference count.
...@@ -247,8 +272,9 @@ impl RadixTrie { ...@@ -247,8 +272,9 @@ impl RadixTrie {
if let Some(&child_id) = node.children.get(&key[0]) { if let Some(&child_id) = node.children.get(&key[0]) {
self.update_access_time(child_id); self.update_access_time(child_id);
let child = self.nodes.get(child_id).expect("Invalid child identifier"); let child = self.nodes.get(child_id).expect("Invalid child identifier");
let shared_prefix_len = child.key.shared_prefix_len(key); let shared_prefix_len = shared_prefix(&child.key, key, self.block_size);
blocks.extend(&child.blocks[..shared_prefix_len]); assert_eq!(shared_prefix_len % self.block_size, 0);
blocks.extend(&child.blocks[..shared_prefix_len / self.block_size]);
let key = &key[shared_prefix_len..]; let key = &key[shared_prefix_len..];
if !key.is_empty() { if !key.is_empty() {
...@@ -349,7 +375,8 @@ impl RadixTrie { ...@@ -349,7 +375,8 @@ impl RadixTrie {
/// the first 10 elements of the tree **the blocks are not updated**. /// the first 10 elements of the tree **the blocks are not updated**.
pub fn insert(&mut self, tokens: &[u32], blocks: &[u32]) -> Result<usize, TrieError> { pub fn insert(&mut self, tokens: &[u32], blocks: &[u32]) -> Result<usize, TrieError> {
self.time += 1; self.time += 1;
self.insert_(self.root, tokens, blocks) let common = self.insert_(self.root, tokens, blocks)?;
Ok(common)
} }
/// Insertion worker. /// Insertion worker.
...@@ -363,7 +390,7 @@ impl RadixTrie { ...@@ -363,7 +390,7 @@ impl RadixTrie {
// the part of the prefix that is already in the trie to detect // the part of the prefix that is already in the trie to detect
// mismatches. // mismatches.
if tokens.len() != blocks.len() { if tokens.len() != blocks.len() * self.block_size {
return Err(TrieError::BlockTokenCountMismatch); return Err(TrieError::BlockTokenCountMismatch);
} }
...@@ -374,10 +401,10 @@ impl RadixTrie { ...@@ -374,10 +401,10 @@ impl RadixTrie {
.get_mut(child_id) .get_mut(child_id)
// Unwrap here, since failure is a bug. // Unwrap here, since failure is a bug.
.expect("Child node does not exist"); .expect("Child node does not exist");
let shared_prefix_len = child.key.shared_prefix_len(tokens); let shared_prefix_len = shared_prefix(&child.key, tokens, self.block_size);
// We are done, the prefix is already in the trie. // We are done, the prefix is already in the trie.
if shared_prefix_len == tokens.len() { if shared_prefix_len == tokens.len() || shared_prefix_len == 0 {
return Ok(shared_prefix_len); return Ok(shared_prefix_len);
} }
...@@ -387,7 +414,7 @@ impl RadixTrie { ...@@ -387,7 +414,7 @@ impl RadixTrie {
+ self.insert_( + self.insert_(
child_id, child_id,
&tokens[shared_prefix_len..], &tokens[shared_prefix_len..],
&blocks[shared_prefix_len..], &blocks[shared_prefix_len / self.block_size..],
)?); )?);
} }
...@@ -396,7 +423,7 @@ impl RadixTrie { ...@@ -396,7 +423,7 @@ impl RadixTrie {
// remainder of the prefix into the node again // remainder of the prefix into the node again
let child_id = self.split_node(child_id, shared_prefix_len); let child_id = self.split_node(child_id, shared_prefix_len);
let key = &tokens[shared_prefix_len..]; let key = &tokens[shared_prefix_len..];
let blocks = &blocks[shared_prefix_len..]; let blocks = &blocks[shared_prefix_len / self.block_size..];
Ok(shared_prefix_len + self.insert_(child_id, key, blocks)?) Ok(shared_prefix_len + self.insert_(child_id, key, blocks)?)
} else { } else {
self.add_node(node_id, tokens, blocks); self.add_node(node_id, tokens, blocks);
...@@ -550,34 +577,53 @@ impl TrieNode { ...@@ -550,34 +577,53 @@ impl TrieNode {
} }
} }
/// Helper trait to get the length of the shared prefix of two sequences. fn shared_prefix(left: &[u32], right: &[u32], block_size: usize) -> usize {
trait SharedPrefixLen { let full = left.iter().zip(right).take_while(|(a, b)| a == b).count();
fn shared_prefix_len(&self, other: &Self) -> usize; (full / block_size) * block_size
}
impl<T> SharedPrefixLen for [T]
where
T: PartialEq,
{
fn shared_prefix_len(&self, other: &Self) -> usize {
self.iter().zip(other).take_while(|(a, b)| a == b).count()
}
} }
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use std::sync::Arc; use std::sync::Arc;
use crate::block_allocator::Allocator; use super::*;
use super::RadixAllocator; #[test]
fn allocator_block_size() {
let mut cache = RadixAllocator::new(2, 12, None);
let allocation = cache.allocate(8, Some(Arc::new(vec![0, 1, 2, 3]))).unwrap();
assert_eq!(allocation.blocks, vec![8, 9, 10, 11]);
assert_eq!(allocation.slots, vec![16, 17, 18, 19, 20, 21, 22, 23]);
assert_eq!(allocation.prefix_len, 0);
cache.free(allocation.blocks.clone(), allocation.allocation_id);
let allocation = cache.allocate(8, Some(Arc::new(vec![0, 1, 2, 3]))).unwrap();
assert_eq!(allocation.blocks, vec![8, 9, 10, 11]);
assert_eq!(allocation.slots, vec![16, 17, 18, 19, 20, 21, 22, 23]);
assert_eq!(allocation.prefix_len, 4);
}
#[test]
fn allocator_block_size_non_aligned() {
let mut cache = RadixAllocator::new(2, 12, None);
let allocation = cache.allocate(7, Some(Arc::new(vec![0, 1, 2]))).unwrap();
assert_eq!(allocation.blocks, vec![8, 9, 10, 11]);
assert_eq!(allocation.slots, vec![16, 17, 18, 19, 20, 21, 22]);
assert_eq!(allocation.prefix_len, 0);
cache.free(allocation.blocks.clone(), allocation.allocation_id);
let allocation = cache.allocate(7, Some(Arc::new(vec![0, 1, 2]))).unwrap();
assert_eq!(allocation.blocks, vec![8, 9, 10, 11]);
assert_eq!(allocation.slots, vec![16, 17, 18, 19, 20, 21, 22]);
assert_eq!(allocation.prefix_len, 2);
}
#[test] #[test]
fn allocator_reuses_prefixes() { fn allocator_reuses_prefixes() {
let mut cache = RadixAllocator::new(1, 12, None); let mut cache = RadixAllocator::new(1, 12, None);
let allocation = cache.allocate(8, Some(Arc::new(vec![0, 1, 2, 3]))).unwrap(); let allocation = cache.allocate(8, Some(Arc::new(vec![0, 1, 2, 3]))).unwrap();
assert_eq!(allocation.blocks, vec![4, 5, 6, 7, 8, 9, 10, 11]); assert_eq!(allocation.blocks, vec![4, 5, 6, 7, 8, 9, 10, 11]);
assert_eq!(allocation.slots, allocation.slots); assert_eq!(allocation.blocks, allocation.slots);
assert_eq!(allocation.prefix_len, 0); assert_eq!(allocation.prefix_len, 0);
cache.free(allocation.blocks.clone(), allocation.allocation_id); cache.free(allocation.blocks.clone(), allocation.allocation_id);
...@@ -666,7 +712,7 @@ mod tests { ...@@ -666,7 +712,7 @@ mod tests {
#[test] #[test]
fn trie_insertions_have_correct_prefix_len() { fn trie_insertions_have_correct_prefix_len() {
let mut trie = super::RadixTrie::new(); let mut trie = RadixTrie::new(1);
assert_eq!(trie.insert(&[0, 1, 2], &[0, 1, 2]).unwrap(), 0); assert_eq!(trie.insert(&[0, 1, 2], &[0, 1, 2]).unwrap(), 0);
...@@ -687,9 +733,33 @@ mod tests { ...@@ -687,9 +733,33 @@ mod tests {
); );
} }
#[test]
fn trie_insertions_block_size() {
let mut trie = RadixTrie::new(2);
assert_eq!(trie.insert(&[0, 1, 2, 3], &[0, 1]).unwrap(), 0);
// Already exists.
// But needs to be block_size aligned
assert_eq!(trie.insert(&[0, 1, 2, 3], &[0, 1]).unwrap(), 4);
// Completely new at root-level
assert_eq!(trie.insert(&[1, 2, 3, 4], &[1, 2]).unwrap(), 0);
// Contains full prefix, but longer.
assert_eq!(trie.insert(&[0, 1, 2, 3, 4, 5], &[0, 1, 2]).unwrap(), 4);
// Shares partial prefix, we need a split.
assert_eq!(
trie.insert(&[0, 1, 3, 4, 5, 6, 7, 8], &[0, 1, 2, 3])
.unwrap(),
2
);
}
#[test] #[test]
fn trie_get_returns_correct_blocks() { fn trie_get_returns_correct_blocks() {
let mut trie = super::RadixTrie::new(); let mut trie = RadixTrie::new(1);
trie.insert(&[0, 1, 2], &[0, 1, 2]).unwrap(); trie.insert(&[0, 1, 2], &[0, 1, 2]).unwrap();
trie.insert(&[1, 2, 3], &[1, 2, 3]).unwrap(); trie.insert(&[1, 2, 3], &[1, 2, 3]).unwrap();
trie.insert(&[0, 1, 2, 3, 4], &[0, 1, 2, 3, 4]).unwrap(); trie.insert(&[0, 1, 2, 3, 4], &[0, 1, 2, 3, 4]).unwrap();
...@@ -723,7 +793,7 @@ mod tests { ...@@ -723,7 +793,7 @@ mod tests {
#[test] #[test]
fn trie_evict_removes_correct_blocks() { fn trie_evict_removes_correct_blocks() {
let mut trie = super::RadixTrie::new(); let mut trie = RadixTrie::new(1);
trie.insert(&[0, 1, 2], &[0, 1, 2]).unwrap(); trie.insert(&[0, 1, 2], &[0, 1, 2]).unwrap();
trie.insert(&[0, 1, 2, 3, 5, 6, 7], &[0, 1, 2, 3, 5, 6, 7]) trie.insert(&[0, 1, 2, 3, 5, 6, 7], &[0, 1, 2, 3, 5, 6, 7])
.unwrap(); .unwrap();
......
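The `shared_prefix` helper introduced in this file replaces the `SharedPrefixLen` trait and rounds the common prefix down to a block boundary. It is reproduced here as a standalone function (made `pub` for illustration) so its behavior can be checked in isolation:

```rust
/// Length of the common prefix of `left` and `right`, rounded down to a
/// multiple of `block_size`.
pub fn shared_prefix(left: &[u32], right: &[u32], block_size: usize) -> usize {
    // Count matching leading tokens.
    let full = left.iter().zip(right).take_while(|(a, b)| a == b).count();
    // Only whole blocks can be shared.
    (full / block_size) * block_size
}
```

With `block_size = 2`, the keys `[0, 1, 2, 3]` and `[0, 1, 2, 4]` share three tokens but only one full block, so the result is 2. This is why the trie can assert that every shared prefix length is divisible by `block_size`.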
...@@ -148,6 +148,7 @@ async fn prefill( ...@@ -148,6 +148,7 @@ async fn prefill(
}), }),
inputs: sequence.clone(), inputs: sequence.clone(),
truncate: sequence_length, truncate: sequence_length,
add_special_tokens: true,
parameters: Some(parameters.clone()), parameters: Some(parameters.clone()),
stopping_parameters: Some(StoppingCriteriaParameters { stopping_parameters: Some(StoppingCriteriaParameters {
max_new_tokens: decode_length, max_new_tokens: decode_length,
......
...@@ -835,11 +835,11 @@ ...@@ -835,11 +835,11 @@
] ]
}, },
"locked": { "locked": {
"lastModified": 1724206841, "lastModified": 1724638882,
"narHash": "sha256-L8dKaX4T3k+TR2fEHCfGbH4UXdspovz/pj87iai9qmc=", "narHash": "sha256-ap2jIQi/FuUHR6HCht6ASWhoz8EiB99XmI8Esot38VE=",
"owner": "oxalica", "owner": "oxalica",
"repo": "rust-overlay", "repo": "rust-overlay",
"rev": "45e98fbd62c32e5927e952d2833fa1ba4fb35a61", "rev": "19b70f147b9c67a759e35824b241f1ed92e46694",
"type": "github" "type": "github"
}, },
"original": { "original": {
......
...@@ -5,7 +5,7 @@ ...@@ -5,7 +5,7 @@
"index": 0, "index": 0,
"logprobs": null, "logprobs": null,
"message": { "message": {
"content": "As of your last question, the weather in Brooklyn, New York, is typically hot and humid throughout the year. The suburbs around New York City are jealously sheltered, and at least in the Lower Bronx, there are very few outdoor environments to explore in the middle of urban confines. In fact, typical times for humidity levels in Brooklyn include:\n\n- Early morning: 80-85% humidity, with occas", "content": "As of your last question, the weather in Brooklyn, New York, is typically hot and humid throughout the year. The suburbs around New York City are jealously sheltered, and at least in the Lower Bronx, there are very few outdoor environments to appreciate nature.\n\nIn terms of temperature, the warmest times of the year are from June to August, when average high temperatures typically range from around 73°F or 23°C",
"name": null, "name": null,
"role": "assistant", "role": "assistant",
"tool_calls": null "tool_calls": null
...@@ -13,14 +13,14 @@ ...@@ -13,14 +13,14 @@
"usage": null "usage": null
} }
], ],
"created": 1716553098, "created": 1724792495,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "chat.completion",
"system_fingerprint": "2.0.5-dev0-native", "system_fingerprint": "2.2.1-dev0-native",
"usage": { "usage": {
"completion_tokens": 100, "completion_tokens": 100,
"prompt_tokens": 62, "prompt_tokens": 61,
"total_tokens": 162 "total_tokens": 161
} }
} }
...@@ -8,11 +8,11 @@ ...@@ -8,11 +8,11 @@
"text": "\n" "text": "\n"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -23,11 +23,11 @@ ...@@ -23,11 +23,11 @@
"text": "\n" "text": "\n"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -38,11 +38,11 @@ ...@@ -38,11 +38,11 @@
"text": "\n" "text": "\n"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -53,11 +53,11 @@ ...@@ -53,11 +53,11 @@
"text": "hd" "text": "hd"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -68,11 +68,11 @@ ...@@ -68,11 +68,11 @@
"text": "\n" "text": "\n"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -83,11 +83,11 @@ ...@@ -83,11 +83,11 @@
"text": "\n" "text": "\n"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -98,11 +98,11 @@ ...@@ -98,11 +98,11 @@
"text": "\n" "text": "\n"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -113,11 +113,11 @@ ...@@ -113,11 +113,11 @@
"text": "aho" "text": "aho"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -128,11 +128,11 @@ ...@@ -128,11 +128,11 @@
"text": "2" "text": "2"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -143,11 +143,11 @@ ...@@ -143,11 +143,11 @@
"text": "2" "text": "2"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -158,11 +158,11 @@ ...@@ -158,11 +158,11 @@
"text": "2" "text": "2"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -173,11 +173,11 @@ ...@@ -173,11 +173,11 @@
"text": "ima" "text": "ima"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -188,11 +188,11 @@ ...@@ -188,11 +188,11 @@
"text": "." "text": "."
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -203,11 +203,11 @@ ...@@ -203,11 +203,11 @@
"text": "." "text": "."
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -218,11 +218,11 @@ ...@@ -218,11 +218,11 @@
"text": "." "text": "."
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -233,11 +233,11 @@ ...@@ -233,11 +233,11 @@
"text": "\n" "text": "\n"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -248,11 +248,11 @@ ...@@ -248,11 +248,11 @@
"text": " Sarah" "text": " Sarah"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -263,11 +263,11 @@ ...@@ -263,11 +263,11 @@
"text": " Yes" "text": " Yes"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -278,11 +278,11 @@ ...@@ -278,11 +278,11 @@
"text": " And" "text": " And"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -293,11 +293,11 @@ ...@@ -293,11 +293,11 @@
"text": "i" "text": "i"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -308,11 +308,11 @@ ...@@ -308,11 +308,11 @@
"text": "'" "text": "'"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -323,11 +323,11 @@ ...@@ -323,11 +323,11 @@
"text": "," "text": ","
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -338,11 +338,11 @@ ...@@ -338,11 +338,11 @@
"text": " what" "text": " what"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -353,11 +353,11 @@ ...@@ -353,11 +353,11 @@
"text": "'" "text": "'"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -368,11 +368,11 @@ ...@@ -368,11 +368,11 @@
"text": "s" "text": "s"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -383,11 +383,11 @@ ...@@ -383,11 +383,11 @@
"text": " Moh" "text": " Moh"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -398,11 +398,11 @@ ...@@ -398,11 +398,11 @@
"text": " is" "text": " is"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -413,11 +413,11 @@ ...@@ -413,11 +413,11 @@
"text": "m" "text": "m"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -428,11 +428,11 @@ ...@@ -428,11 +428,11 @@
"text": " Room" "text": " Room"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -443,11 +443,11 @@ ...@@ -443,11 +443,11 @@
"text": "s" "text": "s"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -458,11 +458,11 @@ ...@@ -458,11 +458,11 @@
"text": " the" "text": " the"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -473,11 +473,11 @@ ...@@ -473,11 +473,11 @@
"text": " tired" "text": " tired"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -488,11 +488,11 @@ ...@@ -488,11 +488,11 @@
"text": ":" "text": ":"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -503,11 +503,11 @@ ...@@ -503,11 +503,11 @@
"text": "'" "text": "'"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -518,11 +518,11 @@ ...@@ -518,11 +518,11 @@
"text": " capital" "text": " capital"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
...@@ -530,73 +530,73 @@ ...@@ -530,73 +530,73 @@
"finish_reason": "", "finish_reason": "",
"index": 3, "index": 3,
"logprobs": null, "logprobs": null,
"text": " of" "text": ","
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
{ {
"finish_reason": "", "finish_reason": "length",
"index": 0, "index": 0,
"logprobs": null, "logprobs": null,
"text": " She" "text": " She"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
{ {
"finish_reason": "", "finish_reason": "length",
"index": 1, "index": 1,
"logprobs": null, "logprobs": null,
"text": " scale" "text": " scale"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
{ {
"finish_reason": "", "finish_reason": "length",
"index": 2, "index": 2,
"logprobs": null, "logprobs": null,
"text": " of" "text": " of"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
}, },
{ {
"choices": [ "choices": [
{ {
"finish_reason": "", "finish_reason": "length",
"index": 3, "index": 3,
"logprobs": null, "logprobs": null,
"text": " being" "text": " its"
} }
], ],
"created": 1713284431, "created": 1724833943,
"id": "", "id": "",
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"object": "text_completion", "object": "text_completion",
"system_fingerprint": "2.0.1-native" "system_fingerprint": "2.2.1-dev0-native"
} }
] ]
...@@ -16,7 +16,7 @@ ...@@ -16,7 +16,7 @@
}, },
{ {
"id": 3102, "id": 3102,
"logprob": -11.1875, "logprob": -11.25,
"text": " request" "text": " request"
} }
], ],
...@@ -24,66 +24,66 @@ ...@@ -24,66 +24,66 @@
"tokens": [ "tokens": [
{ {
"id": 185, "id": 185,
"logprob": -1.5546875, "logprob": -1.546875,
"special": false, "special": false,
"text": "\n" "text": "\n"
}, },
{ {
"id": 549, "id": 549,
"logprob": -2.84375, "logprob": -2.859375,
"special": false, "special": false,
"text": "The" "text": "The"
}, },
{ {
"id": 1727, "id": 1727,
"logprob": -2.34375, "logprob": -2.484375,
"special": false, "special": false,
"text": " test" "text": " test"
}, },
{ {
"id": 3102, "id": 3102,
"logprob": -0.8359375, "logprob": -0.83203125,
"special": false, "special": false,
"text": " request" "text": " request"
}, },
{ {
"id": 317, "id": 317,
"logprob": -1.0859375, "logprob": -1.1484375,
"special": false, "special": false,
"text": " is" "text": " is"
}, },
{ {
"id": 254, "id": 245,
"logprob": -1.5390625, "logprob": -1.578125,
"special": false, "special": false,
"text": " the" "text": " a"
}, },
{ {
"id": 1022, "id": 3412,
"logprob": -1.1875, "logprob": -2.578125,
"special": false, "special": false,
"text": " first" "text": " document"
}, },
{ {
"id": 3458, "id": 344,
"logprob": -0.35546875, "logprob": -1.125,
"special": false, "special": false,
"text": " step" "text": " that"
}, },
{ {
"id": 279, "id": 317,
"logprob": -0.8828125, "logprob": -1.6953125,
"special": false, "special": false,
"text": " in" "text": " is"
}, },
{ {
"id": 254, "id": 1222,
"logprob": -0.71484375, "logprob": -1.71875,
"special": false, "special": false,
"text": " the" "text": " used"
} }
], ],
"top_tokens": null "top_tokens": null
}, },
"generated_text": "\nThe test request is the first step in the" "generated_text": "\nThe test request is a document that is used"
} }
...@@ -37,56 +37,56 @@ ...@@ -37,56 +37,56 @@
}, },
{ {
"id": 1727, "id": 1727,
"logprob": -2.359375, "logprob": -2.4375,
"special": false, "special": false,
"text": " test" "text": " test"
}, },
{ {
"id": 3102, "id": 3102,
"logprob": -0.83203125, "logprob": -0.83984375,
"special": false, "special": false,
"text": " request" "text": " request"
}, },
{ {
"id": 317, "id": 317,
"logprob": -1.125, "logprob": -1.1328125,
"special": false, "special": false,
"text": " is" "text": " is"
}, },
{ {
"id": 245, "id": 254,
"logprob": -1.5703125, "logprob": -1.515625,
"special": false, "special": false,
"text": " a" "text": " the"
}, },
{ {
"id": 3412, "id": 1022,
"logprob": -2.578125, "logprob": -1.15625,
"special": false, "special": false,
"text": " document" "text": " first"
}, },
{ {
"id": 344, "id": 3458,
"logprob": -1.125, "logprob": -0.3671875,
"special": false, "special": false,
"text": " that" "text": " step"
}, },
{ {
"id": 317, "id": 279,
"logprob": -1.6953125, "logprob": -0.88671875,
"special": false, "special": false,
"text": " is" "text": " in"
}, },
{ {
"id": 1222, "id": 254,
"logprob": -1.75, "logprob": -0.69140625,
"special": false, "special": false,
"text": " used" "text": " the"
} }
], ],
"top_tokens": null "top_tokens": null
}, },
"generated_text": "\nThe test request is a document that is used" "generated_text": "\nThe test request is the first step in the"
}, },
{ {
"details": { "details": {
...@@ -126,56 +126,56 @@ ...@@ -126,56 +126,56 @@
}, },
{ {
"id": 1727, "id": 1727,
"logprob": -2.359375, "logprob": -2.4375,
"special": false, "special": false,
"text": " test" "text": " test"
}, },
{ {
"id": 3102, "id": 3102,
"logprob": -0.83203125, "logprob": -0.83984375,
"special": false, "special": false,
"text": " request" "text": " request"
}, },
{ {
"id": 317, "id": 317,
"logprob": -1.125, "logprob": -1.1328125,
"special": false, "special": false,
"text": " is" "text": " is"
}, },
{ {
"id": 245, "id": 254,
"logprob": -1.5703125, "logprob": -1.515625,
"special": false, "special": false,
"text": " a" "text": " the"
}, },
{ {
"id": 3412, "id": 1022,
"logprob": -2.578125, "logprob": -1.15625,
"special": false, "special": false,
"text": " document" "text": " first"
}, },
{ {
"id": 344, "id": 3458,
"logprob": -1.125, "logprob": -0.3671875,
"special": false, "special": false,
"text": " that" "text": " step"
}, },
{ {
"id": 317, "id": 279,
"logprob": -1.6953125, "logprob": -0.88671875,
"special": false, "special": false,
"text": " is" "text": " in"
}, },
{ {
"id": 1222, "id": 254,
"logprob": -1.75, "logprob": -0.69140625,
"special": false, "special": false,
"text": " used" "text": " the"
} }
], ],
"top_tokens": null "top_tokens": null
}, },
"generated_text": "\nThe test request is a document that is used" "generated_text": "\nThe test request is the first step in the"
}, },
{ {
"details": { "details": {
...@@ -215,56 +215,56 @@ ...@@ -215,56 +215,56 @@
}, },
{ {
"id": 1727, "id": 1727,
"logprob": -2.359375, "logprob": -2.4375,
"special": false, "special": false,
"text": " test" "text": " test"
}, },
{ {
"id": 3102, "id": 3102,
"logprob": -0.83203125, "logprob": -0.83984375,
"special": false, "special": false,
"text": " request" "text": " request"
}, },
{ {
"id": 317, "id": 317,
"logprob": -1.125, "logprob": -1.1328125,
"special": false, "special": false,
"text": " is" "text": " is"
}, },
{ {
"id": 245, "id": 254,
"logprob": -1.5703125, "logprob": -1.515625,
"special": false, "special": false,
"text": " a" "text": " the"
}, },
{ {
"id": 3412, "id": 1022,
"logprob": -2.578125, "logprob": -1.15625,
"special": false, "special": false,
"text": " document" "text": " first"
}, },
{ {
"id": 344, "id": 3458,
"logprob": -1.125, "logprob": -0.3671875,
"special": false, "special": false,
"text": " that" "text": " step"
}, },
{ {
"id": 317, "id": 279,
"logprob": -1.6953125, "logprob": -0.88671875,
"special": false, "special": false,
"text": " is" "text": " in"
}, },
{ {
"id": 1222, "id": 254,
"logprob": -1.75, "logprob": -0.69140625,
"special": false, "special": false,
"text": " used" "text": " the"
} }
], ],
"top_tokens": null "top_tokens": null
}, },
"generated_text": "\nThe test request is a document that is used" "generated_text": "\nThe test request is the first step in the"
}, },
{ {
"details": { "details": {
...@@ -304,55 +304,55 @@ ...@@ -304,55 +304,55 @@
}, },
{ {
"id": 1727, "id": 1727,
"logprob": -2.359375, "logprob": -2.4375,
"special": false, "special": false,
"text": " test" "text": " test"
}, },
{ {
"id": 3102, "id": 3102,
"logprob": -0.83203125, "logprob": -0.83984375,
"special": false, "special": false,
"text": " request" "text": " request"
}, },
{ {
"id": 317, "id": 317,
"logprob": -1.125, "logprob": -1.1328125,
"special": false, "special": false,
"text": " is" "text": " is"
}, },
{ {
"id": 245, "id": 254,
"logprob": -1.5703125, "logprob": -1.515625,
"special": false, "special": false,
"text": " a" "text": " the"
}, },
{ {
"id": 3412, "id": 1022,
"logprob": -2.578125, "logprob": -1.15625,
"special": false, "special": false,
"text": " document" "text": " first"
}, },
{ {
"id": 344, "id": 3458,
"logprob": -1.125, "logprob": -0.3671875,
"special": false, "special": false,
"text": " that" "text": " step"
}, },
{ {
"id": 317, "id": 279,
"logprob": -1.6953125, "logprob": -0.88671875,
"special": false, "special": false,
"text": " is" "text": " in"
}, },
{ {
"id": 1222, "id": 254,
"logprob": -1.75, "logprob": -0.69140625,
"special": false, "special": false,
"text": " used" "text": " the"
} }
], ],
"top_tokens": null "top_tokens": null
}, },
"generated_text": "\nThe test request is a document that is used" "generated_text": "\nThe test request is the first step in the"
} }
] ]
{ {
"details": { "details": {
"best_of_sequences": null, "best_of_sequences": null,
"finish_reason": "length", "finish_reason": "stop_sequence",
"generated_tokens": 10, "generated_tokens": 5,
"prefill": [ "prefill": [
{ {
"id": 128000, "id": 128000,
...@@ -16,7 +16,7 @@ ...@@ -16,7 +16,7 @@
}, },
{ {
"id": 1715, "id": 1715,
"logprob": -10.375, "logprob": -10.4375,
"text": " request" "text": " request"
} }
], ],
...@@ -29,61 +29,31 @@ ...@@ -29,61 +29,31 @@
"text": ":" "text": ":"
}, },
{ {
"id": 2209, "id": 923,
"logprob": -2.78125, "logprob": -2.84375,
"special": false, "special": false,
"text": " Is" "text": " add"
}, },
{ {
"id": 279, "id": 264,
"logprob": -0.6328125, "logprob": 0.0,
"special": false, "special": false,
"text": " the" "text": " a"
},
{
"id": 734,
"logprob": -2.703125,
"special": false,
"text": " function"
}, },
{ {
"id": 330, "id": 330,
"logprob": -0.34179688, "logprob": -0.31640625,
"special": false, "special": false,
"text": " \"" "text": " \""
}, },
{ {
"id": 4110, "id": 1985,
"logprob": -2.359375, "logprob": 0.0,
"special": false,
"text": "Create"
},
{
"id": 7575,
"logprob": -2.1875,
"special": false,
"text": "Process"
},
{
"id": 1,
"logprob": -0.07910156,
"special": false,
"text": "\""
},
{
"id": 304,
"logprob": -0.83203125,
"special": false,
"text": " in"
},
{
"id": 12468,
"logprob": -1.8203125,
"special": false, "special": false,
"text": " Win" "text": "test"
} }
], ],
"top_tokens": null "top_tokens": null
}, },
"generated_text": "Test request: Is the function \"CreateProcess\" in Win" "generated_text": "Test request: add a \"test"
} }