# _toctree.yml
- sections:
  - local: index
    title: Text Generation Inference
  - local: quicktour
    title: Quick Tour
  - local: installation_nvidia
    title: Using TGI with Nvidia GPUs
  - local: installation_amd
    title: Using TGI with AMD GPUs
  - local: installation_gaudi
    title: Using TGI with Intel Gaudi
  - local: installation_inferentia
    title: Using TGI with AWS Inferentia
  - local: installation_intel
    title: Using TGI with Intel GPUs
  - local: installation
    title: Installation from source
  - local: supported_models
    title: Supported Models and Hardware
  - local: messages_api
    title: Messages API
  - local: architecture
    title: Internal Architecture
  - local: usage_statistics
    title: Usage Statistics
  title: Getting started
- sections:
  - local: basic_tutorials/consuming_tgi
    title: Consuming TGI
  - local: basic_tutorials/preparing_model
    title: Preparing Model for Serving
  - local: basic_tutorials/gated_model_access
    title: Serving Private & Gated Models
  - local: basic_tutorials/using_cli
    title: Using TGI CLI
  - local: basic_tutorials/launcher
    title: All TGI CLI options
  - local: basic_tutorials/non_core_models
    title: Non-core Model Serving
  - local: basic_tutorials/safety
    title: Safety
  - local: basic_tutorials/using_guidance
    title: Using Guidance, JSON, tools
  - local: basic_tutorials/visual_language_models
    title: Visual Language Models
  - local: basic_tutorials/monitoring
    title: Monitoring TGI with Prometheus and Grafana
  - local: basic_tutorials/train_medusa
    title: Train Medusa
  title: Tutorials
- sections:
  - local: conceptual/streaming
    title: Streaming
  - local: conceptual/quantization
    title: Quantization
  - local: conceptual/tensor_parallelism
    title: Tensor Parallelism
  - local: conceptual/paged_attention
    title: PagedAttention
  - local: conceptual/safetensors
    title: Safetensors
  - local: conceptual/flash_attention
    title: Flash Attention
  - local: conceptual/speculation
    title: Speculation (Medusa, ngram)
  - local: conceptual/guidance
    title: How Guidance Works (via outlines)
  - local: conceptual/lora
    title: LoRA (Low-Rank Adaptation)

  title: Conceptual Guides