Please refer to the [GPU docs](./gpu.md).
## How can I specify the context window size?
By default, Ollama uses a context window size of 4096 tokens, unless you have a single GPU with 4 GB of VRAM or less, in which case it defaults to 2048 tokens.
This can be overridden with the `OLLAMA_CONTEXT_LENGTH` environment variable. For example, to set the default context window to 8K, use:
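```shell
# Start the Ollama server with an 8K (8192-token) default context window
OLLAMA_CONTEXT_LENGTH=8192 ollama serve
```

Note that this changes the server-wide default; individual requests can still ask for a different context size.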