OpenDAS / ollama · Commit f14aa543 (unverified)

Authored Jul 22, 2024 by Daniel Hiltgen; committed via GitHub on Jul 22, 2024.

Merge pull request #5855 from dhiltgen/remove_max_vram

Remove no longer supported max vram var

Parents: f8fedbda, cc269ba0
Showing 3 changed files with 2 additions and 16 deletions:

  cmd/cmd.go                        +0 -1
  envconfig/config.go               +0 -13
  integration/concurrency_test.go   +2 -2
cmd/cmd.go

@@ -1344,7 +1344,6 @@ func NewCLI() *cobra.Command {
 				envVars["OLLAMA_TMPDIR"],
 				envVars["OLLAMA_FLASH_ATTENTION"],
 				envVars["OLLAMA_LLM_LIBRARY"],
-				envVars["OLLAMA_MAX_VRAM"],
 			})
 		default:
 			appendEnvDocs(cmd, envs)
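For context, the envVars entries above feed a helper that appends environment-variable documentation to a command's help output. The following is a minimal sketch of that pattern, assuming a simplified EnvVar type and an illustrative appendEnvDocs signature; ollama's actual definitions live in the envconfig and cmd packages and may differ.

// Sketch of the env-docs pattern, not ollama's exact API.
package main

import (
	"fmt"

	"github.com/spf13/cobra"
)

// EnvVar is a simplified stand-in for envconfig.EnvVar (assumption).
type EnvVar struct {
	Name        string
	Description string
}

// appendEnvDocs extends a command's usage text with an env-var section,
// mirroring what the call in the default case above does.
func appendEnvDocs(cmd *cobra.Command, envs []EnvVar) {
	if len(envs) == 0 {
		return
	}
	section := "\nEnvironment Variables:\n"
	for _, e := range envs {
		section += fmt.Sprintf("  %-28s %s\n", e.Name, e.Description)
	}
	cmd.SetUsageTemplate(cmd.UsageTemplate() + section)
}

func main() {
	serveCmd := &cobra.Command{Use: "serve", Short: "Start the server"}
	appendEnvDocs(serveCmd, []EnvVar{
		{"OLLAMA_TMPDIR", "Location for temporary files"},
		{"OLLAMA_LLM_LIBRARY", "Set LLM library to bypass autodetection"},
		// OLLAMA_MAX_VRAM no longer appears here after this commit.
	})
	_ = serveCmd.Usage()
}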
envconfig/config.go

@@ -43,8 +43,6 @@ var (
 	MaxRunners int
 	// Set via OLLAMA_MAX_QUEUE in the environment
 	MaxQueuedRequests int
-	// Set via OLLAMA_MAX_VRAM in the environment
-	MaxVRAM uint64
 	// Set via OLLAMA_MODELS in the environment
 	ModelsDir string
 	// Set via OLLAMA_NOHISTORY in the environment
@@ -89,7 +87,6 @@ func AsMap() map[string]EnvVar {
 		"OLLAMA_LLM_LIBRARY":       {"OLLAMA_LLM_LIBRARY", LLMLibrary, "Set LLM library to bypass autodetection"},
 		"OLLAMA_MAX_LOADED_MODELS": {"OLLAMA_MAX_LOADED_MODELS", MaxRunners, "Maximum number of loaded models per GPU"},
 		"OLLAMA_MAX_QUEUE":         {"OLLAMA_MAX_QUEUE", MaxQueuedRequests, "Maximum number of queued requests"},
-		"OLLAMA_MAX_VRAM":          {"OLLAMA_MAX_VRAM", MaxVRAM, "Maximum VRAM"},
 		"OLLAMA_MODELS":            {"OLLAMA_MODELS", ModelsDir, "The path to the models directory"},
 		"OLLAMA_NOHISTORY":         {"OLLAMA_NOHISTORY", NoHistory, "Do not preserve readline history"},
 		"OLLAMA_NOPRUNE":           {"OLLAMA_NOPRUNE", NoPrune, "Do not prune model blobs on startup"},
@@ -194,16 +191,6 @@ func LoadConfig() {
 	TmpDir = clean("OLLAMA_TMPDIR")
 
-	userLimit := clean("OLLAMA_MAX_VRAM")
-	if userLimit != "" {
-		avail, err := strconv.ParseUint(userLimit, 10, 64)
-		if err != nil {
-			slog.Error("invalid setting, ignoring", "OLLAMA_MAX_VRAM", userLimit, "error", err)
-		} else {
-			MaxVRAM = avail
-		}
-	}
-
 	LLMLibrary = clean("OLLAMA_LLM_LIBRARY")
 
 	if onp := clean("OLLAMA_NUM_PARALLEL"); onp != "" {
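The block deleted from LoadConfig is a common parse-or-ignore pattern for numeric env vars: read the raw value, ParseUint it, log and fall back on error. A standalone sketch of that same pattern follows; clean() in ollama trims and unquotes the value, so os.Getenv plus strings.TrimSpace stands in for it here as an assumption.

// Sketch of the uint64 env parsing pattern removed above.
package main

import (
	"log/slog"
	"os"
	"strconv"
	"strings"
)

// parseUintEnv returns the env var as a uint64, or def when unset or
// invalid. Invalid values are logged and ignored, matching the removed
// MaxVRAM logic.
func parseUintEnv(key string, def uint64) uint64 {
	raw := strings.TrimSpace(os.Getenv(key))
	if raw == "" {
		return def
	}
	v, err := strconv.ParseUint(raw, 10, 64)
	if err != nil {
		slog.Error("invalid setting, ignoring", key, raw, "error", err)
		return def
	}
	return v
}

func main() {
	// Before this commit, MaxVRAM was populated like this in LoadConfig.
	maxVRAM := parseUintEnv("OLLAMA_MAX_VRAM", 0)
	slog.Info("loaded", "OLLAMA_MAX_VRAM", maxVRAM)
}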
integration/concurrency_test.go

@@ -69,7 +69,7 @@ func TestIntegrationConcurrentPredictOrcaMini(t *testing.T) {
 	reqLimit := len(req)
 	iterLimit := 5
 
-	vram := os.Getenv("OLLAMA_MAX_VRAM")
+	vram := os.Getenv("OLLAMA_MAX_VRAM") // TODO - discover actual VRAM
 	if vram != "" {
 		max, err := strconv.ParseUint(vram, 10, 64)
 		require.NoError(t, err)
@@ -106,7 +106,7 @@ func TestIntegrationConcurrentPredictOrcaMini(t *testing.T) {
 // Stress the system if we know how much VRAM it has, and attempt to load more models than will fit
 func TestMultiModelStress(t *testing.T) {
-	vram := os.Getenv("OLLAMA_MAX_VRAM")
+	vram := os.Getenv("OLLAMA_MAX_VRAM") // TODO - discover actual VRAM
 	if vram == "" {
 		t.Skip("OLLAMA_MAX_VRAM not specified, can't pick the right models for the stress test")
 	}
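After this commit, OLLAMA_MAX_VRAM is no longer read by the server at all; it survives only as a test-only knob for sizing these integration tests, with the new TODO comments marking the intent to replace it with real GPU discovery. A sketch of how a test can gate on and size from that knob follows; the test name and the per-model VRAM threshold are illustrative assumptions, not values from the ollama test suite.

// Sketch of VRAM-gated test sizing, mirroring TestMultiModelStress.
package integration

import (
	"os"
	"strconv"
	"testing"

	"github.com/stretchr/testify/require"
)

func TestVRAMSizedStress(t *testing.T) {
	vram := os.Getenv("OLLAMA_MAX_VRAM") // TODO - discover actual VRAM
	if vram == "" {
		t.Skip("OLLAMA_MAX_VRAM not specified, can't pick the right models for the stress test")
	}
	maxBytes, err := strconv.ParseUint(vram, 10, 64)
	require.NoError(t, err)

	// Illustrative sizing: one concurrent model per ~8 GiB of reported VRAM.
	models := int(maxBytes / (8 << 30))
	if models < 1 {
		models = 1
	}
	t.Logf("stressing with %d concurrent model(s)", models)
}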