OpenDAS / nni
Commit 3d2abd4a (Unverified)
Authored Aug 26, 2020 by SparkSnail
Committed by GitHub, Aug 26, 2020

Fix remote & kubeflow it (#2828)

Parent: 9f44d54a

Showing 2 changed files with 14 additions and 13 deletions:
src/nni_manager/training_service/remote_machine/remoteMachineTrainingService.ts (+13, -12)
test/config/training_service.yml (+1, -1)

src/nni_manager/training_service/remote_machine/remoteMachineTrainingService.ts @ 3d2abd4a
@@ -89,6 +89,19 @@ class RemoteMachineTrainingService implements TrainingService {
             this.sshConnectionPromises = [];
             // initialize gpuScheduler
             this.gpuScheduler = new GPUScheduler(this.machineExecutorManagerMap);
+            if (this.trialConfig === undefined) {
+                throw new Error("trial config not initialized!");
+            }
+            // Copy codeDir to remote machine
+            for (const [rmMeta, executorManager] of this.machineExecutorManagerMap.entries()) {
+                const executor: ShellExecutor = await executorManager.getExecutor(this.initExecutorId);
+                if (executor !== undefined) {
+                    this.machineCopyExpCodeDirPromiseMap.set(
+                        rmMeta,
+                        executor.copyDirectoryToRemote(this.trialConfig.codeDir, executor.getRemoteCodePath(getExperimentId()))
+                    );
+                }
+            }
         }
         while (!this.stopping) {
             while (this.jobQueue.length > 0) {
@@ -328,20 +341,8 @@ class RemoteMachineTrainingService implements TrainingService {
         try {
             // Validate to make sure codeDir doesn't have too many files
             await validateCodeDir(remoteMachineTrailConfig.codeDir);
-            // Copy codeDir to remote machine
-            for (const [rmMeta, executorManager] of this.machineExecutorManagerMap.entries()) {
-                const executor: ShellExecutor = await executorManager.getExecutor(this.initExecutorId);
-                if (executor !== undefined) {
-                    this.machineCopyExpCodeDirPromiseMap.set(
-                        rmMeta,
-                        executor.copyDirectoryToRemote(remoteMachineTrailConfig.codeDir, executor.getRemoteCodePath(getExperimentId()))
-                    );
-                }
-            }
         } catch (error) {
             this.log.error(error);
             return Promise.reject(new Error(error));
         }

test/config/training_service.yml @ 3d2abd4a
@@ -10,7 +10,7 @@ kubeflow:
   kubeflowConfig:
     operator: tf-operator
-    apiVersion: v1alpha2
+    apiVersion: v1
     storage: azureStorage
     keyVault:
       vaultName: