OpenDAS / text-generation-inference / Commits / fa43fb71

Commit fa43fb71, authored Nov 08, 2022 by OlivierDehaene
fix(server): Fix Transformers fork version
Parent: 4236e41b
Showing 5 changed files with 16 additions and 13 deletions:

    .dockerignore                             +2  -1
    Dockerfile                                +2  -0
    aml/deployment.yaml                       +5  -5
    server/Makefile                           +6  -6
    server/text_generation/models/bloom.py    +1  -1
.dockerignore

@@ -1,2 +1,3 @@
 aml
-target
\ No newline at end of file
+target
+server/transformers
\ No newline at end of file
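Note: the updated Makefile target further down unpacks the pinned transformers fork into server/transformers, so that directory is added to .dockerignore, presumably to keep a local checkout out of the Docker build context. The `target` line itself is unchanged; it only gains a trailing newline.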
Dockerfile

@@ -2,6 +2,7 @@ FROM rust:1.64 as router-builder
 WORKDIR /usr/src
+COPY rust-toolchain.toml rust-toolchain.toml
 COPY proto proto
 COPY router router

@@ -13,6 +14,7 @@ FROM rust:1.64 as launcher-builder
 WORKDIR /usr/src
+COPY rust-toolchain.toml rust-toolchain.toml
 COPY launcher launcher
 WORKDIR /usr/src/launcher
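Copying rust-toolchain.toml into each builder stage lets rustup resolve the repository's pinned Rust toolchain during the build, rather than silently falling back to the base image's default (1.64). The commit does not state this motivation explicitly; it is the usual reason for shipping that file into the build context.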
aml/deployment.yaml

@@ -8,7 +8,7 @@ environment_variables:
   MODEL_NAME: bigscience/bloom
   NUM_GPUS: 8
 environment:
-  image: db4c2190dd824d1f950f5d1555fbadf0.azurecr.io/text-generation-inference:0.2
+  image: db4c2190dd824d1f950f5d1555fbadf0.azurecr.io/text-generation-inference:0.3.1
 inference_config:
   liveness_route:
     port: 3000

@@ -25,14 +25,14 @@ request_settings:
   max_concurrent_requests_per_instance: 256
 liveness_probe:
   initial_delay: 600
-  timeout: 20
+  timeout: 90
   period: 120
   success_threshold: 1
-  failure_threshold: 3
+  failure_threshold: 5
 readiness_probe:
   initial_delay: 600
-  timeout: 20
+  timeout: 90
   period: 120
   success_threshold: 1
-  failure_threshold: 3
+  failure_threshold: 5
 instance_count: 1
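The probe changes extend how long the container may stay unresponsive before Azure ML restarts it, which matters while sharded BLOOM weights load. A rough back-of-the-envelope sketch (illustrative only; it ignores how the 90s per-probe timeout overlaps with the probe period):

    # Worst-case unresponsive window implied by the probe settings above.
    initial_delay = 600      # seconds before the first liveness probe
    period = 120             # seconds between probes
    failure_threshold = 5    # consecutive failures now tolerated (was 3)

    worst_case = initial_delay + failure_threshold * period
    print(f"~{worst_case}s before a restart")  # ~1200s, up from ~960s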
server/Makefile

@@ -7,13 +7,13 @@ gen-server:
 	touch text_generation/pb/__init__.py

 install-transformers:
-	# Install specific version of transformers
+	# Install specific version of transformers with custom cuda kernels
 	rm transformers || true
-	rm transformers-7302a24535e8dc5637ea5b4e4572fc971d404098 || true
-	curl -L -O https://github.com/OlivierDehaene/transformers/archive/7302a24535e8dc5637ea5b4e4572fc971d404098.zip
-	unzip 7302a24535e8dc5637ea5b4e4572fc971d404098.zip
-	rm 7302a24535e8dc5637ea5b4e4572fc971d404098.zip
-	mv transformers-7302a24535e8dc5637ea5b4e4572fc971d404098 transformers
+	rm transformers-b55f16c5b71aeef47a66a4270e19c154f050a7a7 || true
+	curl -L -O https://github.com/OlivierDehaene/transformers/archive/b55f16c5b71aeef47a66a4270e19c154f050a7a7.zip
+	unzip b55f16c5b71aeef47a66a4270e19c154f050a7a7.zip
+	rm b55f16c5b71aeef47a66a4270e19c154f050a7a7.zip
+	mv transformers-b55f16c5b71aeef47a66a4270e19c154f050a7a7 transformers
 	cd transformers && python setup.py install

 install-torch:
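For readers who want the pinned-fork install outside of make, here is a minimal Python sketch of the same steps: fetch the fork archive at the exact commit, unpack it, and rename the extracted directory. Illustrative only; the project itself drives this through the Makefile.

    import io
    import shutil
    import urllib.request
    import zipfile

    COMMIT = "b55f16c5b71aeef47a66a4270e19c154f050a7a7"
    URL = f"https://github.com/OlivierDehaene/transformers/archive/{COMMIT}.zip"

    shutil.rmtree("transformers", ignore_errors=True)      # rm transformers || true
    with urllib.request.urlopen(URL) as resp:              # curl -L -O <url>
        archive = zipfile.ZipFile(io.BytesIO(resp.read()))
    archive.extractall(".")                                # unzip <commit>.zip
    shutil.move(f"transformers-{COMMIT}", "transformers")  # mv ... transformers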
server/text_generation/models/bloom.py

@@ -38,7 +38,7 @@ class BLOOMSharded(CausalLM):
         self.master = self.rank == 0
         if torch.cuda.is_available():
             device = torch.device(f"cuda:{self.rank}")
-            dtype = torch.float16
+            dtype = torch.bfloat16
         else:
             device = torch.device("cpu")
             dtype = torch.float32
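The commit message does not explain the float16 → bfloat16 switch, but a common motivation is numeric range: bfloat16 keeps float32's 8-bit exponent, so large activations that would overflow float16's ~6.5e4 ceiling survive, at the cost of mantissa precision. A quick, self-contained comparison (assumes only that torch is installed):

    import torch

    # Compare the numeric properties of the three dtypes used in bloom.py.
    for dtype in (torch.float16, torch.bfloat16, torch.float32):
        info = torch.finfo(dtype)
        print(f"{str(dtype):>14}  max={info.max:.3e}  eps={info.eps:.3e}")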