chenpangpang / transformers
Commit b105f2c6
Authored Aug 21, 2020 by Morgan Funtowicz

Update ONNX doc to match the removal of --optimize argument.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

Parent: e5f45227

Showing 1 changed file with 9 additions and 9 deletions

docs/source/serialization.rst (+9 -9)
@@ -52,15 +52,17 @@ Below are some of the operators which can be enabled to speed up inference throu
 * Skip connection LayerNormalization fusing
 * FastGeLU approximation
-Fortunately, you can let ONNXRuntime find all the possible optimized operators for you. Simply add ``--optimize``
-when exporting your model through ``convert_graph_to_onnx.py``.
-Example:
-.. code-block:: bash
-    python convert_graph_to_onnx.py --framework <pt, tf> --model bert-base-cased --optimize bert-base-cased.onnx
+Some of the optimizations performed by ONNX runtime can be hardware specific and thus lead to different performances
+if used on another machine with a different hardware configuration than the one used for exporting the model.
+For this reason, when using ``convert_graph_to_onnx.py`` optimizations are not enabled,
+ensuring the model can be easily exported to various hardware.
+Optimizations can then be enabled when loading the model through ONNX runtime for inference.
+.. note::
+    When quantization is enabled (see below), ``convert_graph_to_onnx.py`` script will enable optimizations on the model
+    because quantization would modify the underlying graph making it impossible for ONNX runtime to do the optimizations
+    afterwards.
 .. note::
     For more information about the optimizations enabled by ONNXRuntime, please have a look at the (`ONNXRuntime Github <https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/transformers>`_)
@@ -112,8 +114,6 @@ Example of quantized BERT model export:
 above command will contain the original ONNX model storing `float32` weights.
 The second one, with ``-quantized`` suffix, will hold the quantized parameters.
-.. note::
-    The quantization export gives the best performances when used in combination with ``--optimize``.
 TorchScript
 =======================================
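
The updated text says optimizations can be enabled when the model is loaded through ONNX runtime, but the diff does not show how. A minimal sketch of one way to do that with the ``onnxruntime`` Python package follows. The helper names (``optimized_copy_path``, ``load_with_optimizations``), the ``-optimized`` file suffix, and the choice of the CPU execution provider are assumptions made for illustration and are not part of this commit.

```python
from pathlib import Path


def optimized_copy_path(model_path: str) -> str:
    """Return a sibling file name for the optimized graph.

    The "-optimized" suffix is a naming assumption of this sketch.
    """
    p = Path(model_path)
    return str(p.with_name(p.stem + "-optimized" + p.suffix))


def load_with_optimizations(model_path: str):
    """Create an inference session with graph optimizations enabled at load time."""
    # Imported lazily so the path helper above works without onnxruntime installed.
    import onnxruntime as ort

    options = ort.SessionOptions()
    # Request every graph optimization level ONNX Runtime supports.
    options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    # Optionally save the optimized graph next to the original for inspection.
    # The saved graph may be specific to the machine it was produced on.
    options.optimized_model_filepath = optimized_copy_path(model_path)
    # The CPU provider is an assumption; pick the provider matching the target hardware.
    return ort.InferenceSession(model_path, options, providers=["CPUExecutionProvider"])
```

Calling ``load_with_optimizations("bert-base-cased.onnx")`` on the target machine would apply the optimizations there, which matches the reasoning in the new paragraph: the exported file stays portable, and hardware-specific rewrites happen where inference actually runs.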